
Pivot Forecast — Application Spec

Overview

A web application for building named forecast scenarios against any PostgreSQL table. The core workflow is: load known historical actuals as a baseline, shift those dates forward by a specified interval into the forecast period to establish a no-change starting point, then apply incremental adjustments (scale, recode, clone) to build the plan. An admin configures a source table, generates a baseline, and opens it for users to make adjustments. Users interact with a pivot table to select slices of data and apply forecast operations. All changes are incremental (append-only), fully audited, and reversible.


Tech Stack

  • Backend: Node.js / Express
  • Database: PostgreSQL — isolated pf schema, installs into any existing DB
  • Frontend: React + Vite + Tailwind CSS; Perspective (forecast pivot)
  • Pattern: Follows fc_webapp (shell) + pivot_forecast (operations)

Database Schema: pf

Everything lives in the pf schema. Install via sequential SQL scripts.

pf.source

Registered source tables available for forecasting.

CREATE TABLE pf.source (
    id          serial PRIMARY KEY,
    schema      text NOT NULL,
    tname       text NOT NULL,
    label       text,                   -- friendly display name
    status      text DEFAULT 'active',  -- active | archived
    created_at  timestamptz DEFAULT now(),
    created_by  text,
    UNIQUE (schema, tname)
);

pf.col_meta

Column configuration for each registered source table. Determines how the app treats each column.

CREATE TABLE pf.col_meta (
    id          serial PRIMARY KEY,
    source_id   integer REFERENCES pf.source(id),
    cname       text NOT NULL,          -- column name in source table
    label       text,                   -- friendly display name
    role        text NOT NULL,          -- 'dimension' | 'value' | 'units' | 'date' | 'filter' | 'ignore'
    is_key      boolean DEFAULT false,  -- true = part of natural key (used in WHERE slice)
    opos        integer,                -- ordinal position (for ordering)
    UNIQUE (source_id, cname)
);

Roles:

  • dimension — categorical field (customer, part, channel, rep, geography, etc.) — appears as pivot rows/cols, used in WHERE filters for operations
  • value — the money/revenue field to scale
  • units — the quantity field to scale
  • date — the primary date field; used for baseline/reference date range and stored in the forecast table
  • filter — columns available as filter conditions in the Baseline Workbench (e.g. order status, ship date, open flag); used in baseline WHERE clauses but not stored in the forecast table
  • ignore — exclude from forecast table entirely

pf.version

Named forecast scenarios. One forecast table (pf.fc_{tname}_{version_id}) is created per version.

CREATE TABLE pf.version (
    id              serial PRIMARY KEY,
    source_id       integer REFERENCES pf.source(id),
    name            text NOT NULL,
    description     text,
    status          text DEFAULT 'open',        -- open | closed
    exclude_iters   jsonb DEFAULT '["reference"]', -- iter values excluded from all operations
    created_at      timestamptz DEFAULT now(),
    created_by      text,
    closed_at       timestamptz,
    closed_by       text,
    UNIQUE (source_id, name)
);

exclude_iters: jsonb array of iter values that are excluded from operation WHERE clauses. Defaults to ["reference"]. Reference rows are still returned by get_data (visible in pivot) but are never touched by scale/recode/clone. Additional iters can be added to lock them from further adjustment.

Forecast table naming: pf.fc_{tname}_{version_id} — e.g., pf.fc_sales_3. One table per version, physically isolated. Contains both operational rows and reference rows.

Creating a version → CREATE TABLE pf.fc_{tname}_{version_id} (...)
Deleting a version → DROP TABLE pf.fc_{tname}_{version_id} + delete from pf.version + delete from pf.log

pf.log

Audit log. Every write operation gets one entry here.

CREATE TABLE pf.log (
    id          bigserial PRIMARY KEY,
    version_id  integer REFERENCES pf.version(id),
    pf_user     text NOT NULL,
    stamp       timestamptz DEFAULT now(),
    operation   text NOT NULL,  -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone'
    slice       jsonb,          -- the WHERE conditions that defined the selection
    params      jsonb,          -- operation parameters (increments, new values, scale factor, etc.)
    note        text            -- user-provided comment
);

pf.fc_{tname}_{version_id} (dynamic, one per version)

Created when a version is created. Mirrors source table dimension/value/units/date columns plus forecast metadata. Contains both operational rows (iter = 'baseline' | 'scale' | 'recode' | 'clone') and reference rows (iter = 'reference').

-- Example: source table "sales", version id 3 → pf.fc_sales_3
CREATE TABLE pf.fc_sales_3 (
    id          bigserial PRIMARY KEY,

    -- mirrored from source (role = dimension | value | units | date only):
    customer    text,
    channel     text,
    part        text,
    geography   text,
    order_date  date,
    units       numeric,
    value       numeric,

    -- forecast metadata:
    iter        text,       -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone'
    logid       bigint REFERENCES pf.log(id),
    pf_user     text,
    created_at  timestamptz DEFAULT now()
);

Note: no version_id column on the forecast table — it's implied by the table itself.

pf.sql

Generated SQL stored per source and operation. Built once when col_meta is finalized, fetched at request time.

CREATE TABLE pf.sql (
    id           serial PRIMARY KEY,
    source_id    integer REFERENCES pf.source(id),
    operation    text NOT NULL,  -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone' | 'get_data' | 'undo'
    sql          text NOT NULL,
    generated_at timestamptz DEFAULT now(),
    UNIQUE (source_id, operation)
);

Column names are baked in at generation time. Runtime substitution tokens:

Token Resolved from
{{fc_table}} pf.fc_{tname}_{version_id} — derived at request time
{{where_clause}} built from slice JSON by build_where() in JS
{{exclude_clause}} built from version.exclude_iters — e.g. AND iter NOT IN ('reference')
{{logid}} newly inserted pf.log id
{{pf_user}} from request body
{{date_from}} / {{date_to}} baseline/reference date range (source period)
{{date_offset}} PostgreSQL interval string to shift dates into the forecast period — e.g. 1 year, 6 months, 2 years 3 months (baseline only; empty string = no shift)
{{value_incr}} / {{units_incr}} scale operation increments
{{pct}} scale mode: absolute or percentage
{{set_clause}} recode/clone dimension overrides
{{scale_factor}} clone multiplier

Request-time flow:

  1. Fetch SQL from pf.sql for source_id + operation
  2. Fetch version.exclude_iters, build {{exclude_clause}}
  3. Build {{where_clause}} from slice JSON via build_where()
  4. Substitute all tokens
  5. Execute — single round trip

WHERE clause safety: build_where() validates every key in the slice against col_meta (only role = 'dimension' columns are permitted). Values are sanitized (escaped single quotes). No parameterization — consistent with existing projects, debuggable in Postgres logs.
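
A minimal sketch of what build_where() might look like, assuming a colMeta array of pf.col_meta rows for the source; function and variable names here are illustrative, not the actual implementation:

// Sketch only — slice is the { column: value } object from the request body,
// colMeta is the array of pf.col_meta rows for the source.
function buildWhere(slice, colMeta) {
  const dims = new Set(
    colMeta.filter(c => c.role === 'dimension').map(c => c.cname)
  );
  const parts = [];
  for (const [col, val] of Object.entries(slice)) {
    if (!dims.has(col)) {
      throw new Error(`slice column not a dimension: ${col}`);
    }
    // escape single quotes; no parameterization, per the note above
    const safe = String(val).replace(/'/g, "''");
    parts.push(`${col} = '${safe}'`);
  }
  if (parts.length === 0) throw new Error('empty slice');
  return parts.join(' AND ');
}

// e.g. buildWhere({ channel: 'WHS', geography: 'WEST' }, colMeta)
//   -> "channel = 'WHS' AND geography = 'WEST'"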


Setup / Install Scripts

setup_sql/
  01_schema.sql   -- CREATE SCHEMA pf; create all metadata tables (source, col_meta, version, log, sql)

Source registration, col_meta configuration, SQL generation, version creation, and forecast table DDL all happen via API.


API Routes

DB Browser

Method Route Description
GET /api/tables List all tables in the DB with row counts
GET /api/tables/:schema/:tname/preview Preview columns + sample rows

Source Management

Method Route Description
GET /api/sources List registered sources
POST /api/sources Register a source table
GET /api/sources/:id/cols Get col_meta for a source
PUT /api/sources/:id/cols Save col_meta configuration
POST /api/sources/:id/generate-sql Generate/regenerate all operation SQL into pf.sql
GET /api/sources/:id/sql View generated SQL for a source (inspection/debug)
DELETE /api/sources/:id Deregister a source (does not affect existing forecast tables)

Forecast Versions

Method Route Description
GET /api/sources/:id/versions List versions for a source
POST /api/sources/:id/versions Create a new version (CREATE TABLE for forecast table)
PUT /api/versions/:id Update version (name, description, exclude_iters)
POST /api/versions/:id/close Close a version (blocks further edits)
POST /api/versions/:id/reopen Reopen a closed version
DELETE /api/versions/:id Delete a version (DROP TABLE + delete log entries)

Baseline & Reference Data

Method Route Description
POST /api/versions/:id/baseline Load one baseline segment (additive — does not clear existing baseline rows)
DELETE /api/versions/:id/baseline Clear all baseline rows and baseline log entries for this version
POST /api/versions/:id/reference Load reference rows from source table for a date range (additive)

Baseline load request body:

{
  "date_offset":  "1 year",
  "filters": [
    [
      { "col": "order_date",   "op": "BETWEEN", "values": ["2024-01-01", "2024-12-31"] },
      { "col": "order_status", "op": "IN",      "values": ["OPEN", "PENDING"] }
    ],
    [
      { "col": "order_status", "op": "IS NULL" }
    ]
  ],
  "pf_user":  "admin",
  "note":     "FY2024 actuals + open orders projected to FY2025",
  "replay":   false
}

The example above generates: (order_date BETWEEN '2024-01-01' AND '2024-12-31' AND order_status IN ('OPEN','PENDING')) OR (order_status IS NULL)

  • date_offset — PostgreSQL interval string applied to the primary role = 'date' column at insert time. Examples: "1 year", "6 months", "2 years 3 months". Defaults to "0 days". Applied to the stored date value only — filter columns are never shifted.
  • filters — an array of groups. Conditions within a group are AND-ed; groups are OR-ed together. Each group is an array of one or more condition objects:
    • col — must be role = 'date' or role = 'filter' in col_meta
    • op — one of =, !=, IN, NOT IN, BETWEEN, IS NULL, IS NOT NULL
    • values — array of strings; two elements for BETWEEN; multiple for IN/NOT IN; omitted for IS NULL/IS NOT NULL
    • Backward compatibility: a flat array of condition objects (non-nested) is treated as a single group (all AND).
  • At least one group with at least one condition is required.
  • raw_where — optional string. When present, bypasses filters entirely and injects the value verbatim as the WHERE clause body. Admin-only — rejected with 403 if the requesting pf_user is not in the admin list. Not validated against col_meta. Caller is responsible for correctness and SQL safety. Stored as-is in pf.log.params for audit. Cannot be combined with filters — if both are present the request is rejected with 400.
  • Baseline loads are additive — existing iter = 'baseline' rows are not touched. Each load is its own log entry and is independently undoable.

replay controls behavior when incremental rows exist (applies to Clear + reload, not individual segments):

  • replay: false (default) — after clearing, re-load baseline segments, leave incremental rows untouched
  • replay: true — after clearing, re-load baseline, then re-execute each incremental log entry in chronological order

v1 note: replay: true returns 501 Not Implemented until the replay engine is built.

Clear baseline (DELETE /api/versions/:id/baseline) — deletes all rows where iter = 'baseline' and all operation = 'baseline' log entries. Irreversible (no undo). Returns { rows_deleted, log_entries_deleted }.

Reference request body: same shape as baseline load without replay. Reference dates land verbatim (no offset). Additive — multiple reference loads stack independently, each undoable by logid.

Forecast Data

Method Route Description
GET /api/versions/:id/data Stream all rows for this version as an Arrow IPC binary

Transport format — Apache Arrow IPC stream

The endpoint returns Content-Type: application/vnd.apache.arrow.stream (binary). JSON is not used for this route. The client fetches the response as arrayBuffer() and passes it directly to worker.table(buffer) — Perspective's native ingestion path with no JS deserialization overhead.

Arrow's columnar layout with dictionary encoding on string dimension columns keeps payload size manageable at scale (typically 50–150 MB for 1M rows depending on string cardinality), compared to several times that for equivalent JSON.

Server-side streaming (cursor-based)

For datasets that may reach 1M+ rows, the server must not buffer the full query result in memory before writing the response. Instead:

  1. Open a PostgreSQL cursor over the SELECT * FROM {{fc_table}} query
  2. Fetch rows in batches (target: 10 000 rows per batch)
  3. For each batch, append a serialized Arrow record batch to the HTTP response using chunked transfer encoding
  4. Close the cursor and end the response when all batches are written

This means the first bytes of the Arrow stream reach the client while the server is still reading from the database, and Node.js heap stays bounded regardless of dataset size.
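
A minimal sketch of the cursor loop, assuming the pg and pg-cursor packages. rowsToArrowBatch() is a hypothetical helper standing in for the Arrow serialization step — in practice something like apache-arrow's record-batch writer, sharing one schema across batches so the response is a single IPC stream rather than concatenated streams:

const Cursor = require('pg-cursor');

// Sketch only — streams {{fc_table}} to the response in 10k-row batches.
// rowsToArrowBatch() is a hypothetical helper for Arrow serialization.
async function streamVersionData(pool, fcTable, res) {
  const client = await pool.connect();
  try {
    const { rows } = await client.query(`SELECT count(*) AS n FROM ${fcTable}`);
    res.setHeader('X-Row-Count', rows[0].n);                  // row-count pre-check
    res.setHeader('Content-Type', 'application/vnd.apache.arrow.stream');

    const cursor = client.query(new Cursor(`SELECT * FROM ${fcTable}`));
    let batch = await cursor.read(10000);
    while (batch.length > 0) {
      res.write(rowsToArrowBatch(batch));   // serialized Arrow record batch, chunked transfer
      batch = await cursor.read(10000);
    }
    await cursor.close();
    res.end();
  } finally {
    client.release();
  }
}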

Client-side loading

  • Moderate datasets (< ~500k rows): accumulate the full arrayBuffer() then call worker.table(buffer) once. Perspective becomes interactive after the stream completes.
  • Large datasets (≥ ~500k rows): process Arrow record batches incrementally — call worker.table(firstBatch) to create the table, then pspTable.update(batch) for each subsequent batch. Perspective is interactive and browseable while remaining batches are still arriving.

The client detects which path to use by checking the X-Row-Count response header (see below).

Row-count pre-check

Before opening the cursor, the server runs SELECT COUNT(*) FROM {{fc_table}}. The result is attached as the X-Row-Count response header so the client can choose its loading strategy. If the count exceeds 500 000, the UI displays a non-blocking notice ("Loading large dataset — pivot will become interactive as data arrives") rather than a blank screen.
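
A minimal sketch of the client-side path, assuming the Perspective worker API (worker.table / viewer.load) and the accumulate-then-load path for moderate datasets; the large-dataset path would instead parse record batches off the response stream and apply them with pspTable.update(). The banner helpers are illustrative UI hooks, not part of any library:

// Sketch only — moderate-dataset path: buffer the whole Arrow stream, then load.
async function loadVersion(versionId, worker, viewer) {
  const resp = await fetch(`/api/versions/${versionId}/data`);
  const rowCount = Number(resp.headers.get('X-Row-Count') || 0);
  if (rowCount >= 500000) showLoadingBanner();   // hypothetical non-blocking notice

  const buffer = await resp.arrayBuffer();        // Arrow IPC binary
  const table = await worker.table(buffer);       // Perspective's native Arrow ingestion
  await viewer.load(table);

  if (rowCount >= 500000) hideLoadingBanner();
  return table;
}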

Forecast Operations

All operations share a common request envelope:

{
  "pf_user": "paul.trowbridge",
  "note":    "optional comment",
  "slice": {
    "channel":   "WHS",
    "geography": "WEST"
  }
}

slice keys must be role = 'dimension' columns per col_meta. Stored in pf.log as the implicit link to affected rows.
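
A sketch of how an operation route might tie the envelope together, assuming an Express app, a pg pool, and the buildWhere() helper sketched earlier. getVersion(), loadSql(), and buildExcludeClause() are hypothetical helpers (version lookup joining pf.version / pf.source / pf.col_meta, fetch from pf.sql, and rendering of exclude_iters respectively):

// Sketch only — common handling for scale/recode/clone style operations.
app.post('/api/versions/:id/scale', async (req, res) => {
  const { pf_user, note, slice } = req.body;

  const version = await getVersion(req.params.id);       // hypothetical lookup
  if (version.status !== 'open') {
    return res.status(409).json({ error: 'version is closed' });
  }

  let sql = await loadSql(version.source_id, 'scale');    // stored SQL from pf.sql
  sql = sql
    .replaceAll('{{fc_table}}', `pf.fc_${version.tname}_${version.id}`)
    .replaceAll('{{where_clause}}', buildWhere(slice, version.colMeta))
    .replaceAll('{{exclude_clause}}', buildExcludeClause(version.exclude_iters))
    .replaceAll('{{pf_user}}', pf_user.replace(/'/g, "''"));
  // ...remaining tokens ({{value_incr}}, {{units_incr}}, {{slice}}, {{note}}, ...) substituted the same way

  const { rows } = await pool.query(sql);   // single round trip; RETURNING * feeds the pivot
  res.json(rows);
});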

Scale

POST /api/versions/:id/scale

{
  "pf_user":    "paul.trowbridge",
  "note":       "10% volume lift Q3 West",
  "slice":      { "channel": "WHS", "geography": "WEST" },
  "value_incr": null,
  "units_incr": 5000,
  "pct":        false
}
  • value_incr / units_incr — absolute amounts to add (positive or negative). Either can be null.
  • pct: true — treat as percentage of current slice total instead of absolute
  • Excludes exclude_iters rows from the source selection
  • Distributes increment proportionally across rows in the slice
  • Inserts rows tagged iter = 'scale'
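
For example, if the slice contains three rows with units 100, 300 and 600 (total 1,000) and units_incr = 5000, the inserted scale rows carry 500, 1,500 and 3,000 units — the slice total rises by exactly 5,000 while preserving the original mix.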

Recode

POST /api/versions/:id/recode

{
  "pf_user": "paul.trowbridge",
  "note":    "Part discontinued, replaced by new SKU",
  "slice":   { "part": "OLD-SKU-001" },
  "set":     { "part": "NEW-SKU-002" }
}
  • set — one or more dimension fields to replace (can swap multiple at once)
  • Inserts negative rows to zero out the original slice
  • Inserts positive rows with replaced dimension values
  • Both sets of rows share the same logid — undone together
  • Inserts rows tagged iter = 'recode'
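
For example, recoding a slice holding 10,000 units on OLD-SKU-001 inserts rows totalling −10,000 units on OLD-SKU-001 and +10,000 units on NEW-SKU-002, so the old part nets to zero and the volume reappears on the new part.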

Clone

POST /api/versions/:id/clone

{
  "pf_user": "paul.trowbridge",
  "note":    "New customer win, similar profile to existing",
  "slice":   { "customer": "EXISTING CO", "channel": "DIR" },
  "set":     { "customer": "NEW CO" },
  "scale":   0.75
}
  • set — dimension values to override on cloned rows
  • scale — optional multiplier on value/units (default 1.0)
  • Does not offset original slice
  • Inserts rows tagged iter = 'clone'
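
For example, cloning a slice with 12,000 units and 480,000 of value at scale = 0.75 inserts 9,000 units and 360,000 of value for NEW CO, while EXISTING CO's rows are left untouched.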

Audit & Undo

Method Route Description
GET /api/versions/:id/log List all log entries for a version, newest first
DELETE /api/log/:logid Undo: delete all forecast rows with this logid, then delete log entry

Frontend (Web UI)

Navigation (sidebar)

Three-step collapsible sidebar (200 px expanded / 48 px collapsed, state persisted to localStorage):

  1. ① Setup — browse DB tables, register sources, configure col_meta, generate SQL. One-time admin task.
  2. ② Baseline — create/manage versions, load baseline segments, timeline preview. One-time per version.
  3. ③ Forecast — main working view: Perspective pivot + operation panel. Primary ongoing use.

Setup View (① Setup)

  • Left panel: DB table browser — all tables with row counts; click a table to open a preview modal (column list + sample rows)
  • Right panel: Registered sources list; click a source to open col_meta editor below
  • Col_meta editor: inline table — role dropdown per column, is_key checkbox, label text input, ordinal position
  • "Save" button — upserts col_meta; "Generate SQL" button — triggers generate-sql route, shows confirmation
  • "Register source" button available in the table preview modal
  • New columns default to role dimension on registration
  • Must generate SQL before a version can be created against this source

Baseline View (② Baseline)

Source and version selectors at top. Version management inline: create new version (explains that a forecast table will be created), Close / Reopen / Delete buttons. Delete drops the forecast table and removes all version records.

Baseline Workbench

A dedicated view for constructing the baseline for the selected version. The baseline is built from one or more segments — each segment is an independent query against the source table that appends rows to iter = 'baseline'. Segments are additive; clearing is explicit.

Layout:

┌─────────────────────────────────────────────────────────────┐
│  Baseline — [Version name]              [Clear Baseline]     │
├─────────────────────────────────────────────────────────────┤
│  Segments loaded (from log):                                 │
│  ┌──────┬────────────────┬──────────┬───────┬──────────┐    │
│  │  ID  │  Description   │  Rows    │  By   │  [Undo]  │    │
│  └──────┴────────────────┴──────────┴───────┴──────────┘    │
├─────────────────────────────────────────────────────────────┤
│  Add Segment                                                 │
│                                                              │
│  Description  [_______________________________________]      │
│                                                              │
│  Date range   [date_from] to [date_to]  on [date col ▾]     │
│  Date offset  [0] years  [0] months                         │
│                                                              │
│  Additional filters:                                         │
│  [ + Add filter ]                                            │
│  ┌──────────────────┬──────────┬──────────────┬───────┐     │
│  │  Column          │  Op      │  Value(s)    │  [ x ]│     │
│  └──────────────────┴──────────┴──────────────┴───────┘     │
│                                                              │
│  Preview: [projected month chips]                            │
│                                                              │
│  Note  [___________]          [Load Segment]                 │
└─────────────────────────────────────────────────────────────┘

Segments list — shows all operation = 'baseline' log entries for this version, newest first. Each has an Undo button. Undo removes only that segment's rows (by logid), leaving other segments intact.

Clear Baseline — deletes ALL iter = 'baseline' rows and all operation = 'baseline' log entries for this version. Prompts for confirmation. Used when starting over from scratch.

Add Segment form:

  • Description — free text label stored as the log note, shown in the segments list
  • Date offset — years + months spinners; shifts the primary role = 'date' column forward on insert
  • Filters — one or more filter groups that define what rows to pull. Conditions within a group are AND-ed; groups are OR-ed. There is no separate "date range" section — period selection is just a filter like any other:
    • Each group has a header row ("Group 1", "Group 2 — OR", …) and a + Add condition link
    • Within a group: Column (any role = 'date' or role = 'filter'), Operator (=, !=, IN, NOT IN, BETWEEN, IS NULL, IS NOT NULL), Value(s)
    • Value inputs: BETWEEN → two date/text inputs; IN/NOT IN → comma-separated list; =/!= → single input; omitted for IS NULL/IS NOT NULL
    • + Add OR group button appends a new empty group below, joined by an "OR" separator label
    • Groups with more than one condition render an "AND" badge between rows to make the logic explicit
    • A group can be removed with × on its header (not available when only one group remains)
    • At least one group with at least one condition is required to load a segment
  • Manual WHERE clause (admin only) — a toggle link ("Switch to manual SQL") that replaces the filter builder with a plain textarea. The admin types a raw PostgreSQL WHERE clause body (no WHERE keyword). Switching back to the builder clears the textarea. When active, the filter builder is hidden and the structured filters field is not sent; raw_where is sent instead. A prominent warning banner reads: "Raw SQL is not validated. You are responsible for correctness and security."
  • Timeline preview — rendered when any condition in any group is a BETWEEN or = on a role = 'date' column. Shows a horizontal bar (number-line style) for the source period and, if offset > 0, a second bar below for the projected period. Each bar shows start date on the left, end date on the right, duration in the centre. The two bars share the same visual width so the shift is immediately apparent. Not shown in manual WHERE mode or when no date condition is present.
  • Note — optional free text
  • Load Segment — submits; appends rows, does not clear existing baseline rows

Example — three-segment baseline:

# Description Filter logic Offset
1 All orders taken 6/1/25–3/31/26 order_date BETWEEN 2025-06-01 AND 2026-03-31 0
2 Open or unshipped orders (status missing or explicit) (status IN ('OPEN','PENDING')) OR (status IS NULL) 0
3 Prior year book-and-ship 4/1/25–5/31/25 order_date BETWEEN 2025-04-01 AND 2025-05-31 AND ship_date BETWEEN 2025-04-01 AND 2025-05-31 0

Segment 2 uses two OR groups; segment 3 has two AND conditions in one group. Any combination is valid as long as at least one group with at least one condition is present.

Forecast View (③ Forecast)

Layout:

┌─────────────────────────────────────────────────────────────────┐
│  [Version label]  [Refresh]  [Save layout]  [Reset layout]       │
├──────────────────────────────────────┬──────────────────────────┤
│                                      │                           │
│  Perspective Viewer                  │  Operation Panel          │
│  (interactive pivot web component)   │  (active when slice set)  │
│                                      │                           │
│                                      │  Slice:                   │
│                                      │    channel = WHS          │
│                                      │    geography = WEST       │
│                                      │                           │
│                                      │  [ Scale ] [ Recode ]     │
│                                      │  [ Clone ]                │
│                                      │                           │
│                                      │  ... operation form ...   │
│                                      │                           │
│                                      │  [ Submit ]               │
│                                      │                           │
└──────────────────────────────────────┴──────────────────────────┘

Pivot control: Perspective 4.4.0, loaded from CDN at runtime. Data is fetched from GET /api/versions/:id/data as an Arrow IPC binary stream and loaded into an in-browser Perspective worker — Perspective's native ingestion path. Supports grouping, splitting, filtering, sorting, and charting interactively. Layout (group_by, split_by, filters, plugin) is saved per version to localStorage via Save layout / Reset layout buttons.
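
A minimal sketch of the layout persistence, assuming the standard perspective-viewer save()/restore() API; the localStorage key name is illustrative:

// Sketch only — persist/restore the viewer layout per version.
const layoutKey = (versionId) => `pf_layout_v${versionId}`;

async function saveLayout(viewer, versionId) {
  const config = await viewer.save();               // group_by, split_by, filter, plugin, ...
  localStorage.setItem(layoutKey(versionId), JSON.stringify(config));
}

async function restoreLayout(viewer, versionId) {
  const saved = localStorage.getItem(layoutKey(versionId));
  if (saved) await viewer.restore(JSON.parse(saved));
}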

Large-dataset loading sequence:

  1. Client issues GET /api/versions/:id/data
  2. Server responds with X-Row-Count header and begins streaming Arrow record batches
  3. If X-Row-Count ≥ 500 000, UI shows a non-blocking loading banner; otherwise no indicator
  4. Client calls worker.table(firstBatch) on the first batch to make the pivot interactive immediately
  5. Each subsequent batch is applied with pspTable.update(batch) as it arrives
  6. Banner clears when the stream closes

Interaction flow:

  1. Click a cell or row in the pivot — the perspective-click event fires
  2. detail.config.filter from the event is parsed: only == filters on role = 'dimension' columns are extracted as the slice (see the sketch after this list)
  3. Slice populates the Operation Panel — pick operation tab, fill in parameters
  4. Submit → POST to API → new rows returned via RETURNING * are streamed directly into the Perspective table (pspTable.update(rows)) — no full reload needed
  5. For recode, both the negative offset rows and positive replacement rows are returned and streamed
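
A minimal sketch of the slice extraction referenced in step 2, assuming the perspective-click event shape described above (detail.config.filter as an array of [column, operator, value] entries); dimensionCols is a Set of role = 'dimension' column names from col_meta, and setOperationSlice() is a hypothetical hook that populates the Operation Panel:

// Sketch only — pull an operation slice out of a perspective-click event.
viewer.addEventListener('perspective-click', (event) => {
  const filters = event.detail?.config?.filter || [];
  const slice = {};
  for (const [col, op, value] of filters) {
    // only equality filters on real dimension columns become part of the slice
    if (op === '==' && dimensionCols.has(col) && value != null) {
      slice[col] = value;
    }
  }
  if (Object.keys(slice).length > 0) setOperationSlice(slice);  // populate the panel
});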

Pivot default layout: built from col_meta — first two dimension columns as group_by, date column as split_by. User can rearrange in Perspective settings panel and save.

Reference rows (iter = 'reference') are visible in the pivot for comparison context. Operations never affect them (enforced by exclude_iters in the version).

Log View

AG Grid list of log entries — user, timestamp, operation, slice, note, rows affected. "Undo" button per row → DELETE /api/log/:logid → grid and pivot refresh (full reload of Perspective table).


Forecast SQL Patterns

Column names baked in at generation time. Tokens substituted at request time.

Baseline Load (one segment)

WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'baseline', NULL, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
INSERT INTO {{fc_table}} (
    {dimension_cols}, {value_col}, {units_col}, {date_col},
    iter, logid, pf_user, created_at
)
SELECT
    {dimension_cols}, {value_col}, {units_col},
    ({date_col} + '{{date_offset}}'::interval)::date,
    'baseline', (SELECT id FROM ilog), '{{pf_user}}', now()
FROM
    {schema}.{tname}
WHERE
    {{filter_clause}}

Baseline loads are additive — no DELETE before INSERT. Each segment appends independently.

Token details:

  • {{date_offset}} — PostgreSQL interval string (e.g. 1 year); defaults to 0 days; applied only to the primary role = 'date' column on insert
  • {{filter_clause}} — built from filters or raw_where at request time (not baked into stored SQL since conditions vary per segment).
    • Structured path (filters): each group becomes a parenthesized AND block; groups are joined with OR. Every column is validated against col_meta (role = 'date' or role = 'filter'). Values are escaped (single quotes doubled). Supported operators: =, !=, IN, NOT IN, BETWEEN, IS NULL, IS NOT NULL. A sketch of this builder follows this list.
    • Raw path (raw_where): the string is injected verbatim. No col_meta validation. Admin-only.
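
A minimal sketch of the structured path referenced above, assuming a filterCols Set of the role = 'date' / 'filter' column names from col_meta; names are illustrative:

// Sketch only — turn the filters groups into the {{filter_clause}} body.
function buildFilterClause(groups, filterCols) {
  const esc = (v) => `'${String(v).replace(/'/g, "''")}'`;

  const renderCond = ({ col, op, values = [] }) => {
    if (!filterCols.has(col)) throw new Error(`column not allowed in filters: ${col}`);
    switch (op) {
      case '=': case '!=':        return `${col} ${op} ${esc(values[0])}`;
      case 'IN': case 'NOT IN':   return `${col} ${op} (${values.map(esc).join(',')})`;
      case 'BETWEEN':             return `${col} BETWEEN ${esc(values[0])} AND ${esc(values[1])}`;
      case 'IS NULL':
      case 'IS NOT NULL':         return `${col} ${op}`;
      default: throw new Error(`unsupported operator: ${op}`);
    }
  };

  // backward compatibility: a flat array of conditions is one AND group
  if (groups.length && !Array.isArray(groups[0])) groups = [groups];

  return groups
    .map(group => `(${group.map(renderCond).join(' AND ')})`)
    .join(' OR ');
}

Applied to the baseline example earlier, this yields (order_date BETWEEN '2024-01-01' AND '2024-12-31' AND order_status IN ('OPEN','PENDING')) OR (order_status IS NULL).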

Clear Baseline

Two queries, run in a transaction:

DELETE FROM {{fc_table}} WHERE iter = 'baseline';
DELETE FROM pf.log WHERE version_id = {{version_id}} AND operation = 'baseline';

Reference Load

WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'reference', NULL, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
INSERT INTO {{fc_table}} (
    {dimension_cols}, {value_col}, {units_col}, {date_col},
    iter, logid, pf_user, created_at
)
SELECT
    {dimension_cols}, {value_col}, {units_col}, {date_col},
    'reference', (SELECT id FROM ilog), '{{pf_user}}', now()
FROM
    {schema}.{tname}
WHERE
    {date_col} BETWEEN '{{date_from}}' AND '{{date_to}}'

No date offset — reference rows land at their original dates for prior-period comparison.

Scale

WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'scale', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,base AS (
    SELECT
        {dimension_cols}, {date_col},
        {value_col}, {units_col},
        sum({value_col}) OVER () AS total_value,
        sum({units_col}) OVER () AS total_units
    FROM {{fc_table}}
    WHERE {{where_clause}}
    {{exclude_clause}}
)
INSERT INTO {{fc_table}} (
    {dimension_cols}, {date_col}, {value_col}, {units_col},
    iter, logid, pf_user, created_at
)
SELECT
    {dimension_cols}, {date_col},
    round(({value_col} / NULLIF(total_value, 0)) * {{value_incr}}, 2),
    round(({units_col} / NULLIF(total_units, 0)) * {{units_incr}}, 5),
    'scale', (SELECT id FROM ilog), '{{pf_user}}', now()
FROM base

{{value_incr}} / {{units_incr}} are pre-computed in JS when pct: true (multiply slice total by pct).
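
A sketch of that pre-computation, assuming fcTable, whereClause and excludeClause have already been resolved for the request; the query shape is illustrative, and whether the client sends a fraction (0.10 for +10%) or a whole percentage is an assumption here:

// Sketch only — resolve {{value_incr}} / {{units_incr}} before token substitution.
let valueIncr = body.value_incr;
let unitsIncr = body.units_incr;
if (body.pct) {
  const { rows } = await pool.query(
    `SELECT sum(value) AS total_value, sum(units) AS total_units
       FROM ${fcTable}
      WHERE ${whereClause} ${excludeClause}`
  );
  // assumes the request carries fractions when pct is true (e.g. 0.10 = +10% of the slice total)
  valueIncr = body.value_incr == null ? null : rows[0].total_value * body.value_incr;
  unitsIncr = body.units_incr == null ? null : rows[0].total_units * body.units_incr;
}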

Recode

WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'recode', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,src AS (
    SELECT {dimension_cols}, {date_col}, {value_col}, {units_col}
    FROM {{fc_table}}
    WHERE {{where_clause}}
    {{exclude_clause}}
)
,negatives AS (
    INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col}, {units_col}, iter, logid, pf_user, created_at)
    SELECT {dimension_cols}, {date_col}, -{value_col}, -{units_col}, 'recode', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM src
)
INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col}, {units_col}, iter, logid, pf_user, created_at)
SELECT {{set_clause}}, {date_col}, {value_col}, {units_col}, 'recode', (SELECT id FROM ilog), '{{pf_user}}', now()
FROM src

{{set_clause}} replaces the listed dimension columns with new values, passes others through unchanged.

Clone

WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'clone', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col}, {units_col}, iter, logid, pf_user, created_at)
SELECT
    {{set_clause}}, {date_col},
    round({value_col} * {{scale_factor}}, 2),
    round({units_col} * {{scale_factor}}, 5),
    'clone', (SELECT id FROM ilog), '{{pf_user}}', now()
FROM {{fc_table}}
WHERE {{where_clause}}
{{exclude_clause}}

Undo

DELETE FROM {{fc_table}} WHERE logid = {{logid}};
DELETE FROM pf.log WHERE id = {{logid}};

Admin Setup Flow (end-to-end)

  1. Open the Setup view → browse DB tables → register source table
  2. Open col_meta editor → assign roles to columns (dimension, value, units, date, filter, ignore), mark is_key dimensions, set labels
  3. Click Generate SQL → app writes operation SQL to pf.sql
  4. Open the Baseline view → create a named version (sets exclude_iters, creates forecast table)
  5. Open Baseline Workbench → build the baseline from one or more segments:
    • Each segment specifies a date range (on any date/filter column), date offset, and optional additional filter conditions
    • Add segments until the baseline is complete; each is independently undoable
    • Use "Clear Baseline" to start over if needed
  6. Optionally load Reference → pick prior period date range → inserts iter = 'reference' rows at their original dates (for comparison in the pivot)
  7. Open Forecast view → share with users

User Forecast Flow (end-to-end)

  1. Open Forecast view → select version
  2. Pivot loads — explore data, identify slice to adjust
  3. Select cells → Operation Panel populates with slice
  4. Choose operation → fill in parameters → Submit
  5. Grid refreshes — adjustment visible immediately
  6. Repeat as needed
  7. Admin closes version when forecasting is complete

Open Questions / Future Scope

  • Baseline replay — re-execute change log against a restated baseline (replay: true); v1 returns 501
  • Arrow IPC for initial data load — now adopted in the Forecast Data section: /versions/:id/data serves an Arrow IPC binary, which Perspective's worker.table() ingests natively, since at large row counts (1M+) a JSON response becomes a bottleneck. Incremental operation rows (scale/recode/clone) stay as JSON fed to table.update() since they're always small. Implementation can use pg + apache-arrow in Node, or a server-side DuckDB instance (Postgres scanner → Arrow IPC) if a caching layer is also needed.
  • Approval workflow — user submits, admin approves before changes are visible to others (deferred)
  • Territory filtering — restrict what a user can see/edit by dimension value (deferred)
  • Export — download forecast as CSV or push results to a reporting table
  • Version comparison — side-by-side view of two versions (facilitated by isolated tables via UNION)
  • Col meta / version schema drift — if col_meta roles are changed after a version's forecast table is already created, the generated SQL and the table DDL go out of sync (e.g. a column added to SQL that doesn't exist in the table). UI should detect this: compare col_meta against the forecast table's actual columns via information_schema, warn the user, and offer to rebuild the version (drop + recreate table, preserving the version record and log). For now the workaround is to delete and recreate the version manually.
  • Multi-connection support — currently one DB via .env. Full vision: pf.connection table (host, port, dbname, user, password as env-var ref), connection_id on pf.source, per-connection pg pools at runtime. pf schema stays on a "home" connection; source data can live anywhere. Connections UI in Setup. Safe to defer while in dev — requires clean reinstall when added since it changes the source schema.

Project Status — 2026-04-25

What's working

  • Full backend: source registration, col_meta, SQL generation, versions, baseline segments, reference load, scale, recode, clone, undo
  • React + Vite + Tailwind CSS frontend scaffolded in ui/, built output to public/app/, served by Express
  • 3-step collapsible sidebar (Setup / Baseline / Forecast) — addresses prior UX concern about opaque 5-tab nav
  • Setup view: DB table browser with preview modal, source registration, col_meta editor, SQL generation
  • Baseline view: version management (create/close/reopen/delete), multi-segment baseline workbench, canvas timeline, filter builder
  • Perspective pivot in Forecast view: loads all version rows, interactive group/split/filter/chart, layout saved per version
  • Slice extraction from perspective-click event feeds operation panel directly
  • Incremental row streaming: operation results (RETURNING *) stream into Perspective table without full reload
  • Status bar: shows current source · version · baseline row count · status

Known issues / next focus

  • Forecast view — operation panel (Scale / Recode / Clone) is a stub; needs wiring to API
  • Status bar — currently hardcoded; needs to reflect actual selected source/version from state
  • Col_meta / version schema drift — if col_meta changes after a version's forecast table is created, the SQL and table DDL go out of sync. UI should detect this (compare col_meta against information_schema), warn, and offer rebuild. Workaround: delete and recreate the version.
  • No "current version" persistence — source/version selection resets on page reload; session context not persisted
  • Perspective slice limitation — computed date columns (Month, YearDate) extracted via split_by don't filter back to raw rows; only native dimension columns work for slice extraction

Branch status

  • baseline-workbench — merged to origin, stable
  • perspective-forecast — active development branch; React UI scaffolded, Forecast operation panel pending