pf_app/pf_spec.md

# Pivot Forecast — Application Spec

## Overview

A web application for building named forecast scenarios against any PostgreSQL table. The core workflow is: load known historical actuals as a baseline, shift those dates forward by a specified interval into the forecast period to establish a no-change starting point, then apply incremental adjustments (scale, recode, clone) to build the plan. An admin configures a source table, generates a baseline, and opens it for users to make adjustments. Users interact with a pivot table to select slices of data and apply forecast operations. All changes are incremental (append-only), fully audited, and reversible.

---

## Tech Stack

- **Backend:** Node.js / Express
- **Database:** PostgreSQL — isolated `pf` schema, installs into any existing DB
- **Frontend:** React + Vite + Tailwind CSS; Perspective (forecast pivot)
- **Pattern:** Follows fc_webapp (shell) + pivot_forecast (operations)

---

## Database Schema: `pf`

Everything lives in the `pf` schema. Install via sequential SQL scripts.

### `pf.source`
Registered source tables available for forecasting.

```sql
CREATE TABLE pf.source (
    id              serial PRIMARY KEY,
    schema          text NOT NULL,
    tname           text NOT NULL,
    label           text,                   -- friendly display name
    status          text DEFAULT 'active',  -- active | archived
    default_layout  jsonb,                  -- Perspective view config used as per-source default
    created_at      timestamptz DEFAULT now(),
    created_by      text,
    UNIQUE (schema, tname)
);
```

### `pf.col_meta`
Column configuration for each registered source table. Determines how the app treats each column.

```sql
CREATE TABLE pf.col_meta (
    id              serial PRIMARY KEY,
    source_id       integer REFERENCES pf.source(id),
    cname           text NOT NULL,          -- column name in source table
    label           text,                   -- friendly display name
    role            text NOT NULL,          -- 'dimension' | 'value' | 'units' | 'date' | 'filter' | 'ignore'
    is_key          boolean DEFAULT false,  -- true = part of natural key (used in WHERE slice)
    opos            integer,                -- ordinal position (for ordering)
    dim_group       text,                   -- groups functionally dependent columns (see below)
    dim_period_col  text,                   -- maps this dimension to a pf.dim_period column
    UNIQUE (source_id, cname)
);
```

**Roles:**
- `dimension` — categorical field (customer, part, channel, rep, geography, etc.) — appears as pivot rows/cols, used in WHERE filters for operations
- `value` — the money/revenue field to scale (**required** — SQL generation fails without it)
- `units` — the quantity field to scale (**optional** — if absent, units columns are omitted from the forecast table and all SQL patterns)
- `date` — the primary date field; used for baseline/reference date range and stored in the forecast table (**required**)
- `filter` — columns available as filter conditions in the Baseline Workbench (e.g. order status, ship date, open flag); used in baseline WHERE clauses but **not stored** in the forecast table
- `ignore` — exclude from forecast table entirely

**`dim_group`** — a free-text group name linking a `date` column to its derived dimension siblings. When the `date` column has `is_key = true` and a `dim_group` value, the SQL generator looks for `dimension` columns in the same group that also have a `dim_period_col` value. Those columns are sourced from `pf.dim_period` on baseline/reference load (via a JOIN on `drange @> date`) rather than copied raw from the source table. This allows fiscal year, quarter, and month columns to be stored in the forecast table with calendar-correct values even if those columns don't exist in the source.

**`dim_period_col`** — names the column in `pf.dim_period` to use as the value for this dimension on load. Only meaningful when the column is in a `dim_group` whose `date` key has `is_key = true`. Example: `cal_year`, `fisc_quarter`, `fisc_label`.
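The column-selection rule above can be sketched as follows. This is a minimal illustration, not the generator itself: `buildSelectList` is a hypothetical helper, the input objects simply mirror `pf.col_meta` fields, and the `s` / `dp` aliases follow the JOIN shown later in this spec.

```javascript
// Hypothetical helper: decide, per dimension column, whether the value is
// copied from the source table (alias s) or derived from pf.dim_period (alias dp).
function buildSelectList(cols) {
  // dim_groups whose date column is flagged is_key are eligible for dim_period sourcing.
  const keyedGroups = new Set(
    cols
      .filter((c) => c.role === "date" && c.is_key && c.dim_group)
      .map((c) => c.dim_group)
  );
  return cols
    .filter((c) => c.role === "dimension")
    .map((c) =>
      c.dim_period_col && keyedGroups.has(c.dim_group)
        ? `dp.${c.dim_period_col} AS ${c.cname}`
        : `s.${c.cname}`
    );
}

// Example col_meta rows (illustrative values only).
const exampleCols = [
  { cname: "order_date", role: "date", is_key: true, dim_group: "order" },
  { cname: "fisc_qtr", role: "dimension", dim_group: "order", dim_period_col: "fisc_quarter" },
  { cname: "customer", role: "dimension" },
];
console.log(buildSelectList(exampleCols));
```

A dimension with `dim_period_col` set but no keyed `date` column in its group falls back to a raw copy in this sketch; whether the real generator does the same or reports an error is not stated here.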

### `pf.dim_period`
Calendar lookup table. One row per month from 2018-01-01 through 2035-12-01. Keyed on `sdat` (month start date). Used to derive fiscal/calendar period columns at baseline load time when `dim_group` / `dim_period_col` are configured on col_meta.

Populated by `setup_sql/gen_dim_period.sql` (safe to re-run; `ON CONFLICT DO NOTHING`). Fiscal year start month is configurable at the top of that script (default: June, i.e. fiscal month 1 = June).
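As an illustration of the fiscal-month convention (fiscal month 1 = the configured start month), the arithmetic can be sketched as below. The function names are hypothetical, and the quarter rule (consecutive three-month blocks counted from fiscal month 1) is an assumption; the actual derivation lives in `gen_dim_period.sql`.

```javascript
// Assumption: startMonth is the calendar month (1-12) that counts as fiscal month 1.
function fiscMonth(calMonth, startMonth = 6) {
  return ((calMonth - startMonth + 12) % 12) + 1;
}

// Assumption: fiscal quarters are consecutive three-month blocks from fiscal month 1.
function fiscQuarter(calMonth, startMonth = 6) {
  return Math.ceil(fiscMonth(calMonth, startMonth) / 3);
}

// June is fiscal month 1, May is fiscal month 12, January falls in fiscal quarter 3.
console.log(fiscMonth(6), fiscMonth(5), fiscQuarter(1)); // 1 12 3
```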

Key columns: `sdat`, `edat`, `drange` (GiST-indexed daterange), `cal_year`, `cal_quarter`, `cal_month`, `cal_month_abbr`, `cal_month_name`, `cal_label`, `fisc_year`, `fisc_quarter`, `fisc_quarter_label`, `fisc_month`, `fisc_month_abbr`, `fisc_month_name`, `fisc_label`, `period_key`.

The baseline/reference SQL JOINs this table when `hasDimPeriod` is true: `JOIN pf.dim_period dp ON dp.drange @> (s.{date_col} + '{{date_offset}}'::interval)::date`.

### `pf.version`
Named forecast scenarios. One forecast table (`pf.fc_{tname}_{version_id}`) is created per version.

```sql
CREATE TABLE pf.version (
    id              serial PRIMARY KEY,
    source_id       integer REFERENCES pf.source(id),
    name            text NOT NULL,
    description     text,
    status          text DEFAULT 'open',        -- open | closed
    exclude_iters   jsonb DEFAULT '["reference"]', -- iter values excluded from all operations
    created_at      timestamptz DEFAULT now(),
    created_by      text,
    closed_at       timestamptz,
    closed_by       text,
    UNIQUE (source_id, name)
);
```

**`exclude_iters`:** jsonb array of `pf_iter` values that are excluded from operation WHERE clauses. Defaults to `["reference"]`. Reference rows are still returned by `get_data` (visible in pivot) but are never touched by scale/recode/clone. Additional `pf_iter` values can be added to lock them from further adjustment.
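A minimal sketch of turning `exclude_iters` into the clause substituted for `{{exclude_clause}}`. `buildExcludeClause` is a hypothetical name, and returning an empty string for an empty list is an assumption.

```javascript
// Hypothetical helper: render version.exclude_iters as an AND-able SQL fragment.
function buildExcludeClause(excludeIters) {
  // Assumption: an empty or missing list means nothing is excluded.
  if (!Array.isArray(excludeIters) || excludeIters.length === 0) return "";
  // Double any single quotes so each value is a safe SQL string literal.
  const quoted = excludeIters.map((v) => `'${String(v).replace(/'/g, "''")}'`);
  return `AND pf_iter NOT IN (${quoted.join(", ")})`;
}

console.log(buildExcludeClause(["reference"])); // AND pf_iter NOT IN ('reference')
```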

**Forecast table naming:** `pf.fc_{tname}_{version_id}` — e.g., `pf.fc_sales_3`. One table per version, physically isolated. Contains both operational rows and reference rows.

Creating a version → `CREATE TABLE pf.fc_{tname}_{version_id} (...)`
Deleting a version → `DROP TABLE pf.fc_{tname}_{version_id}` + delete from `pf.log` + delete from `pf.version` (in that order: the forecast table references `pf.log`, and `pf.log` references `pf.version`)
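Because the forecast table name is interpolated into DDL, it is worth deriving it in one place. The sketch below is illustrative only: `fcTableName` is a hypothetical helper, and the identifier pattern it enforces (lowercase letters, digits, underscores) is an assumption rather than a rule stated in this spec.

```javascript
// Hypothetical helper: derive pf.fc_{tname}_{version_id} and refuse unsafe input.
function fcTableName(tname, versionId) {
  // Assumption: source table names are plain lowercase identifiers.
  if (!/^[a-z_][a-z0-9_]*$/.test(tname)) {
    throw new Error(`Unsafe table name: ${tname}`);
  }
  if (!Number.isInteger(versionId) || versionId <= 0) {
    throw new Error(`Invalid version id: ${versionId}`);
  }
  return `pf.fc_${tname}_${versionId}`;
}

console.log(fcTableName("sales", 3)); // pf.fc_sales_3
```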

### `pf.log`
Audit log. Every write operation gets one entry here.

```sql
CREATE TABLE pf.log (
    id          bigserial PRIMARY KEY,
    version_id  integer REFERENCES pf.version(id),
    pf_user     text NOT NULL,
    stamp       timestamptz DEFAULT now(),
    operation   text NOT NULL,  -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone'
    slice       jsonb,          -- the WHERE conditions that defined the selection
    params      jsonb,          -- operation parameters (increments, new values, scale factor, etc.)
    note        text            -- user-provided comment
);
```

### `pf.fc_{tname}_{version_id}` (dynamic, one per version)
Created when a version is created. Mirrors source table dimension/value/date columns (and units if configured) plus any `dim_period_col`-derived dimension columns, plus forecast metadata. Contains both operational rows (`pf_iter = 'baseline' | 'scale' | 'recode' | 'clone'`) and reference rows (`pf_iter = 'reference'`).

```sql
-- Example: source table "sales", version id 3 → pf.fc_sales_3
CREATE TABLE pf.fc_sales_3 (
    id              bigserial PRIMARY KEY,

    -- mirrored from source (role = dimension | value | units | date only):
    customer        text,
    channel         text,
    part            text,
    geography       text,
    order_date      date,
    value           numeric,
    units           numeric,    -- omitted if no 'units' role in col_meta

    -- forecast metadata:
    pf_iter         text,       -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone'
    pf_logid        bigint REFERENCES pf.log(id),
    pf_user         text,
    pf_created_at   timestamptz DEFAULT now()
);
```

Note: no `version_id` column on the forecast table — it's implied by the table itself. The `units` column is only present when a column with `role = 'units'` exists in col_meta.

### `pf.sql`
Generated SQL stored per source and operation. Built once when col_meta is finalized, fetched at request time.

```sql
CREATE TABLE pf.sql (
    id           serial PRIMARY KEY,
    source_id    integer REFERENCES pf.source(id),
    operation    text NOT NULL,  -- 'baseline' | 'reference' | 'scale' | 'recode' | 'clone' | 'get_data' | 'undo'
    sql          text NOT NULL,
    generated_at timestamptz DEFAULT now(),
    UNIQUE (source_id, operation)
);
```

**Column names are baked in at generation time.** Runtime substitution tokens:

| Token | Resolved from |
|-------|--------------|
| `{{fc_table}}` | `pf.fc_{tname}_{version_id}` — derived at request time |
| `{{where_clause}}` | built from `slice` JSON by `build_where()` in JS |
| `{{exclude_clause}}` | built from `version.exclude_iters` — e.g. `AND pf_iter NOT IN ('reference')` |
| `{{logid}}` | newly inserted `pf.log` id |
| `{{pf_user}}` | from request body |
| `{{date_from}}` / `{{date_to}}` | baseline/reference date range (source period) |
| `{{date_offset}}` | PostgreSQL interval string to shift dates into the forecast period, e.g. `1 year`, `6 months`, `2 years 3 months` (baseline only; defaults to `0 days`, i.e. no shift) |
| `{{value_incr}}` / `{{units_incr}}` | scale operation increments |
| `{{pct}}` | scale mode: absolute or percentage |
| `{{set_clause}}` | recode/clone dimension overrides |
| `{{scale_factor}}` | clone multiplier |

**Request-time flow:**
1. Fetch SQL from `pf.sql` for `source_id` + `operation`
2. Fetch `version.exclude_iters`, build `{{exclude_clause}}`
3. Build `{{where_clause}}` from `slice` JSON via `build_where()`
4. Substitute all tokens
5. Execute — single round trip
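Step 4 of the flow above (token substitution) might look like the following. This is a sketch under assumptions: `renderSql` is a hypothetical name, and failing on an unresolved token is a design choice made here, not something the spec mandates.

```javascript
// Hypothetical helper: replace every {{token}} in a stored SQL template.
function renderSql(template, tokens) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) => {
    // Fail loudly rather than send SQL containing a literal {{token}} to Postgres.
    if (!Object.prototype.hasOwnProperty.call(tokens, name)) {
      throw new Error(`Unresolved token: ${match}`);
    }
    return String(tokens[name]);
  });
}

const exampleSql = renderSql(
  "SELECT * FROM {{fc_table}} WHERE {{where_clause}} {{exclude_clause}}",
  {
    fc_table: "pf.fc_sales_3",
    where_clause: "channel = 'WHS'",
    exclude_clause: "AND pf_iter NOT IN ('reference')",
  }
);
console.log(exampleSql);
```

Single-brace placeholders such as `{value_col}` are untouched by this pattern, which matches the split between generation-time and request-time substitution.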

**WHERE clause safety:** `build_where()` validates every key in the slice against col_meta (only `role = 'dimension'` columns are permitted). Values are sanitized (escaped single quotes). No parameterization — consistent with existing projects, debuggable in Postgres logs.
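A minimal sketch of the validation and escaping described above. The name `buildWhere` follows the spec's `build_where()`, but the signature, the shape of the `colMeta` argument, and the decision to reject an empty slice are assumptions.

```javascript
// Sketch of build_where(): slice JSON -> SQL predicate, validated against col_meta.
function buildWhere(slice, colMeta) {
  // Only columns configured as role = 'dimension' may appear in a slice.
  const dims = new Set(
    colMeta.filter((c) => c.role === "dimension").map((c) => c.cname)
  );
  const parts = Object.entries(slice).map(([col, val]) => {
    if (!dims.has(col)) throw new Error(`Not a dimension column: ${col}`);
    // Escape by doubling single quotes; the column name is safe because it
    // matched a known col_meta entry.
    return `${col} = '${String(val).replace(/'/g, "''")}'`;
  });
  // Assumption: an operation without any slice condition is rejected.
  if (parts.length === 0) throw new Error("Slice must contain at least one key");
  return parts.join(" AND ");
}

const exampleMeta = [
  { cname: "channel", role: "dimension" },
  { cname: "geography", role: "dimension" },
  { cname: "value", role: "value" },
];
console.log(buildWhere({ channel: "WHS", geography: "WEST" }, exampleMeta));
// channel = 'WHS' AND geography = 'WEST'
```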

---

## Setup / Install Scripts

```
setup_sql/
  01_schema.sql         -- CREATE SCHEMA pf; create all metadata tables (source, col_meta, version, log, sql)
  gen_dim_period.sql    -- populate pf.dim_period (safe to re-run; ON CONFLICT DO NOTHING)
```

Source registration, col_meta configuration, SQL generation, version creation, and forecast table DDL all happen via API.

---

## API Routes

### DB Browser

| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/tables` | List all tables in the DB with row counts |
| GET | `/api/tables/:schema/:tname/preview` | Preview columns + sample rows |

### Source Management

| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/sources` | List registered sources |
| POST | `/api/sources` | Register a source table |
| GET | `/api/sources/:id/cols` | Get col_meta for a source |
| PUT | `/api/sources/:id/cols` | Save col_meta configuration |
| POST | `/api/sources/:id/generate-sql` | Generate/regenerate all operation SQL into `pf.sql` |
| GET | `/api/sources/:id/sql` | View generated SQL for a source (inspection/debug) |
| DELETE | `/api/sources/:id` | Deregister a source (does not affect existing forecast tables) |

### Forecast Versions

| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/sources/:id/versions` | List versions for a source |
| POST | `/api/sources/:id/versions` | Create a new version (CREATE TABLE for forecast table) |
| PUT | `/api/versions/:id` | Update version (name, description, exclude_iters) |
| POST | `/api/versions/:id/close` | Close a version (blocks further edits) |
| POST | `/api/versions/:id/reopen` | Reopen a closed version |
| DELETE | `/api/versions/:id` | Delete a version (DROP TABLE + delete log entries) |

### Baseline & Reference Data

| Method | Route | Description |
|--------|-------|-------------|
| POST | `/api/versions/:id/baseline` | Load one baseline segment (additive — does not clear existing baseline rows) |
| DELETE | `/api/versions/:id/baseline` | Clear all baseline rows and baseline log entries for this version |
| POST | `/api/versions/:id/reference` | Load reference rows from source table for a date range (additive) |

**Baseline load request body:**
```json
{
  "date_offset":  "1 year",
  "filters": [
    [
      { "col": "order_date",   "op": "BETWEEN", "values": ["2024-01-01", "2024-12-31"] },
      { "col": "order_status", "op": "IN",      "values": ["OPEN", "PENDING"] }
    ],
    [
      { "col": "order_status", "op": "IS NULL" }
    ]
  ],
  "pf_user":  "admin",
  "note":     "FY2024 actuals + open orders projected to FY2025",
  "replay":   false
}
```

The example above generates: `(order_date BETWEEN '2024-01-01' AND '2024-12-31' AND order_status IN ('OPEN','PENDING')) OR (order_status IS NULL)`

- `date_offset` — PostgreSQL interval string applied to the primary `role = 'date'` column at insert time. Examples: `"1 year"`, `"6 months"`, `"2 years 3 months"`. Defaults to `"0 days"`. Applied to the stored date value only — filter columns are never shifted.
- `filters` — an array of **groups**. Conditions within a group are AND-ed; groups are OR-ed together. Each group is an array of one or more condition objects:
  - `col` — must be `role = 'date'` or `role = 'filter'` in col_meta
  - `op` — one of `=`, `!=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`
  - `values` — array of strings; two elements for `BETWEEN`; multiple for `IN`/`NOT IN`; omitted for `IS NULL`/`IS NOT NULL`
  - Backward compatibility: a flat array of condition objects (non-nested) is treated as a single group (all AND).
- At least one group with at least one condition is required.
- `raw_where` — optional string. When present, bypasses `filters` entirely and injects the value verbatim as the WHERE clause body. **Admin-only** — rejected with `403` if the requesting `pf_user` is not in the admin list. Not validated against col_meta. Caller is responsible for correctness and SQL safety. Stored as-is in `pf.log.params` for audit. Cannot be combined with `filters` — if both are present the request is rejected with `400`.
- Baseline loads are **additive**: existing `pf_iter = 'baseline'` rows are not touched. Each load is its own log entry and is independently undoable.
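The structured `filters` rules above can be sketched as a small builder. Function names, error messages, and the `colMeta` argument shape are hypothetical; the grouping, operator set, flat-array compatibility, and quote doubling follow the rules listed in this section.

```javascript
// Quote a value as a SQL string literal (single quotes doubled).
const quoteLiteral = (v) => `'${String(v).replace(/'/g, "''")}'`;

// Render one condition object; `allowed` holds the date/filter column names.
function conditionSql(cond, allowed) {
  if (!allowed.has(cond.col)) throw new Error(`Column not filterable: ${cond.col}`);
  const v = cond.values || [];
  switch (cond.op) {
    case "=":
    case "!=":
      if (v.length !== 1) throw new Error(`${cond.op} needs exactly one value`);
      return `${cond.col} ${cond.op} ${quoteLiteral(v[0])}`;
    case "IN":
    case "NOT IN":
      if (v.length === 0) throw new Error(`${cond.op} needs at least one value`);
      return `${cond.col} ${cond.op} (${v.map(quoteLiteral).join(",")})`;
    case "BETWEEN":
      if (v.length !== 2) throw new Error("BETWEEN needs exactly two values");
      return `${cond.col} BETWEEN ${quoteLiteral(v[0])} AND ${quoteLiteral(v[1])}`;
    case "IS NULL":
    case "IS NOT NULL":
      return `${cond.col} ${cond.op}`;
    default:
      throw new Error(`Unsupported operator: ${cond.op}`);
  }
}

// Groups are OR-ed; conditions inside a group are AND-ed.
function buildFilterClause(filters, colMeta) {
  const allowed = new Set(
    colMeta.filter((c) => c.role === "date" || c.role === "filter").map((c) => c.cname)
  );
  // Backward compatibility: a flat array of conditions is one AND group.
  const groups = filters.length > 0 && !Array.isArray(filters[0]) ? [filters] : filters;
  if (groups.length === 0 || groups.some((g) => g.length === 0)) {
    throw new Error("At least one group with at least one condition is required");
  }
  return groups
    .map((g) => `(${g.map((c) => conditionSql(c, allowed)).join(" AND ")})`)
    .join(" OR ");
}
```

Fed the request body shown earlier in this section (with `order_date` as `date` and `order_status` as `filter`), this sketch produces the same clause as the documented example.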

`replay` controls behavior when incremental rows exist (applies to Clear + reload, not individual segments):

- `replay: false` (default) — after clearing, re-load baseline segments, leave incremental rows untouched
- `replay: true` — after clearing, re-load baseline, then re-execute each incremental log entry in chronological order

**v1 note:** `replay: true` returns `501 Not Implemented` until the replay engine is built.

**Clear baseline (`DELETE /api/versions/:id/baseline`)** deletes all rows where `pf_iter = 'baseline'` and all `operation = 'baseline'` log entries. Irreversible (no undo). Returns `{ rows_deleted, log_entries_deleted }`.

**Reference request body:** same shape as baseline load without `replay`. Reference dates land verbatim (no offset). Additive — multiple reference loads stack independently, each undoable by logid.

### Forecast Data

| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/versions/:id/data` | Stream all rows for this version as an Arrow IPC binary |

**Transport format — Apache Arrow IPC stream**

The endpoint returns `Content-Type: application/vnd.apache.arrow.stream` (binary). JSON is not used for this route. The client fetches the response as `arrayBuffer()` and passes it directly to `worker.table(buffer)` — Perspective's native ingestion path with no JS deserialization overhead.

Arrow's columnar layout with dictionary encoding on string dimension columns keeps payload size manageable at scale (typically 50–150 MB for 1M rows depending on string cardinality), compared to several times that for equivalent JSON.

**Server-side streaming (cursor-based)**

For datasets that may reach 1M+ rows, the server must not buffer the full query result in memory before writing the response. Instead:

1. Open a PostgreSQL cursor over the `SELECT * FROM {{fc_table}}` query
2. Fetch rows in batches (target: 10 000 rows per batch)
3. For each batch, append a serialized Arrow record batch to the HTTP response using chunked transfer encoding
4. Close the cursor and end the response when all batches are written

This means the first bytes of the Arrow stream reach the client while the server is still reading from the database, and Node.js heap stays bounded regardless of dataset size.
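The batch loop in steps 1 to 4 can be sketched independently of any particular driver. Everything about the collaborators here is an assumption: `cursor` is any object exposing `read(n)` (resolving to an array of rows, empty when exhausted) and `close()`, `encodeBatch` stands in for whatever Arrow serializer is used (and is assumed to emit the schema message ahead of the first batch), and `res` only needs `write()` and `end()` as on a Node.js HTTP response. Backpressure handling is omitted for brevity.

```javascript
// Pull fixed-size batches from a cursor-like object until it is exhausted.
async function* readBatches(cursor, batchSize = 10000) {
  try {
    while (true) {
      const rows = await cursor.read(batchSize);
      if (rows.length === 0) return; // cursor drained
      yield rows;
    }
  } finally {
    // Runs on normal completion and on error, so the cursor is always released.
    await cursor.close();
  }
}

// Write one encoded chunk per batch; only one batch is held in memory at a time.
async function streamRows(cursor, encodeBatch, res, batchSize = 10000) {
  for await (const rows of readBatches(cursor, batchSize)) {
    res.write(encodeBatch(rows));
  }
  res.end();
}
```

Keeping the loop behind these small interfaces also makes it easy to exercise with an in-memory fake cursor before wiring in a real database connection.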

**Client-side loading**

- **Moderate datasets (< ~500k rows):** accumulate the full `arrayBuffer()` then call `worker.table(buffer)` once. Perspective becomes interactive after the stream completes.
- **Large datasets (≥ ~500k rows):** process Arrow record batches incrementally — call `worker.table(firstBatch)` to create the table, then `pspTable.update(batch)` for each subsequent batch. Perspective is interactive and browseable while remaining batches are still arriving.

The client detects which path to use by checking the `X-Row-Count` response header (see below).

**Row-count pre-check**

Before opening the cursor, the server runs `SELECT COUNT(*) FROM {{fc_table}}`. The result is attached as the `X-Row-Count` response header so the client can choose its loading strategy. If the count is 500 000 or more, the UI displays a non-blocking notice ("Loading large dataset — pivot will become interactive as data arrives") rather than a blank screen.
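On the client, the decision driven by `X-Row-Count` could be as small as the sketch below. `chooseLoadStrategy` and its return shape are hypothetical, and falling back to the single-buffer path when the header is missing or unparsable is an assumption.

```javascript
const LARGE_DATASET_THRESHOLD = 500000;

// Hypothetical helper: map the X-Row-Count header value to a loading plan.
function chooseLoadStrategy(rowCountHeader) {
  const count = Number.parseInt(rowCountHeader, 10);
  // Assumption: without a usable count, use the simpler single-buffer path.
  if (Number.isNaN(count)) return { mode: "single", showBanner: false };
  const large = count >= LARGE_DATASET_THRESHOLD;
  return { mode: large ? "incremental" : "single", showBanner: large };
}

console.log(chooseLoadStrategy("1200000")); // { mode: 'incremental', showBanner: true }
```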

### Forecast Operations

All operations share a common request envelope:

```json
{
  "pf_user": "paul.trowbridge",
  "note":    "optional comment",
  "slice": {
    "channel":   "WHS",
    "geography": "WEST"
  }
}
```

`slice` keys must be `role = 'dimension'` columns per col_meta. Stored in `pf.log` as the implicit link to affected rows.

#### Scale
`POST /api/versions/:id/scale`

```json
{
  "pf_user":    "paul.trowbridge",
  "note":       "+5,000 units volume lift, WHS West",
  "slice":      { "channel": "WHS", "geography": "WEST" },
  "value_incr": null,
  "units_incr": 5000,
  "pct":        false
}
```

- `value_incr` / `units_incr` — absolute amounts to add (positive or negative). Either can be null.
- `pct: true` — treat as percentage of current slice total instead of absolute
- Excludes `exclude_iters` rows from the source selection
- Distributes increment proportionally across rows in the slice
- Inserts rows tagged `pf_iter = 'scale'`
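To make the proportional distribution concrete, here is an in-memory sketch of the same arithmetic the Scale SQL performs. It is illustrative only: `scaleRows` is a hypothetical function, rows are assumed to carry numeric `value` and `units` fields, a null increment is treated as zero, and `pct: true` is assumed to mean "percent of the slice total" (so `10` means 10%).

```javascript
// Hypothetical in-memory equivalent of the scale operation.
function scaleRows(rows, { value_incr = null, units_incr = null, pct = false }) {
  const sum = (key) => rows.reduce((t, r) => t + r[key], 0);
  const round = (x, dp) => Math.round(x * 10 ** dp) / 10 ** dp;
  const totalValue = sum("value");
  const totalUnits = sum("units");
  // Resolve the increment: absolute amount, or a percentage of the slice total.
  const resolve = (incr, total) =>
    incr === null ? 0 : pct ? (total * incr) / 100 : incr;
  const vIncr = resolve(value_incr, totalValue);
  const uIncr = resolve(units_incr, totalUnits);
  // Each row receives its share of the increment; a zero total yields null,
  // mirroring division by NULLIF(total, 0) in SQL.
  return rows.map((r) => ({
    ...r,
    value: totalValue === 0 ? null : round((r.value / totalValue) * vIncr, 2),
    units: totalUnits === 0 ? null : round((r.units / totalUnits) * uIncr, 5),
    pf_iter: "scale",
  }));
}

const scaled = scaleRows(
  [
    { customer: "A", value: 100, units: 10 },
    { customer: "B", value: 300, units: 30 },
  ],
  { value_incr: null, units_incr: 5000, pct: false }
);
console.log(scaled.map((r) => r.units)); // [ 1250, 3750 ]
```

The returned rows are increments to append, not replacements: the original slice rows stay in place and the pivot total is the sum of both.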

#### Recode
`POST /api/versions/:id/recode`

```json
{
  "pf_user": "paul.trowbridge",
  "note":    "Part discontinued, replaced by new SKU",
  "slice":   { "part": "OLD-SKU-001" },
  "set":     { "part": "NEW-SKU-002" }
}
```

- `set` — one or more dimension fields to replace (can swap multiple at once)
- Inserts negative rows to zero out the original slice
- Inserts positive rows with replaced dimension values
- Both sets of rows share the same `pf_logid` and are undone together
- Inserts rows tagged `pf_iter = 'recode'`
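The offset-and-replace idea can be illustrated in memory as below. `recodeRows` is a hypothetical function and the row shape (numeric `value` and `units`) is assumed; the real operation is a single SQL statement.

```javascript
// Hypothetical in-memory equivalent of recode: one negative copy to cancel the
// original slice, one positive copy carrying the replaced dimension values.
function recodeRows(rows, set) {
  const negate = (r) => ({ ...r, value: -r.value, units: -r.units, pf_iter: "recode" });
  const replace = (r) => ({ ...r, ...set, pf_iter: "recode" });
  return [...rows.map(negate), ...rows.map(replace)];
}

const recoded = recodeRows(
  [{ part: "OLD-SKU-001", value: 200, units: 4 }],
  { part: "NEW-SKU-002" }
);
console.log(recoded);
```

Because the appended rows net to zero, the version's grand total is unchanged; only the attribution across dimension values moves.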

#### Clone
`POST /api/versions/:id/clone`

```json
{
  "pf_user": "paul.trowbridge",
  "note":    "New customer win, similar profile to existing",
  "slice":   { "customer": "EXISTING CO", "channel": "DIR" },
  "set":     { "customer": "NEW CO" },
  "scale":   0.75
}
```

- `set` — dimension values to override on cloned rows
- `scale` — optional multiplier on value/units (default 1.0)
- Does not offset original slice
- Inserts rows tagged `pf_iter = 'clone'`
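For comparison with recode, a clone only appends the scaled copy. The sketch below uses a hypothetical `cloneRows` function over rows with numeric `value` and `units`; any rounding applied by the real SQL is not reproduced.

```javascript
// Hypothetical in-memory equivalent of clone: copy the slice, override the
// dimensions in `set`, and multiply value/units by the scale factor.
function cloneRows(rows, set, scale = 1.0) {
  return rows.map((r) => ({
    ...r,
    ...set,
    value: r.value * scale,
    units: r.units * scale,
    pf_iter: "clone",
  }));
}

const cloned = cloneRows(
  [{ customer: "EXISTING CO", channel: "DIR", value: 200, units: 8 }],
  { customer: "NEW CO" },
  0.75
);
console.log(cloned); // value 150, units 6 for NEW CO
```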

### Audit & Undo

| Method | Route | Description |
|--------|-------|-------------|
| GET | `/api/versions/:id/log` | List all log entries for a version, newest first |
| DELETE | `/api/log/:logid` | Undo: delete all forecast rows with this logid, then delete log entry |

---

## Frontend (Web UI)

### Navigation (sidebar)

Three-step collapsible sidebar (200 px expanded / 48 px collapsed, state persisted to `localStorage`):

1. **① Setup** — browse DB tables, register sources, configure col_meta, generate SQL. One-time admin task.
2. **② Baseline** — create/manage versions, load baseline segments, timeline preview. One-time per version.
3. **③ Forecast** — main working view: Perspective pivot + operation panel. Primary ongoing use.

### Setup View (① Setup)

- Left panel: DB table browser — all tables with row counts; click a table to open a preview modal (column list + sample rows)
- Right panel: Registered sources list; click a source to open col_meta editor below
- Col_meta editor: inline table — role dropdown per column, is_key checkbox, label text input, ordinal position
- "Save" button — upserts col_meta; "Generate SQL" button — triggers generate-sql route, shows confirmation
- "Register source" button available in the table preview modal
- New columns default to role `dimension` on registration
- Must generate SQL before a version can be created against this source

### Baseline View (② Baseline)

Source and version selectors at top. Version management inline: create new version (explains that a forecast table will be created), Close / Reopen / Delete buttons. Delete drops the forecast table and removes all version records.

### Baseline Workbench

A dedicated view for constructing the baseline for the selected version. The baseline is built from one or more **segments**; each segment is an independent query against the source table that appends rows with `pf_iter = 'baseline'`. Segments are additive; clearing is explicit.

**Layout:**
```
┌─────────────────────────────────────────────────────────────┐
│  Baseline — [Version name]              [Clear Baseline]     │
├─────────────────────────────────────────────────────────────┤
│  Segments loaded (from log):                                 │
│  ┌──────┬────────────────┬──────────┬───────┬──────────┐    │
│  │  ID  │  Description   │  Rows    │  By   │  [Undo]  │    │
│  └──────┴────────────────┴──────────┴───────┴──────────┘    │
├─────────────────────────────────────────────────────────────┤
│  Add Segment                                                 │
│                                                              │
│  Description  [_______________________________________]      │
│                                                              │
│  Date range   [date_from] to [date_to]  on [date col ▾]     │
│  Date offset  [0] years  [0] months                         │
│                                                              │
│  Additional filters:                                         │
│  [ + Add filter ]                                            │
│  ┌──────────────────┬──────────┬──────────────┬───────┐     │
│  │  Column          │  Op      │  Value(s)    │  [ x ]│     │
│  └──────────────────┴──────────┴──────────────┴───────┘     │
│                                                              │
│  Preview: [projected month chips]                            │
│                                                              │
│  Note  [___________]          [Load Segment]                 │
└─────────────────────────────────────────────────────────────┘
```

**Segments list** — shows all `operation = 'baseline'` log entries for this version, newest first. Each has an Undo button. Undo removes only that segment's rows (by logid), leaving other segments intact.

**Clear Baseline** deletes ALL `pf_iter = 'baseline'` rows and all `operation = 'baseline'` log entries for this version. Prompts for confirmation. Used when starting over from scratch.

**Add Segment form:**

- **Description** — free text label stored as the log `note`, shown in the segments list
- **Date offset** — years + months spinners; shifts the primary `role = 'date'` column forward on insert
- **Filters** — one or more filter groups that define what rows to pull. Conditions within a group are AND-ed; groups are OR-ed. There is no separate "date range" section — period selection is just a filter like any other:
  - Each group has a header row ("Group 1", "Group 2 — OR", …) and a `+ Add condition` link
  - Within a group: Column (any `role = 'date'` or `role = 'filter'`), Operator (`=`, `!=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`), Value(s)
  - Value inputs: `BETWEEN` → two date/text inputs; `IN`/`NOT IN` → comma-separated list; `=`/`!=` → single input; omitted for `IS NULL`/`IS NOT NULL`
  - `+ Add OR group` button appends a new empty group below, joined by an "OR" separator label
  - Groups with more than one condition render an "AND" badge between rows to make the logic explicit
  - A group can be removed with `×` on its header (not available when only one group remains)
  - At least one group with at least one condition is required to load a segment
- **Manual WHERE clause** (admin only) — a toggle link ("Switch to manual SQL") that replaces the filter builder with a plain textarea. The admin types a raw PostgreSQL WHERE clause body (no `WHERE` keyword). Switching back to the builder clears the textarea. When active, the filter builder is hidden and the structured `filters` field is not sent; `raw_where` is sent instead. A prominent warning banner reads: "Raw SQL is not validated. You are responsible for correctness and security."
- **Timeline preview** — rendered when any condition in any group is a `BETWEEN` or `=` on a `role = 'date'` column. Shows a horizontal bar (number-line style) for the source period and, if offset > 0, a second bar below for the projected period. Each bar shows start date on the left, end date on the right, duration in the centre. The two bars share the same visual width so the shift is immediately apparent. Not shown in manual WHERE mode or when no date condition is present.
- **Note** — optional free text
- **Load Segment** — submits; appends rows, does not clear existing baseline rows
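The date-offset spinners and the projected bar of the timeline preview both come down to two small computations, sketched here. Both function names are hypothetical. The month-end clamping is chosen to match how PostgreSQL evaluates `date + interval` (for example, 31 January plus one month gives the last day of February); it is an assumption that the UI should mirror that.

```javascript
// Turn the years/months spinner values into a PostgreSQL interval string.
function toIntervalString(years, months) {
  const parts = [];
  if (years > 0) parts.push(`${years} ${years === 1 ? "year" : "years"}`);
  if (months > 0) parts.push(`${months} ${months === 1 ? "month" : "months"}`);
  return parts.length > 0 ? parts.join(" ") : "0 days";
}

// Shift a YYYY-MM-DD date by whole years/months, clamping the day to the
// length of the target month.
function shiftIsoDate(iso, years, months) {
  const [y, m, d] = iso.split("-").map(Number);
  const totalMonths = y * 12 + (m - 1) + years * 12 + months;
  const newYear = Math.floor(totalMonths / 12);
  const newMonth = (totalMonths % 12) + 1;
  // Day 0 of the following month is the last day of newMonth.
  const lastDay = new Date(Date.UTC(newYear, newMonth, 0)).getUTCDate();
  const pad = (n) => String(n).padStart(2, "0");
  return `${newYear}-${pad(newMonth)}-${pad(Math.min(d, lastDay))}`;
}

console.log(toIntervalString(2, 3)); // 2 years 3 months
console.log(shiftIsoDate("2024-01-01", 1, 0), shiftIsoDate("2024-12-31", 1, 0));
// 2025-01-01 2025-12-31
```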

**Example — three-segment baseline:**

| # | Description | Filter logic | Offset |
|---|-------------|--------------|--------|
| 1 | All orders taken 6/1/25–3/31/26 | `order_date BETWEEN '2025-06-01' AND '2026-03-31'` | 0 |
| 2 | Open or unshipped orders (status missing or explicit) | `(status IN ('OPEN','PENDING')) OR (status IS NULL)` | 0 |
| 3 | Prior year book-and-ship 4/1/25–5/31/25 | `order_date BETWEEN '2025-04-01' AND '2025-05-31' AND ship_date BETWEEN '2025-04-01' AND '2025-05-31'` | 1 year |

Segment 2 uses two OR groups; segment 3 has two AND conditions in one group. Any combination is valid as long as at least one group with at least one condition is present.

### Forecast View

**Layout:**
```
┌─────────────────────────────────────────────────────────────────┐
│  [Version label]  [Refresh]  [Save layout]  [Reset layout]       │
├──────────────────────────────────────┬──────────────────────────┤
│                                      │                           │
│  Perspective Viewer                  │  Operation Panel          │
│  (interactive pivot web component)   │  (active when slice set)  │
│                                      │                           │
│                                      │  Slice:                   │
│                                      │    channel = WHS          │
│                                      │    geography = WEST       │
│                                      │                           │
│                                      │  [ Scale ] [ Recode ]     │
│                                      │  [ Clone ]                │
│                                      │                           │
│                                      │  ... operation form ...   │
│                                      │                           │
│                                      │  [ Submit ]               │
│                                      │                           │
└──────────────────────────────────────┴──────────────────────────┘
```

**Pivot control:** [Perspective](https://perspective.finos.org/) 4.4.0, loaded from CDN at runtime. Data is fetched from `GET /api/versions/:id/data` as an Arrow IPC binary stream and loaded into an in-browser Perspective worker — Perspective's native ingestion path. Supports grouping, splitting, filtering, sorting, and charting interactively. Layout (group_by, split_by, filters, plugin) is saved per version to `localStorage` via Save layout / Reset layout buttons.

**Large-dataset loading sequence:**
1. Client issues `GET /api/versions/:id/data`
2. Server responds with `X-Row-Count` header and begins streaming Arrow record batches
3. If `X-Row-Count` ≥ 500 000, UI shows a non-blocking loading banner; otherwise no indicator
4. Client calls `worker.table(firstBatch)` on the first batch to make the pivot interactive immediately
5. Each subsequent batch is applied with `pspTable.update(batch)` as it arrives
6. Banner clears when the stream closes

**Interaction flow:**
1. Click a cell or row in the pivot — the `perspective-click` event fires
2. `detail.config.filter` from the event is parsed: only `==` filters on `role = 'dimension'` columns are extracted as the slice
3. Slice populates the Operation Panel — pick operation tab, fill in parameters
4. Submit → POST to API → new rows returned via `RETURNING *` are streamed directly into the Perspective table (`pspTable.update(rows)`) — no full reload needed
5. For recode, both the negative offset rows and positive replacement rows are returned and streamed
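Step 2 of this flow can be sketched as a pure function. It assumes the event's filter list uses Perspective's `[column, operator, value]` triples; `sliceFromFilters` and the `colMeta` argument shape are hypothetical.

```javascript
// Hypothetical helper: keep only equality filters on dimension columns.
function sliceFromFilters(filters, colMeta) {
  const dims = new Set(
    colMeta.filter((c) => c.role === "dimension").map((c) => c.cname)
  );
  const slice = {};
  for (const [col, op, value] of filters || []) {
    if (op === "==" && dims.has(col)) slice[col] = value;
  }
  return slice;
}

const clickSlice = sliceFromFilters(
  [
    ["channel", "==", "WHS"],
    ["geography", "==", "WEST"],
    ["value", ">", 1000],
  ],
  [
    { cname: "channel", role: "dimension" },
    { cname: "geography", role: "dimension" },
    { cname: "value", role: "value" },
  ]
);
console.log(clickSlice); // { channel: 'WHS', geography: 'WEST' }
```

The resulting object has the same shape as the `slice` field in the operation request envelope.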

**Pivot default layout:** built from col_meta — first two `dimension` columns as `group_by`, `date` column as `split_by`. User can rearrange in Perspective settings panel and save.
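A minimal sketch of deriving that default from col_meta. `defaultLayout` is a hypothetical name, and ordering columns by `opos` before picking the "first two" is an assumption; `group_by` and `split_by` are the Perspective view config keys named above.

```javascript
// Hypothetical helper: build the default Perspective layout from col_meta rows.
function defaultLayout(colMeta) {
  // Assumption: "first" means lowest ordinal position (opos).
  const ordered = [...colMeta].sort((a, b) => (a.opos ?? 0) - (b.opos ?? 0));
  const dims = ordered.filter((c) => c.role === "dimension").map((c) => c.cname);
  const dateCol = ordered.find((c) => c.role === "date");
  return {
    group_by: dims.slice(0, 2),
    split_by: dateCol ? [dateCol.cname] : [],
  };
}

console.log(
  defaultLayout([
    { cname: "order_date", role: "date", opos: 5 },
    { cname: "customer", role: "dimension", opos: 1 },
    { cname: "channel", role: "dimension", opos: 2 },
    { cname: "part", role: "dimension", opos: 3 },
  ])
);
// { group_by: [ 'customer', 'channel' ], split_by: [ 'order_date' ] }
```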

**Reference rows** (`pf_iter = 'reference'`) are visible in the pivot for comparison context. Operations never affect them (enforced by `exclude_iters` in the version).

### Log View

AG Grid list of log entries — user, timestamp, operation, slice, note, rows affected.
"Undo" button per row → `DELETE /api/log/:logid` → grid and pivot refresh (full reload of Perspective table).

---

## Forecast SQL Patterns

Column names baked in at generation time. Tokens substituted at request time. Metadata columns are `pf_iter`, `pf_logid`, `pf_user`, `pf_created_at`.

**Units conditionality:** `{units_col}` appears in INSERT column lists and SELECT expressions only when a `units` role is configured in col_meta. The SQL generator omits it entirely otherwise — no placeholder column, no zero-fill.

**dim_period JOIN:** when any `dimension` column has `dim_period_col` set (and its group's `date` key has `is_key = true`), the FROM clause becomes `{schema}.{tname} s JOIN pf.dim_period dp ON dp.drange @> (s.{date_col} + '{{date_offset}}'::interval)::date`. Those dimension columns are selected as `dp.{dim_period_col} AS {col}` instead of `s.{col}`.

### Baseline Load (one segment)

```sql
WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'baseline', NULL, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,ins AS (
    INSERT INTO {{fc_table}} (
        {dimension_cols}, {date_col}, {value_col} [, {units_col}],
        pf_iter, pf_logid, pf_user, pf_created_at
    )
    SELECT
        {dimension_cols},
        ({date_col} + '{{date_offset}}'::interval)::date,
        {value_col} [, {units_col}],
        'baseline', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM
        {schema}.{tname}  -- or with dim_period JOIN (see above)
    WHERE
        {{filter_clause}}
    RETURNING *
)
SELECT count(*) AS rows_affected FROM ins
```

Baseline loads are **additive** — no DELETE before INSERT. Each segment appends independently.

Token details:
- `{{date_offset}}` — PostgreSQL interval string (e.g. `1 year`); defaults to `0 days`; applied only to the primary `role = 'date'` column on insert
- `{{filter_clause}}` — built from `filters` or `raw_where` at request time (not baked into stored SQL since conditions vary per segment).
  - Structured path (`filters`): each group becomes a parenthesized AND block; groups are joined with `OR`. Every column is validated against col_meta (`role = 'date'` or `role = 'filter'`). Values are escaped (single quotes doubled). Supported operators: `=`, `!=`, `IN`, `NOT IN`, `BETWEEN`, `IS NULL`, `IS NOT NULL`.
  - Raw path (`raw_where`): the string is injected verbatim. No col_meta validation. Admin-only.
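
The structured path can be sketched as below. The shape of `filters` (an array of groups, each an array of `{ col, op, values }` conditions) and the function name `buildFilterClause` are assumptions made for illustration; only the validation rule, the quote-doubling, the operator list, and the AND/OR grouping come from the description above.

```javascript
const OPERATORS = new Set(['=', '!=', 'IN', 'NOT IN', 'BETWEEN', 'IS NULL', 'IS NOT NULL']);

// Render a value as a SQL string literal, doubling embedded single quotes.
const lit = (v) => `'${String(v).replace(/'/g, "''")}'`;

// Assumed input shape: filters = [[{ col, op, values }, ...], ...]
function buildFilterClause(filters, colMeta) {
  const allowed = new Set(
    colMeta.filter((c) => c.role === 'date' || c.role === 'filter').map((c) => c.cname)
  );
  const groups = filters.map((group) => {
    const conds = group.map(({ col, op, values = [] }) => {
      if (!allowed.has(col)) throw new Error(`column not filterable: ${col}`);
      if (!OPERATORS.has(op)) throw new Error(`unsupported operator: ${op}`);
      switch (op) {
        case 'IS NULL':
        case 'IS NOT NULL':
          return `${col} ${op}`;
        case 'IN':
        case 'NOT IN':
          if (values.length === 0) throw new Error(`${op} needs at least one value`);
          return `${col} ${op} (${values.map(lit).join(', ')})`;
        case 'BETWEEN':
          if (values.length !== 2) throw new Error('BETWEEN needs exactly two values');
          return `${col} BETWEEN ${lit(values[0])} AND ${lit(values[1])}`;
        default: // '=' and '!='
          if (values.length !== 1) throw new Error(`${op} needs exactly one value`);
          return `${col} ${op} ${lit(values[0])}`;
      }
    });
    return `(${conds.join(' AND ')})`; // conditions within a group are ANDed
  });
  return groups.join(' OR '); // groups are ORed
}

const colMeta = [
  { cname: 'order_date', role: 'date' },
  { cname: 'customer', role: 'filter' },
  { cname: 'channel', role: 'filter' },
];
const clause = buildFilterClause(
  [
    [
      { col: 'order_date', op: 'BETWEEN', values: ['2025-01-01', '2025-12-31'] },
      { col: 'customer', op: '=', values: ["O'Brien"] },
    ],
    [{ col: 'channel', op: 'IS NULL' }],
  ],
  colMeta
);
// (order_date BETWEEN '2025-01-01' AND '2025-12-31' AND customer = 'O''Brien') OR (channel IS NULL)
```

Column names are interpolated unquoted here because they have already been matched against col_meta; a fuller version would probably also quote identifiers.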

### Clear Baseline

Two queries, run in a transaction:
```sql
DELETE FROM {{fc_table}} WHERE pf_iter = 'baseline';
DELETE FROM pf.log WHERE version_id = {{version_id}} AND operation = 'baseline';
```
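
One way to wrap the two deletes in a transaction from Node is sketched below. `clearBaseline` is a hypothetical name, and `client` is assumed to expose an async `query(text, values)` method, as node-postgres clients do. The sketch binds the version id as a parameter rather than substituting a `{{version_id}}` token; the table name is interpolated because identifiers cannot be bound, so it is assumed to come from trusted `pf` metadata.

```javascript
// Assumption: `client` has an async query(text, values) method (node-postgres style).
async function clearBaseline(client, fcTable, versionId) {
  await client.query('BEGIN');
  try {
    // Forecast rows first, then the log entries they point at.
    const deleted = await client.query(`DELETE FROM ${fcTable} WHERE pf_iter = 'baseline'`);
    await client.query(
      "DELETE FROM pf.log WHERE version_id = $1 AND operation = 'baseline'",
      [versionId]
    );
    await client.query('COMMIT');
    return deleted.rowCount; // number of baseline rows removed
  } catch (err) {
    await client.query('ROLLBACK'); // leave both tables untouched on failure
    throw err;
  }
}
```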

### Reference Load

```sql
WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'reference', NULL, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,ins AS (
    INSERT INTO {{fc_table}} (
        {dimension_cols}, {date_col}, {value_col} [, {units_col}],
        pf_iter, pf_logid, pf_user, pf_created_at
    )
    SELECT
        {dimension_cols}, {date_col}, {value_col} [, {units_col}],
        'reference', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM
        {schema}.{tname}  -- or with dim_period JOIN (see above)
    WHERE
        {{filter_clause}}
    RETURNING *
)
SELECT count(*) AS rows_affected FROM ins
```

No date offset applied — reference rows land at their original dates for prior-period comparison. Same dim_period JOIN logic applies as baseline.

### Scale

```sql
WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'scale', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,base AS (
    SELECT
        {dimension_cols}, {date_col},
        {value_col} [, {units_col}],
        sum({value_col}) OVER () AS total_value
        [, sum({units_col}) OVER () AS total_units]
    FROM {{fc_table}}
    WHERE {{where_clause}}
    {{exclude_clause}}
)
,ins AS (
    INSERT INTO {{fc_table}} (
        {dimension_cols}, {date_col}, {value_col} [, {units_col}],
        pf_iter, pf_logid, pf_user, pf_created_at
    )
    SELECT
        {dimension_cols}, {date_col},
        round(({value_col} / NULLIF(total_value, 0)) * {{value_incr}}, 2)
        [, round(({units_col} / NULLIF(total_units, 0)) * {{units_incr}}, 5)],
        'scale', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM base
    RETURNING *
)
SELECT * FROM ins
```

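Each row in the slice receives a share of the increment proportional to its share of the slice total. When `pct: true`, `{{value_incr}}` and `{{units_incr}}` are pre-computed in JS by multiplying the slice total by the percentage. If the slice total is zero, `NULLIF` turns the share into `NULL` rather than raising a division-by-zero error. Units expressions are omitted when no units column is configured.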
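
A small worked example of the arithmetic, written in JS for illustration. Both function names are hypothetical, and the sketch assumes that with `pct: true` the supplied number is a percentage (5 meaning +5%); the spec does not state whether it is a percentage or a fraction. `distribute` mirrors the SQL expression only approximately, since `Math.round` and PostgreSQL's `round` on `numeric` can differ on exact halves.

```javascript
// Assumption: with pct: true, `value` is a percentage of the slice total (5 means +5%).
function resolveIncrement(sliceTotal, { value, pct = false }) {
  return pct ? (sliceTotal * value) / 100 : value;
}

// Approximates round(({value_col} / NULLIF(total_value, 0)) * {{value_incr}}, 2) per row.
function distribute(rowValues, incr) {
  const total = rowValues.reduce((sum, v) => sum + v, 0);
  return rowValues.map((v) =>
    total === 0 ? null : Math.round((v / total) * incr * 100) / 100
  );
}

const slice = [600, 300, 100]; // existing values in the selected slice
const incr = resolveIncrement(1000, { value: 5, pct: true }); // 50
const deltas = distribute(slice, incr); // [30, 15, 5]
```

Because each row is rounded independently, the inserted deltas may not sum exactly to the requested increment for less convenient numbers.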

### Recode

```sql
WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'recode', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,src AS (
    SELECT {dimension_cols}, {date_col}, {value_col} [, {units_col}]
    FROM {{fc_table}}
    WHERE {{where_clause}}
    {{exclude_clause}}
)
,neg AS (
    INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col} [, {units_col}], pf_iter, pf_logid, pf_user, pf_created_at)
    SELECT {dimension_cols}, {date_col}, -{value_col} [, -{units_col}], 'recode', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM src
    RETURNING *
)
,ins AS (
    INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col} [, {units_col}], pf_iter, pf_logid, pf_user, pf_created_at)
    SELECT {{set_clause}}, {date_col}, {value_col} [, {units_col}], 'recode', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM src
    RETURNING *
)
SELECT * FROM neg UNION ALL SELECT * FROM ins
```

`{{set_clause}}` replaces the listed dimension columns with new values and passes the others through unchanged. The negative (zero-out) rows and the positive (replacement) rows share the same `pf_logid`, so a single undo removes both.
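
For illustration, `{{set_clause}}` could be rendered along these lines. `buildSetClause` and the `replacements` object shape are hypothetical; the sketch assumes string-valued dimensions and reuses the quote-doubling escape described for filter values.

```javascript
// Render a value as a SQL string literal, doubling embedded single quotes.
const sqlLiteral = (v) => `'${String(v).replace(/'/g, "''")}'`;

// Assumed input: dimensionCols in INSERT order, replacements = { col: newValue }.
function buildSetClause(dimensionCols, replacements) {
  const has = (col) => Object.prototype.hasOwnProperty.call(replacements, col);
  return dimensionCols
    .map((col) => (has(col) ? `${sqlLiteral(replacements[col])} AS ${col}` : col))
    .join(', ');
}

const setClause = buildSetClause(
  ['region', 'product', 'channel'],
  { product: 'Widget B' }
);
// region, 'Widget B' AS product, channel
```

Keeping the output in the same order as `{dimension_cols}` matters, because the INSERT column list is positional.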

### Clone

```sql
WITH ilog AS (
    INSERT INTO pf.log (version_id, pf_user, operation, slice, params, note)
    VALUES ({{version_id}}, '{{pf_user}}', 'clone', '{{slice}}'::jsonb, '{{params}}'::jsonb, '{{note}}')
    RETURNING id
)
,ins AS (
    INSERT INTO {{fc_table}} ({dimension_cols}, {date_col}, {value_col} [, {units_col}], pf_iter, pf_logid, pf_user, pf_created_at)
    SELECT
        {{set_clause}}, {date_col},
        round({value_col} * {{scale_factor}}, 2)
        [, round({units_col} * {{scale_factor}}, 5)],
        'clone', (SELECT id FROM ilog), '{{pf_user}}', now()
    FROM {{fc_table}}
    WHERE {{where_clause}}
    {{exclude_clause}}
    RETURNING *
)
SELECT * FROM ins
```

### Undo

Two queries, run sequentially rather than in one CTE. Forecast rows reference `pf.log` through `pf_logid`, so they must be deleted before the log entry:

```sql
DELETE FROM {{fc_table}} WHERE pf_logid = {{logid}};
DELETE FROM pf.log WHERE id = {{logid}};
```

---

## Admin Setup Flow (end-to-end)

1. Open **Sources** view → browse DB tables → register source table
2. Open col_meta editor → assign roles to columns (`dimension`, `value`, `units`, `date`, `filter`, `ignore`), mark `is_key` dimensions, set labels
3. Click **Generate SQL** → app writes operation SQL to `pf.sql`
4. Open **Versions** view → create a named version (sets `exclude_iters`, creates forecast table)
5. Open **Baseline Workbench** → build the baseline from one or more segments:
   - Each segment specifies a date range (on any date/filter column), date offset, and optional additional filter conditions
   - Add segments until the baseline is complete; each is independently undoable
   - Use "Clear Baseline" to start over if needed
6. Optionally load **Reference** → pick prior period date range → inserts `pf_iter = 'reference'` rows at their original dates (for comparison in the pivot)
7. Open **Forecast** view → share with users

## User Forecast Flow (end-to-end)

1. Open **Forecast** view → select version
2. Pivot loads — explore data, identify slice to adjust
3. Select cells → Operation Panel populates with slice
4. Choose operation → fill in parameters → Submit
5. Grid refreshes — adjustment visible immediately
6. Repeat as needed
7. Admin closes version when forecasting is complete

---

## Open Questions / Future Scope

- **Baseline replay** — re-execute change log against a restated baseline (`replay: true`); v1 returns 501
- **Approval workflow** — user submits, admin approves before changes are visible to others (deferred)
- **Territory filtering** — restrict what a user can see/edit by dimension value (deferred)
- **Export** — download forecast as CSV or push results to a reporting table
- **Version comparison** — side-by-side view of two versions (facilitated by isolated tables via UNION)
- **Col meta / version schema drift** — if col_meta roles are changed after a version's forecast table is already created, the generated SQL and the table DDL go out of sync. UI should detect this: compare col_meta against the forecast table's actual columns via `information_schema`, warn the user, and offer to rebuild the version (drop + recreate table, preserving the version record and log). Workaround: delete and recreate the version manually.
- **Multi-connection support** — currently one DB via `.env`. Full vision: `pf.connection` table (host, port, dbname, user, password as env-var ref), `connection_id` on `pf.source`, per-connection pg pools at runtime. `pf` schema stays on a "home" connection; source data can live anywhere. Connections UI in Setup. Safe to defer while in dev — requires clean reinstall when added since it changes the source schema.
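
The drift check described in the schema drift item could start from a comparison like the one below. This is a sketch under assumptions: `detectDrift` is a hypothetical name, only the `dimension`, `date`, `value`, and `units` roles are assumed to produce forecast-table columns, and every bookkeeping column is assumed to carry the `pf_` prefix.

```javascript
// Query whose column_name values would be passed in as `tableColumns`.
const COLUMNS_SQL = `
  SELECT column_name
  FROM information_schema.columns
  WHERE table_schema = $1 AND table_name = $2`;

// Assumption: only these roles become columns of the forecast table.
const DATA_ROLES = new Set(['dimension', 'date', 'value', 'units']);

function detectDrift(colMeta, tableColumns) {
  const expected = new Set(
    colMeta.filter((c) => DATA_ROLES.has(c.role)).map((c) => c.cname)
  );
  // Assumption: bookkeeping columns (pf_iter, pf_logid, ...) all start with pf_.
  const actual = new Set(tableColumns.filter((name) => !name.startsWith('pf_')));
  const missing = [...expected].filter((name) => !actual.has(name));
  const extra = [...actual].filter((name) => !expected.has(name));
  return { drifted: missing.length > 0 || extra.length > 0, missing, extra };
}

const report = detectDrift(
  [
    { cname: 'region', role: 'dimension' },
    { cname: 'order_date', role: 'date' },
    { cname: 'amount', role: 'value' },
    { cname: 'qty', role: 'units' }, // role added after the version was created
  ],
  ['region', 'order_date', 'amount', 'pf_iter', 'pf_logid', 'pf_user', 'pf_created_at']
);
// report -> { drifted: true, missing: ['qty'], extra: [] }
```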

---

## Project Status — 2026-06-12

### What's working
- Full backend: source registration, col_meta, SQL generation, versions, baseline segments, reference load, scale, recode, clone, undo
- `units` column is optional — sources without a units column register and generate SQL correctly
- `dim_group` / `dim_period_col` on col_meta: baseline/reference load JOINs `pf.dim_period` to derive fiscal/calendar period columns rather than copying them raw from the source
- `pf.dim_period` calendar table (2018–2035): populated by `setup_sql/gen_dim_period.sql`, configurable fiscal year start
- React + Vite + Tailwind CSS frontend in `ui/`, built output to `public/app/`, served by Express
- Data transport: Arrow IPC binary stream (`GET /api/versions/:id/data`); server accumulates all rows into one record batch; client hands buffer directly to Perspective WASM
- 3-step collapsible sidebar (Setup / Baseline / Forecast)
- Setup view: DB table browser with preview modal, source registration, col_meta editor (`dim_group`/`dim_period_col` fields included), SQL generation
- Baseline view: version management (create/close/reopen/delete), multi-segment baseline workbench, canvas timeline, filter builder
- Perspective pivot in Forecast view: loads all version rows, interactive group/split/filter/chart, layout saved per version to localStorage
- Slice extraction from `perspective-click` event feeds operation panel directly
- Incremental row streaming: operation results (`RETURNING *`) applied to Perspective table via `pspTable.update()` — no full reload
- Status bar: shows current source · version · baseline row count · status

### Known issues / next focus

- **Forecast view** — operation panel SQL generation complete; UI wiring to API still needed
- **Load progress bar** — jittery at high throughput; throttle to ~10 updates/sec
- **Default pivot layout** — per-source configurable layout not yet implemented; currently hardcodes first 2 dimensions
- **No "current version" persistence** — source/version selection resets on page reload
- **Perspective slice limitation** — computed date columns (Month, YearDate) from split_by don't map back to raw rows; only native dimension columns work for slice extraction
- **Col_meta / version schema drift** — if col_meta changes after a version's forecast table is created, SQL and DDL go out of sync. Workaround: delete and recreate the version.
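
For the load progress bar item, a leading-edge throttle at 100 ms would cap updates at roughly 10 per second. The sketch below is one possible approach, not the app's current code; `throttle` and its injectable `now` clock are illustrative names.

```javascript
// Calls fn at most once per intervalMs; calls arriving in between are dropped.
// `now` is injectable so the behaviour can be checked without real timers.
function throttle(fn, intervalMs, now = Date.now) {
  let last = -Infinity;
  return (...args) => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      fn(...args);
    }
  };
}

// Demonstration with a fake clock: only the calls at t = 0 and t = 100 get through.
let fakeTime = 0;
const seen = [];
const report = throttle((n) => seen.push(n), 100, () => fakeTime);
report(1);
fakeTime = 50;
report(2);
fakeTime = 100;
report(3);
fakeTime = 120;
report(4);
// seen -> [1, 3]
```

Since intermediate calls are dropped, the final 100% update should be sent outside the throttled function so the bar always completes.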