Paul Trowbridge 89a70bdf7e Split transformed column; add override management; show all override keys in panel
- transformed now stores only rule additions (not merged data+overrides)
- View dynamically computes data || transformed || overrides at query time
- New DB functions: set/clear/bulk_set_record_overrides
- Records panel now includes source-wide override keys so party/reason etc.
  appear even on records that don't have them set yet

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 11:00:24 -04:00


Refactor: Split transformed into three columns

Goal

Separate records into three clean JSONB layers with clear semantics:

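| Column      | Meaning                          | Wins over                    |
|-------------|----------------------------------|------------------------------|
| data        | Raw import values, never mutated | (nothing; lowest precedence) |
| transformed | Rule/mapping-derived fields only | data                         |
| overrides   | Manual user overrides            | data, transformed            |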

Consumers merge them at read time:

```sql
data || COALESCE(transformed, '{}'::jsonb) || COALESCE(overrides, '{}'::jsonb)
```
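To make the precedence concrete, here is a minimal JavaScript sketch of what that expression does to a single record. It assumes each layer arrives as a plain object, with `null` standing in for a NULL column; `mergeRecord` and the sample field values are hypothetical. Postgres's jsonb `||` on two objects returns the union of their top-level keys with the right-hand value winning on duplicates (nested objects are replaced, not merged), which object spread mirrors. The `COALESCE` calls matter because `||` with a NULL operand yields NULL for the whole row.

```javascript
// Sketch: mirror `data || COALESCE(transformed, '{}') || COALESCE(overrides, '{}')`
// for one record. null stands in for a SQL NULL column.
function mergeRecord(data, transformed, overrides) {
  // COALESCE(x, '{}'::jsonb): treat a missing layer as an empty object.
  const t = transformed ?? {};
  const o = overrides ?? {};
  // jsonb `||` on objects is a shallow, top-level merge where the right side
  // wins, which is what object spread does.
  return { ...data, ...t, ...o };
}

// Hypothetical record: a rule sets `party`, then a user overrides it.
const merged = mergeRecord(
  { vendor: 'ACME INC', amount: 120 },
  { party: 'Acme' },
  { party: 'Acme Corp', reason: 'manual fix' },
);
console.log(merged.party); // Acme Corp (overrides beat transformed)
console.log(merged.amount); // 120 (untouched data key passes through)
```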

Why

Existing transformed rows duplicate the data keys because apply_transformations was originally written as data || rule_additions. That makes it impossible to tell which values the rules actually set and which were simply carried over from the original import.
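A small illustration of the lost information, using made-up values: with the old layout, the best anyone can do is diff transformed against data, and a rule that writes the same value data already had is then indistinguishable from a carried-over key. The helper `guessRuleKeys` is hypothetical and only exists for this comparison.

```javascript
// Hypothetical rule output for one record: the rule sets `party` and also
// (re)sets `vendor` to the value it already had in data.
const data = { vendor: 'ACME', amount: 120 };
const ruleAdditions = { vendor: 'ACME', party: 'Acme' };

// Old behaviour: transformed = data || rule_additions (whole record copied).
const oldTransformed = { ...data, ...ruleAdditions };
// New behaviour: transformed = rule_additions only.
const newTransformed = { ...ruleAdditions };

// Best effort with the old layout: keep keys whose value differs from data.
// Strict equality is enough for the scalar values used here.
function guessRuleKeys(data, transformed) {
  return Object.keys(transformed).filter((k) => transformed[k] !== data[k]);
}

console.log(guessRuleKeys(data, oldTransformed)); // [ 'party' ] (the vendor rule is invisible)
console.log(Object.keys(newTransformed)); // [ 'vendor', 'party' ]
```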

Current State (branch: transformed-refactor)

Already done (functions.sql and API routes)

  • apply_transformations: stores only rule additions (COALESCE(ra.additions, '{}'))
  • generate_source_view: builds the dfv.* views with the three-way merge above
  • set_record_overrides, clear_record_overrides, bulk_set_record_overrides: implemented
  • API routes: PUT /api/records/:id/overrides, DELETE /:id/overrides and POST /bulk-overrides exist
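A minimal client-side sketch of calling those override routes from the UI. It assumes the shorthand paths (`DELETE /:id/overrides`, `POST /bulk-overrides`) sit under the same `/api/records` prefix as the PUT route, and the JSON body shapes are hypothetical; check the actual route handlers before relying on either.

```javascript
// Sketch only. Assumptions (not confirmed by this doc): all three routes share
// the /api/records prefix, and the request bodies below are hypothetical.
const BASE = '/api/records';

// PUT /api/records/:id/overrides with the override object as the body (assumed shape).
function setOverridesRequest(id, overrides) {
  return {
    method: 'PUT',
    url: `${BASE}/${encodeURIComponent(id)}/overrides`,
    body: JSON.stringify(overrides),
  };
}

// DELETE /api/records/:id/overrides: no body.
function clearOverridesRequest(id) {
  return { method: 'DELETE', url: `${BASE}/${encodeURIComponent(id)}/overrides` };
}

// POST .../bulk-overrides with record ids plus the overrides to apply (assumed shape).
function bulkOverridesRequest(ids, overrides) {
  return {
    method: 'POST',
    url: `${BASE}/bulk-overrides`,
    body: JSON.stringify({ ids, overrides }),
  };
}

// Send a request descriptor with fetch (global in browsers and Node 18+).
async function send(req) {
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.body ? { 'Content-Type': 'application/json' } : undefined,
    body: req.body,
  });
  if (!res.ok) throw new Error(`${req.method} ${req.url} failed with ${res.status}`);
  return res;
}
```

Usage would look like `await send(setOverridesRequest(42, { party: 'Acme Corp' }))`. Keeping request construction separate from sending lets URLs and bodies be checked without a running server.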

Still needed

  1. database/schema.sql: add an overrides JSONB column to the records table, plus a GIN index. Also fix the syntax error on line 48 (a trailing comma before the closing parenthesis).

  2. ui/src/pages/Records.jsx: the right-hand panel currently iterates over selectedRecord.transformed to show every field, which only worked while transformed held the full merged record. Split it into three sections:

    • Original (data) — read-only, muted style
    • Transformed (transformed) — rule-derived delta only, highlighted
    • Overrides (overrides) — editable, amber style (existing draft UI already works here)
  3. Deploy + reprocess (user-triggered, not automated):

    • psql -d dataflow -f database/schema.sql (drop/recreate schema)
    • psql -d dataflow -f database/functions.sql (redeploy functions)
    • Regenerate all dfv.* views via the API for each source
    • Run reprocess_records on every source to strip stale data keys from existing transformed rows
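For step 1, a quick heuristic check can confirm that no other comma-before-`)` remains in schema.sql after the fix. This is a sketch, not a SQL parser: it ignores `--` comments but does not understand string literals or `/* */` comments. The `records` DDL below is a made-up example containing the error on purpose, not the real schema, and the index name is hypothetical.

```javascript
// Report 1-based line numbers of commas followed only by whitespace and a `)`.
function findTrailingCommas(sql) {
  // Blank out `--` line comments but keep newlines so line numbers stay valid.
  const stripped = sql.replace(/--[^\n]*/g, '');
  const hits = [];
  const re = /,\s*\)/g;
  let m;
  while ((m = re.exec(stripped)) !== null) {
    // Line number of the offending comma.
    hits.push(stripped.slice(0, m.index).split('\n').length);
  }
  return hits;
}

// Hypothetical DDL with the same kind of error as schema.sql line 48.
const ddl = [
  'CREATE TABLE records (',
  '  id serial PRIMARY KEY,',
  '  data jsonb NOT NULL,',
  '  overrides jsonb,   -- new column',
  ');',
  'CREATE INDEX records_overrides_gin ON records USING gin (overrides);',
].join('\n');
console.log(findTrailingCommas(ddl)); // [ 4 ]
```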
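For step 2, the data shaping behind the three panel sections can be kept out of the JSX. A sketch, assuming the record object exposes `data`, `transformed` and `overrides` keys named after the columns; `buildInspectorSections` and its `sourceOverrideKeys` input are hypothetical. The latter reflects the commit note that source-wide override keys (party, reason, etc.) should appear even on records that have not set them yet.

```javascript
// Sketch only: derive the three inspector sections for one record.
function buildInspectorSections(record, sourceOverrideKeys = []) {
  const data = record.data ?? {};
  const transformed = record.transformed ?? {};
  const overrides = record.overrides ?? {};
  const has = (obj, k) => Object.prototype.hasOwnProperty.call(obj, k);

  // Original: every raw import key, rendered read-only and muted.
  const original = Object.entries(data);
  // Transformed: only the rule-derived delta, since the column no longer repeats data.
  const derived = Object.entries(transformed);
  // Overrides: this record's keys plus the source-wide keys, null when unset,
  // so editable rows exist before a value has been entered.
  const keys = [...new Set([...Object.keys(overrides), ...sourceOverrideKeys])];
  const editable = keys.map((k) => [k, has(overrides, k) ? overrides[k] : null]);

  return { original, transformed: derived, overrides: editable };
}

const sections = buildInspectorSections(
  { data: { vendor: 'ACME' }, transformed: { party: 'Acme' }, overrides: { party: 'Acme Corp' } },
  ['party', 'reason'],
);
console.log(sections.overrides); // [ [ 'party', 'Acme Corp' ], [ 'reason', null ] ]
```

Each array of `[key, value]` pairs maps directly onto rows with the section's style; only the overrides array needs to feed the existing draft-editing UI.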
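For step 3, it can help to print the whole sequence for review before anyone triggers it, with the pg_dump backup that the Rollback section relies on taken first. This is a dry-run sketch: the backup file name and source names are hypothetical, view regeneration and reprocess_records appear only as labels (no API path or function signature is assumed), and `-v ON_ERROR_STOP=1` is a suggested addition to the psql commands above so a failing statement aborts the run instead of being skipped.

```javascript
// Build the ordered deploy plan as data; nothing is executed here.
function buildDeployPlan(sources, { db = 'dataflow', backupFile = 'pre-refactor-backup.sql' } = {}) {
  const plan = [
    // Rollback point first: schema.sql drops and recreates the schema.
    { kind: 'shell', argv: ['pg_dump', '-d', db, '-f', backupFile] },
    // ON_ERROR_STOP=1 makes psql exit on the first error (suggested addition).
    { kind: 'shell', argv: ['psql', '-v', 'ON_ERROR_STOP=1', '-d', db, '-f', 'database/schema.sql'] },
    { kind: 'shell', argv: ['psql', '-v', 'ON_ERROR_STOP=1', '-d', db, '-f', 'database/functions.sql'] },
  ];
  // Per-source steps, in the order listed above: all views, then all reprocessing.
  for (const source of sources) plan.push({ kind: 'api', action: 'regenerate-view', source });
  for (const source of sources) plan.push({ kind: 'db', action: 'reprocess_records', source });
  return plan;
}

// Human-readable line for one step.
function describeStep(step) {
  return step.kind === 'shell' ? step.argv.join(' ') : `${step.action} (${step.source})`;
}

// Dry run with hypothetical source names.
for (const step of buildDeployPlan(['bank', 'payroll'])) console.log(describeStep(step));
```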

Rollback

The stacks branch is the stable code rollback point; a pg_dump taken before deployment is the database rollback.

File Checklist

  • database/schema.sql — add overrides column + index, fix syntax error
  • database/functions.sql — no changes needed (already correct)
  • ui/src/pages/Records.jsx — split inspector panel into 3 sections
  • Build UI: cd ui && npm run build
  • Deploy DB (user-triggered)
  • Reprocess all sources (user-triggered)