dataflow/docs/refactor-transformed-split.md
Paul Trowbridge 89a70bdf7e Split transformed column; add override management; show all override keys in panel
- transformed now stores only rule additions (not merged data+overrides)
- View dynamically computes data || transformed || overrides at query time
- New DB functions: set/clear/bulk_set_record_overrides
- Records panel now includes source-wide override keys so party/reason etc.
  appear even on records that don't have them set yet

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 11:00:24 -04:00

# Refactor: Split `transformed` into three columns
## Goal
Separate `records` into three clean JSONB layers with clear semantics:
| Column | Meaning | Wins over |
|---|---|---|
| `data` | Raw import values, never mutated | — |
| `transformed` | Rule/mapping-derived fields only | `data` |
| `overrides` | Manual user overrides | `data`, `transformed` |

Consumers merge the three layers at read time, with later operands winning on key conflicts:
```sql
data || COALESCE(transformed, '{}'::jsonb) || COALESCE(overrides, '{}'::jsonb)
```
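
To make the precedence concrete, here is a minimal sketch using invented literal values (the keys and values are illustrative, not taken from the real `records` table). It shows that `||` on `jsonb` is a shallow merge in which the right-hand operand wins, and why the `COALESCE` wrappers matter: concatenating with a NULL operand yields NULL.

```sql
-- Illustrative literals only; keys and values are made up for this example.
WITH sample AS (
    SELECT
        '{"amount": 100, "party": "ACME"}'::jsonb        AS data,
        '{"party": "Acme Corp", "reason": "fee"}'::jsonb AS transformed,
        NULL::jsonb                                      AS overrides  -- no manual overrides yet
)
SELECT
    -- Right-hand operands win on key collisions, so transformed beats data.
    (data || COALESCE(transformed, '{}'::jsonb) || COALESCE(overrides, '{}'::jsonb))
        = '{"amount": 100, "party": "Acme Corp", "reason": "fee"}'::jsonb AS merged_ok,
    -- Without COALESCE, a single NULL layer nulls out the whole merged record.
    (data || transformed || overrides) IS NULL AS null_without_coalesce
FROM sample;
```

Both result columns come back `true`: the merged record keeps `amount` from `data`, takes `party` and `reason` from `transformed`, and the unwrapped expression collapses to NULL.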
## Why
Existing `transformed` rows duplicate `data` keys because `apply_transformations` was originally
written as `data || rule_additions`. This makes it impossible to tell what the rules actually
changed versus what was carried over from the original import.
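
The difference between the two storage shapes can be sketched with invented sample values (only the `data || rule_additions` expression comes from the text above; the column aliases are hypothetical):

```sql
-- Invented sample values; only the shape of the problem comes from the text above.
WITH sample AS (
    SELECT
        '{"amount": 100, "party": "ACME"}'::jsonb AS data,
        '{"party": "Acme Corp"}'::jsonb           AS rule_additions
)
SELECT
    data || rule_additions AS old_transformed,  -- also carries "amount", which no rule touched
    rule_additions         AS new_transformed   -- only what the rules produced
FROM sample;
```

In the old shape, `amount` appears in `transformed` even though no rule produced it; in the new shape, every key in `transformed` is attributable to a rule.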
## Current State (branch: `transformed-refactor`)
### Already done in functions.sql
- `apply_transformations` — already stores only rule additions (`COALESCE(ra.additions, '{}')`)
- `generate_source_view` — already uses the 3-way coalesce for `dfv.*` views
- `set_record_overrides`, `clear_record_overrides`, `bulk_set_record_overrides` — exist
- API routes — `PUT /api/records/:id/overrides`, `DELETE /:id/overrides`, `POST /bulk-overrides` exist
### Still needed
1. **`database/schema.sql`** — add `overrides JSONB` column to `records` table and a GIN index.
   Also fix the syntax error: trailing comma before `)` on line 48.
2. **`ui/src/pages/Records.jsx`** — right panel currently iterates `selectedRecord.transformed`
   for all fields. Split into three sections:
   - **Original** (`data`) — read-only, muted style
   - **Transformed** (`transformed`) — rule-derived delta only, highlighted
   - **Overrides** (`overrides`) — editable, amber style (existing draft UI already works here)
3. **Deploy + reprocess** (user-triggered, not automated):
   - `psql -d dataflow -f database/schema.sql` (drop/recreate schema)
   - `psql -d dataflow -f database/functions.sql` (redeploy functions)
   - Regenerate all `dfv.*` views via the API for each source
   - Run `reprocess_records` on every source to strip stale `data` keys from existing `transformed` rows
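
The schema change in step 1 might look roughly like the sketch below. It is not the contents of `database/schema.sql`: the scratch table `records_sketch`, its `id` column type, and the index name are all hypothetical stand-ins so the snippet runs on its own. Because the deploy step drops and recreates the schema, the real change would more likely be a new column line inside the existing `CREATE TABLE` than an `ALTER TABLE`.

```sql
-- Scratch stand-in for the real records table (hypothetical name and columns).
CREATE TEMP TABLE records_sketch (
    id          BIGINT PRIMARY KEY,
    data        JSONB NOT NULL,
    transformed JSONB
);

-- New layer for manual user overrides; NULL means "no overrides set".
ALTER TABLE records_sketch
    ADD COLUMN IF NOT EXISTS overrides JSONB;

-- GIN index to support key/containment lookups (?, ?|, @>) on overrides.
CREATE INDEX IF NOT EXISTS records_sketch_overrides_gin
    ON records_sketch USING GIN (overrides);
```

Leaving the column nullable matches the read-time merge, which already wraps `overrides` in `COALESCE`.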
## Rollback
Branch `stacks` is the stable point for code. For the database, restore from a `pg_dump` taken before deployment.
## File Checklist
- [ ] `database/schema.sql` — add `overrides` column + index, fix syntax error
- [ ] `database/functions.sql` — no changes needed (already correct)
- [ ] `ui/src/pages/Records.jsx` — split inspector panel into 3 sections
- [ ] Build UI: `cd ui && npm run build`
- [ ] Deploy DB (user-triggered)
- [ ] Reprocess all sources (user-triggered)
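
After reprocessing, a check along these lines could help confirm that stale `data` keys are gone from `transformed`. This is a sketch under assumptions: the inline `records_sample` rows are invented, and against the real table you would query `records` directly (assuming it exposes an `id` column). A hit is a hint rather than proof, since a rule can legitimately emit a value identical to the one in `data`.

```sql
-- Invented rows standing in for the real records table.
WITH records_sample(id, data, transformed) AS (
    VALUES
        -- Stale shape: "amount" was copied from data, only "party" came from a rule.
        (1, '{"amount": 100, "party": "ACME"}'::jsonb,
            '{"amount": 100, "party": "Acme Corp"}'::jsonb),
        -- Clean shape: transformed holds rule additions only.
        (2, '{"amount": 250, "party": "Initech"}'::jsonb,
            '{"reason": "fee"}'::jsonb)
)
SELECT r.id, count(*) AS duplicated_keys
FROM records_sample AS r
CROSS JOIN LATERAL jsonb_each(r.transformed) AS t(key, value)
-- Keep only transformed keys whose value is identical to the same key in data.
WHERE (r.data -> t.key) = t.value
GROUP BY r.id
ORDER BY r.id;
```

With the sample rows, only record 1 is reported, with one duplicated key (`amount`); record 2 produces no row.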