13 Commits

d495ef2fc5 Records filters, global picklist, autocomplete, and rule reprocess
- Records tab: regex filter bar (Postgres ~*, case-insensitive); filters can be
  added and removed, are debounced, and are ANDed together; get_view_data gains a p_filters JSONB param
- Global picklist: sources.global_picklist flag (default true) controls whether
  a source's mapped output values feed the cross-source autocomplete suggestion pool;
  toggle on Sources page; get_global_output_values() SQL function
- Mappings: replace native datalist with custom AutocompleteInput component —
  Alt+Down opens, Tab cycles, Enter selects, arrow keys navigate, Escape closes
- Rules: auto-reprocess source records when a rule is created or updated
- preview_rule: fix BIGINT/INT return type mismatch
- Stale get_import_log removed from sources.sql
- TSV export: fetch with auth headers instead of plain <a href> (fixes 401)
- + column button: more visible styling

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 16:28:26 -04:00
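
A minimal Python sketch of the filter-bar semantics described above: each filter is a case-insensitive regex on one column, and a row must satisfy all of them. The `{"field", "pattern"}` filter shape and the `apply_filters` name are hypothetical (the actual `p_filters` JSONB layout is not given in the log), and Python's `re` only approximates PostgreSQL's `~*` regex flavour.

```python
import re


def matches(row, field, rx):
    value = row.get(field)
    # In PostgreSQL, NULL ~* pattern yields NULL, which a WHERE clause treats as false
    return value is not None and rx.search(str(value)) is not None


def apply_filters(rows, filters):
    """Keep rows that satisfy every filter (AND), matching case-insensitively
    as a rough stand-in for PostgreSQL's ~* operator."""
    compiled = [(f["field"], re.compile(f["pattern"], re.IGNORECASE)) for f in filters]
    # all() over an empty list is True, so no filters means every row is kept
    return [row for row in rows if all(matches(row, fld, rx) for fld, rx in compiled)]


rows = [
    {"vendor": "ACME Corp", "memo": "Invoice 1001"},
    {"vendor": "acme corp", "memo": "Refund"},
    {"vendor": "Globex", "memo": None},
]
print(apply_filters(rows, [
    {"field": "vendor", "pattern": "^acme"},
    {"field": "memo", "pattern": "invoice"},
]))
# [{'vendor': 'ACME Corp', 'memo': 'Invoice 1001'}]
```
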
d63d70cd52 Import log, constraint key overhaul, and dedup improvements
- Rename dedup_key/dedup_fields → constraint_key/constraint_fields everywhere
  (schema, functions, routes, UI, migration script, docs)
- Change constraint_key from MD5 TEXT hash to readable JSONB object
- Drop unique constraint on (source_name, constraint_key); dedup is now
  enforced at import time via CTE, allowing intra-file duplicate rows
- Add import_id FK (ON DELETE CASCADE) so deleting a log entry removes its records
- Add info JSONB to import_log with inserted_keys and excluded_keys arrays
- Add get_import_log, get_all_import_logs, delete_import SQL functions
- Auto-apply transformations immediately after import
- Import UI: expandable key detail, checkbox selection, delete with confirm,
  import ID column, transform result display
- New Log page: global import log across all sources

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-13 23:44:30 -04:00
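
The import-time dedup rule can be modelled in a few lines of Python. This is a sketch of the behaviour the message describes, not the actual CTE; `constraint_key`, `split_import`, and the sample field names are hypothetical. Each incoming row is compared only with keys that existed before the import, which is why two identical rows in one file are both inserted.

```python
import json


def constraint_key(record, constraint_fields):
    # Readable key: an object holding just the constraint fields (no MD5 hash)
    return {f: record.get(f) for f in constraint_fields}


def canonical(key):
    # Stable string form for set membership; JSONB compares objects by content, not key order
    return json.dumps(key, sort_keys=True)


def split_import(incoming, existing_keys, constraint_fields):
    """Classify incoming rows against keys already stored for the source.
    The lookup set is built once and never updated inside the loop, so
    duplicates within the same file are all inserted."""
    existing = {canonical(k) for k in existing_keys}
    inserted, excluded = [], []
    for rec in incoming:
        key = constraint_key(rec, constraint_fields)
        (excluded if canonical(key) in existing else inserted).append(key)
    return {"inserted_keys": inserted, "excluded_keys": excluded}


existing_keys = [{"date": "2026-04-01", "amount": "10.00"}]
incoming = [
    {"date": "2026-04-01", "amount": "10.00", "memo": "already imported"},
    {"date": "2026-04-02", "amount": "5.00", "memo": "coffee"},
    {"date": "2026-04-02", "amount": "5.00", "memo": "coffee"},
]
result = split_import(incoming, existing_keys, ["date", "amount"])
print(len(result["inserted_keys"]), len(result["excluded_keys"]))  # 2 1
```
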
2abcb89bcd Add import log detail, key tracking, and cascade delete
- Add import_id column to records (links each record to its import batch)
- import_records() now stores readable dedup field values (not hashes) in
  info.inserted_keys / info.excluded_keys, and stamps import_id on insert
- delete_import() simplified to delete log row; ON DELETE CASCADE removes records
- Add get_import_log() and get_all_import_logs() DB functions
- Add DELETE /api/sources/:name/import-log/:id endpoint
- Add GET /api/sources/import-log global log endpoint
- Import route now auto-applies transformations to new records after import
- Import page: show ID column, expandable key detail, checkbox delete
- New Log page: global view of all imports across sources
- Update README API reference and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-12 11:04:34 -04:00
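
The cascade-delete pattern can be demonstrated with Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The table layouts below are pared-down assumptions, not the project's schema. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas PostgreSQL always does.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables (columns pared down for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE import_log (
    id INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL
);
CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    source_name TEXT NOT NULL,
    import_id INTEGER REFERENCES import_log(id) ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO import_log (id, source_name) VALUES (1, 'bank'), (2, 'bank')")
conn.executemany(
    "INSERT INTO records (source_name, import_id) VALUES (?, ?)",
    [("bank", 1), ("bank", 1), ("bank", 2)],
)

# Deleting the log row is all delete_import() has to do; the FK cascade removes its records
conn.execute("DELETE FROM import_log WHERE id = 1")
remaining = conn.execute(
    "SELECT import_id, COUNT(*) FROM records GROUP BY import_id"
).fetchall()
print(remaining)  # [(2, 1)]
```
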
291c665ed1 Consolidate all SQL into database/queries/, switch to literal SQL in routes
- Add database/queries/{sources,rules,mappings,records}.sql — one file per
  route, all business logic in PostgreSQL functions
- Replace parameterized queries in all four route files with lit()/jsonLit()
  literal interpolation for debuggability
- Add api/lib/sql.js with lit(), jsonLit(), arr() helpers
- Fix get_view_data to aggregate with json_agg (preserves column order) over an
  ordered subquery (so the sort is applied before aggregation)
- Fix jsonLit() for JSONB params so plain strings become valid JSON
- Update manage.py option 3 to deploy database/queries/ instead of functions.sql
- Add SPEC.md covering architecture, philosophy, and manage.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:36:53 -04:00
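
The repo's helpers live in `api/lib/sql.js`; the Python below is only an illustrative analogue of what `lit()`, `jsonLit()` and `arr()` plausibly do, with quoting rules taken from PostgreSQL's documented literal syntax. The optional `pg_type` parameter of `arr` is an assumption added here. Literal interpolation stays safe only as long as every value passes through a quoting function like this.

```python
import json
import math


def lit(value):
    """Render a value as a PostgreSQL literal. Assumes standard_conforming_strings
    is on (the default since PostgreSQL 9.1), so backslashes need no escaping."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # test bool before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError("NaN/Infinity need explicit quoting")
        return repr(value)
    # Quote as text, doubling any embedded single quotes
    return "'" + str(value).replace("'", "''") + "'"


def json_lit(value):
    # json.dumps wraps a plain string in double quotes, so abc becomes the valid JSON "abc"
    return lit(json.dumps(value)) + "::jsonb"


def arr(values, pg_type="text"):
    # The explicit cast keeps an empty ARRAY[] valid (PostgreSQL cannot infer its type)
    return "ARRAY[" + ", ".join(lit(v) for v in values) + "]::" + pg_type + "[]"


print(lit("O'Brien"))   # 'O''Brien'
print(json_lit("abc"))  # '"abc"'::jsonb
print(arr([]))          # ARRAY[]::text[]
```
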
dcac6def87 Unify mappings UI around a single SQL query, with UX improvements
- Add get_all_values() SQL function returning all extracted values (mapped
  + unmapped) with real record counts and mapping output in one query
- Add /mappings/source/:source/all-values API endpoint
- Rewrite All tab to use get_all_values directly instead of merging two
  separate API calls; counts now populated for all rows
- Rewrite export.tsv to use get_all_values (real counts for mapped rows)
- Fix save bug where editing one output field blanked the unedited fields;
  drafts are now merged over the existing mapping output instead of replacing it
- Add dirty row highlighting (blue tint) and per-rule Save All button
- Fix sort instability during editing by sorting on committed values only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:58:09 -04:00
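
The save fix comes down to a merge rather than a replacement. A small Python sketch of the idea (function names hypothetical; the UI itself is React): saving only the drafts would send `{'category': 'Office'}` and lose `vendor`, while overlaying the drafts on the saved output keeps it. `is_dirty` shows one plausible check for the dirty-row highlight.

```python
def merge_drafts(existing, drafts):
    """Overlay edited fields on the saved mapping output. In dict unpacking
    later keys win, so edited fields replace saved ones and the rest survive."""
    return {**existing, **drafts}


def is_dirty(existing, drafts):
    # A row needs saving (and highlighting) if any draft differs from the saved value
    return any(existing.get(k) != v for k, v in drafts.items())


saved = {"vendor": "Acme Corp", "category": "Supplies"}
drafts = {"category": "Office"}
print(merge_drafts(saved, drafts))  # {'vendor': 'Acme Corp', 'category': 'Office'}
print(is_dirty(saved, drafts))      # True
```
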
4cf5be52e8 Rewrite apply_transformations as set-based CTE chain
Replaces the nested FOR loops (row-by-row, rule-by-rule) with a single
SQL CTE chain that processes all records × rules in one pass, mirroring
the TPS approach.

CTE chain:
  qualifying       → all untransformed records for the source
  rx               → apply each rule (extract/replace) to each record
  linked           → LEFT JOIN mappings to find mapped output
  rule_output      → build per-rule JSONB (with retain support)
  record_additions → merge all rule outputs per record in sequence order
  UPDATE           → set transformed = data || additions

Also adds jsonb_concat_obj aggregate (jsonb merge with ORDER BY support)
needed to collapse multiple rule outputs per record into one object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 21:13:49 -04:00
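
In Python terms, the record_additions step and the final UPDATE amount to an ordered shallow merge. This sketch assumes the aggregate behaves like repeated `jsonb ||` (right-hand keys win), which is what "jsonb merge with ORDER BY support" suggests; the names and the (sequence, output) tuple shape are illustrative only.

```python
from functools import reduce


def concat_obj(objects):
    """Shallow-merge objects left to right, like chaining jsonb ||:
    on a duplicate top-level key the later object wins."""
    return reduce(lambda acc, obj: {**acc, **obj}, objects, {})


def transform(data, rule_outputs):
    """rule_outputs: (sequence, output_object) pairs for one record (hypothetical shape).
    Sorting by sequence plays the role of ORDER BY inside the aggregate."""
    ordered = [out for _, out in sorted(rule_outputs, key=lambda pair: pair[0])]
    return {**data, **concat_obj(ordered)}  # transformed = data || additions


data = {"description": "ACME*1234 STORE"}
outputs = [(20, {"vendor": "Acme Corp"}), (10, {"vendor": "ACME", "store": "1234"})]
print(transform(data, outputs))
# {'description': 'ACME*1234 STORE', 'vendor': 'Acme Corp', 'store': '1234'}
```
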
f59908aaa3 Add retain flag to rules for preserving extracted values alongside mappings
Mirrors TPS's retain: y behaviour — when a mapping is applied, the extracted
value is also written to output_field so both the raw extraction and the
mapped result are available in transformed data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 20:48:52 -04:00
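
A Python sketch of the retain behaviour as described: when a mapping is applied, retain additionally writes the extracted value to output_field. How unmapped values and key collisions are handled is not stated in the message, so those branches are assumptions flagged in the comments; `rule_output` is a hypothetical name.

```python
def rule_output(extracted, mapped, output_field, retain):
    """Hypothetical model of one rule's contribution to `transformed`.
    extracted: value pulled out by the rule; mapped: the mapping's output
    object, or None when the extracted value has no mapping."""
    if mapped is None:
        # Assumption: unmapped values are written to output_field as-is
        return {output_field: extracted}
    out = dict(mapped)
    if retain:
        # retain keeps the raw extraction alongside the mapped fields
        # (assumption: it overrides a mapped key of the same name)
        out[output_field] = extracted
    return out


mapping = {"vendor": "Acme Corp", "category": "Supplies"}
print(rule_output("ACME", mapping, "vendor_raw", retain=False))
# {'vendor': 'Acme Corp', 'category': 'Supplies'}
print(rule_output("ACME", mapping, "vendor_raw", retain=True))
# {'vendor': 'Acme Corp', 'category': 'Supplies', 'vendor_raw': 'ACME'}
```
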
3be5ccc435 Add TSV export/import backend and update unmapped sample column
- Restore export.tsv and import-csv endpoints to mappings routes
- sample column is always last in export and discarded on import
- get_unmapped_values now returns distinct source field values as sample instead of full raw records

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 20:19:51 -04:00
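
A Python sketch of the import side of that round trip, using only the standard `csv` module; `parse_mapping_tsv` and the column names are hypothetical. Because sample is always the last column, trimming it from the header is enough: `zip` stops at the shorter sequence and ignores the extra cell in each row.

```python
import csv
import io


def parse_mapping_tsv(text):
    """Parse an exported mappings TSV and discard the trailing sample column,
    which is informational only. Assumes the header names that column 'sample'."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    if header and header[-1] == "sample":
        header = header[:-1]
    # zip() stops at the shorter sequence, so the sample cell in each row is dropped
    return [dict(zip(header, row)) for row in reader if row]


tsv = "input_value\tvendor\tsample\nACME\tAcme Corp\tACME*1234 STORE\n"
print(parse_mapping_tsv(tsv))
# [{'input_value': 'ACME', 'vendor': 'Acme Corp'}]
```
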
1ed08755c1 Add g flag support and fix regex aggregation in extract rules
- Switch apply_transformations from regexp_match to regexp_matches with
  ORDINALITY, enabling the g flag to return all occurrences as a JSONB array
- Aggregate matches directly to JSONB in lateral subquery to avoid
  text[][] type errors when subscripting array_agg results
- Pass flags as proper third argument to regexp_matches/regexp_replace
  instead of inline (?flags) prefix — the only way g works correctly
- Apply same fix to preview and test endpoints in rules.js
- Add migrate_tps.sql script for migrating data from TPS to Dataflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 22:48:50 -04:00
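
The difference the g flag makes can be sketched with Python's `re` module: without it only the first match is kept, with it every occurrence is collected into one JSON array. This mirrors the described behaviour rather than the SQL itself; `extract` and `shape` are hypothetical names, Python regex syntax differs from PostgreSQL's in places, and the NULL-on-no-match result assumes the aggregation uses something like `jsonb_agg`, which returns NULL over zero rows.

```python
import json
import re


def shape(match):
    # Assumption: one capture group gives a scalar, several give an array
    # (per the multi-capture commit); no groups gives the whole match
    groups = match.groups()
    if not groups:
        return match.group(0)
    return groups[0] if len(groups) == 1 else list(groups)


def extract(text, pattern, flags=""):
    """Rough Python analogue of regexp_match vs regexp_matches(..., 'g'),
    returning a JSON string like a JSONB value, or None for no match."""
    rx = re.compile(pattern, re.IGNORECASE if "i" in flags else 0)
    if "g" in flags:
        # g: every occurrence, aggregated into one array; zero rows aggregate to NULL
        found = [shape(m) for m in rx.finditer(text)]
        return json.dumps(found) if found else None
    m = rx.search(text)
    return json.dumps(shape(m)) if m else None


print(extract("12 apples, 34 pears", r"(\d+)"))       # "12"
print(extract("12 apples, 34 pears", r"(\d+)", "g"))  # ["12", "34"]
```
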
928a54932d Add multi-capture regex, computed view fields, collapsible rules, and live preview
- Support multi-capture-group regex: mappings.input_value changed to JSONB;
  the regexp_match() result is stored as scalar or array JSONB in the transformed column
- Computed expression fields in generated views: {fieldname} refs substituted
  with (transformed->>'fieldname')::numeric for arithmetic in view columns
- Fix generate_source_view to DROP VIEW before CREATE (avoids column drop error)
- Collapsible rule cards that open directly to inline edit form
- Debounced live regex preview (extract + replace) with popout modal for 50 rows
- Records page now shows dfv.<source> view output instead of raw records
- Unified field table in Sources: single table with In view, Seq, expression columns
- Fix "Rule already exists" error when editing by passing rule.id directly to submit
- Fix Sources page clearing on F5 by watching sourceObj?.name in the useEffect dependency array

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 16:37:15 -04:00
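
The substitution step for computed fields can be sketched as a single regex replacement in Python. `expand_expression` is a hypothetical name, and restricting field names to `\w` characters is an assumption that keeps the generated SQL free of quote characters.

```python
import re

# Assumption: field names are word characters only, so they can be spliced
# into the SQL string without extra escaping
FIELD_REF = re.compile(r"\{(\w+)\}")


def expand_expression(expr):
    """Rewrite each {fieldname} reference as a numeric cast of transformed->>'fieldname'."""
    return FIELD_REF.sub(lambda m: f"(transformed->>'{m.group(1)}')::numeric", expr)


print(expand_expression("{amount} * {qty}"))
# (transformed->>'amount')::numeric * (transformed->>'qty')::numeric
```
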
eb50704ca0 Add React UI and backend enhancements for dataflow
- Add full React + Vite UI (src/pages: Sources, Rules, Mappings, Records, Import)
- Sidebar layout with source selector persisted to localStorage
- Sources: unified field table with Dedup/In-view checkboxes, CSV suggest, generate dfv view
- Rules: extract/replace function types, regex flags, input field picklist, test results
- Mappings: unmapped values with sample records, inline key/value editor, edit existing mappings
- Records: expanded row shows per-rule extraction and mapping output breakdown
- Import: drag-drop CSV, transform/reprocess buttons, import history
- Backend: add flags/function_type to rules, get_unmapped_values with samples, generate_source_view, fields endpoint, reprocess endpoint
- database/functions.sql: apply_transformations supports replace mode and flags; generate_source_view builds typed dfv views
- Server bound to 0.0.0.0, SPA fallback for client-side routing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 00:35:33 -04:00
83300d7a8e Add missing backend features before UI build
- POST /api/sources/suggest: derive source definition from CSV upload
- GET /api/sources/:name/import-log: query import history
- GET /api/rules/:id/test: test rule pattern against real records
- rules: add function_type (extract/replace) and flags columns
- get_unmapped_values: include up to 3 sample records per value
- npm start now uses nodemon for auto-reload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 22:48:41 -04:00
3e2d56991c Initial commit: dataflow data transformation tool
2026-03-28 00:44:13 -04:00