Commit Graph

36 Commits

Author SHA1 Message Date
3cc8bc635a Update README to reflect current state of the project
Documents manage.py menu, adds full API reference tables, fixes
incorrect route in quick example, and removes stale sections
(docs/ dir, initial development status).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:42:49 -04:00
291c665ed1 Consolidate all SQL into database/queries/, switch to literal SQL in routes
- Add database/queries/{sources,rules,mappings,records}.sql — one file per
  route, all business logic in PostgreSQL functions
- Replace parameterized queries in all four route files with lit()/jsonLit()
  literal interpolation for debuggability
- Add api/lib/sql.js with lit(), jsonLit(), arr() helpers
- Fix get_view_data to use json_agg (preserves column order) with subquery
  (guarantees sort order is respected before aggregation)
- Fix jsonLit() for JSONB params so plain strings become valid JSON
- Update manage.py option 3 to deploy database/queries/ instead of functions.sql
- Add SPEC.md covering architecture, philosophy, and manage.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:36:53 -04:00
10dc92b45e Fix 'Mapping already exists' error on second save
After createMapping, the new mapping's id was not stored in allValues
state, so editing the row again fell into the create path instead of
update. Now stores created.id so subsequent saves correctly use updateMapping.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:50:40 -04:00
37a6645af0 Persist session across page refreshes via sessionStorage
Credentials are saved to sessionStorage on login and restored on mount,
so a page refresh re-authenticates silently. Closing the tab clears them.
Logout explicitly removes them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:32:46 -04:00
21388b7646 Server-side sorting on Records page
Clicking a column header reloads from page 1 with ORDER BY col ASC/DESC
NULLS LAST passed to the view query. Sort column is validated against
information_schema.columns to prevent injection. Pagination preserves
the active sort across prev/next.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:31:14 -04:00
2aa9e0fcdd Records: format dates as YYYY-MM-DD for correct sort order
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 20:26:59 -04:00
b311092987 Records page: sortable headers and short date formatting
- Click any column header to sort asc/desc (⇅ / ▲ / ▼ indicators)
- Sort is client-side within the current page, numeric-aware
- Dates matching ISO format are displayed as e.g. "Apr 5, 26"
- Sort resets on source change

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:57:27 -04:00
6b7f1c1334 Move username and sign out below sidebar title
Username and 'Sign out' link now sit on their own row under the
Dataflow title, so they always have room to display regardless of
title width.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:50:04 -04:00
71c4654361 Collapsible sidebar for mobile, fix logout button visibility
On mobile: hamburger button in top bar opens sidebar as a slide-over
with a backdrop overlay. Nav links and source selector close it on tap.
On desktop: sidebar is static as before.

Logout button now sits alongside the close button in the sidebar header
with a gap so both are always visible.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:48:07 -04:00
2c573a5eeb Add login authentication with Basic Auth
- Express auth middleware checks Authorization: Basic header on all /api
  routes using bcrypt against LOGIN_USER/LOGIN_PASSWORD_HASH in .env
- React login screen shown before app loads, stores credentials in memory,
  sends them with every API request, clears and returns to login on 401
- Logout button in sidebar header
- manage.py option 9: set login credentials (bcrypt via node, writes to .env)
- manage.py status shows whether login credentials are configured

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:41:07 -04:00
1edb998487 manage.py: show commands before confirms, fold schema/fn into step 1, nginx guard
- Show exact commands that will be run before each confirm prompt
- Step 1 dialog now offers schema and function deployment after writing .env
- Steps 2/3 relabeled as 'Redeploy only' for standalone use
- Option 5 (nginx) detects existing config and warns before overwriting
- Option 1 menu label clarified as 'Database configuration and deployment dialog'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:28:26 -04:00
a3c7be61d0 Fix nginx config file permissions after sudo cp
sudo cp creates the file as root:root 0600, making it unreadable by the
app user. Add sudo chmod 644 after writing so status detection can read
it without sudo, matching how other nginx configs are set up.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:10:59 -04:00
4b8864edd9 Fix nginx status detection and cert check for root-owned files
nginx config files written by sudo are root-only (rw-------), so
nginx_domain() was silently failing to read them. Now uses 'sudo -n cat'
with fallback to direct read for world-readable files.

Also fix PermissionError on cert_path.exists() — /etc/letsencrypt/live/
requires root, so use 'sudo test -f' instead of Path.exists().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 17:08:49 -04:00
61fe8f630b Fix sudo permission denied in nginx and service setup
capture_output=True on sudo_run suppresses the sudo password prompt,
causing silent auth failure and permission denied on subsequent calls.
Removed capture_output from nginx -t and systemctl enable so the
password prompt and any error output appear on screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:48:20 -04:00
b93751e3d1 Improve manage.py clarity and verbosity throughout
Every status line, action header, confirm prompt, and ok/err message now
names exactly what it refers to — schema name, database, host, file paths,
and systemd commands. Menu items include source/target context. No ambiguous
shorthand anywhere.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:42:31 -04:00
a26a7643e4 Clarify manage.py configure step wording
Menu item: 'Configure database' → 'Configure database connection'
Intro text: disambiguates that the database is what gets created, not a connection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:26:31 -04:00
f2c4eb8339 Add Python management tool for configure/deploy/manage workflow
No external dependencies — uses psql CLI via subprocess. Interactive menu
detects current state (DB connection, schema, UI build, service status) and
guides through configure, deploy, rebuild, nginx setup, and service management.
Handles both new installs and existing databases (grants access vs. creating).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:22:40 -04:00
1b34dc830b Fix deploy.sh: don't prompt for systemd service if already installed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:13:06 -04:00
0994d48383 Print static table of contents at start of deploy.sh
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:11:15 -04:00
fde9e31b14 Improve deploy.sh: TOC/plan summary, per-section confirm, nginx setup
- Collect all config upfront then print a plan showing every step
  (active or skipped) before doing any work
- Prompt "Proceed? [Y/n]" at each section for granular control
- Add nginx reverse proxy setup with certbot SSL support
- Add overall "Continue?" confirmation after plan is shown

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 16:10:17 -04:00
619e83acb6 Add unified deploy.sh and systemd service unit
- Merge setup.sh and deploy.sh into single deploy.sh
- First run (no .env): creates DB user/database, deploys schema +
  functions, builds UI, installs systemd service
- Subsequent runs: optionally change DB target, redeploy functions,
  rebuild UI, restart service
- Add dataflow.service systemd unit for process management
- Remove setup.sh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 15:53:02 -04:00
63b1634b40 Redesign mappings page: single grid, sticky controls, rule-gated loading
- Replace three-tab layout with single unified grid and filter buttons
  (All/Unmapped/Mapped with counts) in a sticky top control bar
- Require rule selection before loading any data
- Move source label, rule selector, filter, Save All, Export/Import into
  sticky header that stays visible while scrolling
- Add inline delete (×) per mapped row — reverts to unmapped rather than
  removing the row from view
- Simplify component state: drop separate unmapped/mapped state, derive
  everything from allValues

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 11:14:24 -04:00
dcac6def87 Unify mappings UI around single SQL query with full UX improvements
- Add get_all_values() SQL function returning all extracted values (mapped
  + unmapped) with real record counts and mapping output in one query
- Add /mappings/source/:source/all-values API endpoint
- Rewrite All tab to use get_all_values directly instead of merging two
  separate API calls; counts now populated for all rows
- Rewrite export.tsv to use get_all_values (real counts for mapped rows)
- Fix save bug where editing one output field blanked unedited fields by
  merging drafts over existing mapping output instead of replacing
- Add dirty row highlighting (blue tint) and per-rule Save All button
- Fix sort instability during editing by sorting on committed values only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 09:58:09 -04:00
4cf5be52e8 Rewrite apply_transformations as set-based CTE chain
Replaces the nested FOR loops (row-by-row, rule-by-rule) with a single
SQL CTE chain that processes all records × rules in one pass, mirroring
the TPS approach.

CTE chain:
  qualifying      → all untransformed records for the source
  rx              → apply each rule (extract/replace) to each record
  linked          → LEFT JOIN mappings to find mapped output
  rule_output     → build per-rule JSONB (with retain support)
  record_additions → merge all rule outputs per record in sequence order
  UPDATE          → set transformed = data || additions

Also adds jsonb_concat_obj aggregate (jsonb merge with ORDER BY support)
needed to collapse multiple rule outputs per record into one object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 21:13:49 -04:00
f7f88bb5cf Merge sample-refactor: TSV mappings export/import, retain flag, regex improvements
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 20:49:55 -04:00
f59908aaa3 Add retain flag to rules for preserving extracted values alongside mappings
Mirrors TPS's retain: y behaviour — when a mapping is applied, the extracted
value is also written to output_field so both the raw extraction and the
mapped result are available in transformed data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 20:48:52 -04:00
3be5ccc435 Add TSV export/import backend and update unmapped sample column
- Restore export.tsv and import-csv endpoints to mappings routes
- sample column is always last in export and discarded on import
- get_unmapped_values now returns distinct source field values as sample instead of full raw records

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-04 20:19:51 -04:00
6f2992b315 Add TSV export/import UI for mappings
- Export button downloads unmapped + existing mappings as TSV with sample column showing distinct source field values for context
- Import button uploads filled TSV, any non-system column treated as an output key
- Exclude *.tsv files from git

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-03 23:29:07 -04:00
1ed08755c1 Add g flag support and fix regex aggregation in extract rules
- Switch apply_transformations from regexp_match to regexp_matches with
  ORDINALITY, enabling the g flag to return all occurrences as a JSONB array
- Aggregate matches directly to JSONB in lateral subquery to avoid
  text[][] type errors when subscripting array_agg results
- Pass flags as proper third argument to regexp_matches/regexp_replace
  instead of inline (?flags) prefix — the only way g works correctly
- Apply same fix to preview and test endpoints in rules.js
- Add migrate_tps.sql script for migrating data from TPS to Dataflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 22:48:50 -04:00
928a54932d Add multi-capture regex, computed view fields, collapsible rules, and live preview
- Support multi-capture-group regex: mappings.input_value changed to JSONB,
  regexp_match() result stored as scalar or array JSONB in transformed column
- Computed expression fields in generated views: {fieldname} refs substituted
  with (transformed->>'fieldname')::numeric for arithmetic in view columns
- Fix generate_source_view to DROP VIEW before CREATE (avoids column drop error)
- Collapsible rule cards that open directly to inline edit form
- Debounced live regex preview (extract + replace) with popout modal for 50 rows
- Records page now shows dfv.<source> view output instead of raw records
- Unified field table in Sources: single table with In view, Seq, expression columns
- Fix "Rule already exists" error when editing by passing rule.id directly to submit
- Fix Sources page clearing on F5 by watching sourceObj?.name in useEffect dep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 16:37:15 -04:00
eb50704ca0 Add React UI and backend enhancements for dataflow
- Add full React + Vite UI (src/pages: Sources, Rules, Mappings, Records, Import)
- Sidebar layout with source selector persisted to localStorage
- Sources: unified field table with Dedup/In-view checkboxes, CSV suggest, generate dfv view
- Rules: extract/replace function types, regex flags, input field picklist, test results
- Mappings: unmapped values with sample records, inline key/value editor, edit existing mappings
- Records: expanded row shows per-rule extraction and mapping output breakdown
- Import: drag-drop CSV, transform/reprocess buttons, import history
- Backend: add flags/function_type to rules, get_unmapped_values with samples, generate_source_view, fields endpoint, reprocess endpoint
- database/functions.sql: apply_transformations supports replace mode and flags; generate_source_view builds typed dfv views
- Server bound to 0.0.0.0, SPA fallback for client-side routing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-29 00:35:33 -04:00
83300d7a8e Add missing backend features before UI build
- POST /api/sources/suggest: derive source definition from CSV upload
- GET /api/sources/:name/import-log: query import history
- GET /api/rules/:id/test: test rule pattern against real records
- rules: add function_type (extract/replace) and flags columns
- get_unmapped_values: include up to 3 sample records per value
- npm start now uses nodemon for auto-reload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 22:48:41 -04:00
864e5fc193 Add systemd service setup script for production deployment 2026-03-28 02:45:23 -04:00
06e002931d Fix user existence check and add PGPASSWORD for app user during deploy 2026-03-28 01:16:45 -04:00
7abecc99ea Add interactive setup script with PostgreSQL user/database creation and uninstall script 2026-03-28 00:59:41 -04:00
3e2d56991c Initial commit: dataflow data transformation tool 2026-03-28 00:44:13 -04:00