pipekit

pt/pipekit

Author	SHA1	Message	Date
Paul Trowbridge	e9ff37a3a0	web: collapse jrunner progress ticks in live_log to a single line jrunner prints an in-place progress counter (a bare number) every batch; read with universal newlines, each tick became its own live_log line, so a large load stacked thousands of numbers (a 1.27M-row INSERT left 5,073 progress lines). append_run_live_log now overwrites a trailing bare-number line with the new tick instead of appending, keeping a single current count. Real lines (headers, "N rows written", timestamps) aren't bare numbers and are preserved. No jrunner change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-18 07:52:34 -04:00
Paul Trowbridge	ad14d0f5c9	jrunner: use bulk copy (-b) for SQL Server destinations Pass jrunner's -b flag when the dest JDBC URL is jdbc:sqlserver:, so SQL Server loads stream via TDS bulk copy instead of 250-row INSERT...VALUES round trips. Non-SQL-Server dests are unchanged. Requires the jrunner -b support (bulk-copy branch). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 22:22:32 -04:00
Paul Trowbridge	40c00bca7b	web: quote wizard source-query aliases with the source dialect wizard_create built the generated source query's column aliases with the DEST driver's quote_identifier, but that query runs on the SOURCE. A pg->SQL Server module emitted "AS [col]" (SQL Server brackets) into a Postgres query, which failed with: syntax error at or near "[". The load maps columns by position, so the alias is cosmetic — quote it with the source dialect. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 15:20:30 -04:00
Paul Trowbridge	ea90b3f56a	api: log request payload (secrets masked) on raised HTTPExceptions Add an ASGI middleware that buffers each request body onto the scope and replays it downstream, so the HTTPException handler can log the submitted payload alongside the error. Fields whose name looks secret (password/pwd/secret/token) are masked. Makes failures like the wizard 500 debuggable against the actual call content. Covers raised HTTPExceptions; handlers that return an error response are not body-logged yet. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 15:20:30 -04:00
Paul Trowbridge	74967c1e56	jrunner: treat SQL Server's 'no result set' DDL trace as benign run_dest_sql executes DDL via jrunner query mode (executeQuery), which demands a ResultSet. CREATE SCHEMA/TABLE produce none, so the driver throws, jrunner logs the trace and exits 0, and _detect_silent_failure flags it as a failure unless the message is allowlisted. Only PG's wording was listed ("No results were returned by the query"); SQL Server says "The statement did not return a result set." — so pg->mssql wizard provisioning died on the first statement. Add the SQL Server phrasing to the benign list. Verified against the live SQL Server connection. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 15:20:30 -04:00
Paul Trowbridge	356e5f7799	api: log HTTPException responses (5xx + 4xx) to the journal Failures surfaced via HTTPException (e.g. wizard dest-provisioning errors raised as HTTPException(500, "dest provisioning failed: …")) were turned into responses by FastAPI and never logged — only the access line showed, so the real DB error went to the browser and vanished from the journal. Register a StarletteHTTPException handler that logs 5xx at ERROR (with exc_info, capturing the chained cause) and 4xx at WARNING, then defers to the default handler. Also configure pipekit's logger to emit to stderr so INFO-level records aren't dropped by uvicorn's last-resort handler. Unhandled (non-HTTPException) errors were already logged by uvicorn. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 09:37:46 -04:00
Paul Trowbridge	7202b86210	web: add columns to an existing module's target table (Phase 0+1) Modules' column listings were write-once at wizard time — no way to add a column to an established sync (e.g. an RRN watermark column on a history table) without hand-editing columns_json and ALTERing the dest by hand. Phase 0 (groundwork): - columns_json rows get a stable `id` (c1, c2, …) — the data-movement identity for future schema reconciliation (the load is positional). - repo.update_module_columns to persist the listing. - Driver.build_add_column_sql + Driver.column_inventory. Phase 1 (append a column): - "+ add column" on the module detail page -> column_form.html. - POST /modules/{id}/columns: validates the name isn't already in the listing or on the table, runs ALTER TABLE … ADD COLUMN (appends at the tail, rows preserved), applies COMMENT ON COLUMN where supported, and appends to columns_json. Re-renders with an error on conflict/DDL failure. Append-only and non-destructive; reorder/retype/drop (which can require a table rebuild) are out of scope for this phase. Verified end-to-end against the live PG dest on a throwaway module. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-12 10:09:12 -04:00
Paul Trowbridge	da0396bff9	web: add SQL-entry path to the new-module wizard The wizard previously required picking a single source table; modules whose entry point is arbitrary SQL (CTEs, joins, computed columns) didn't fit. Add a "write SQL" path alongside "browse a table": - Driver.introspect_query_columns + _zero_row_wrap discover a query's result columns by running it with ~no rows. Generic wrap is a derived table with WHERE 1=0; DB2 appends FETCH FIRST 1 ROW ONLY (DB2 for i forbids WITH inside a nested table expression). - /wizard/sql + POST /wizard/sql/columns seed the column-mapping grid; dest types default to text (no result-set type metadata over jrunner CSV). - wizard_step3.html grows a sql_mode branch (array-named inputs, query shown verbatim, no column unchecking); wizard_create branches on entry_mode. Verified end-to-end against a live DB2 for i connection, including a top-level CTE query. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-11 09:05:39 -04:00
Paul Trowbridge	e7576926ae	drivers: make staging DDL dialect-aware; stop tracking pipekit.db Add dialect-aware DDL hooks to the Driver base (create_schema_sql, drop_table_if_exists_sql, create_like_table_sql, check_dest_table) and implement DB2/MSSQL overrides so they can serve as merge destinations, not just Postgres. runner.py now dispatches staging table creation through the dest driver instead of hardcoding PG syntax. Also untrack pipekit.db (runtime SQLite state) and add it to .gitignore. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 23:33:36 -04:00
Paul Trowbridge	f8490a2d4f	deploy/cli: fix /etc/pipekit permissions so non-root group members can write secrets - deploy.sh: set /etc/pipekit to root:pipekit 0775 and secrets.env to pipekit:pipekit 0640 so group members can run 'pipekit secrets set' without sudo - cli.py secrets set: drop os.chown() on temp file — non-root users can't chown to the pipekit service user, and os.replace() preserves the target file's ownership anyway Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 13:35:06 -04:00
Paul Trowbridge	31135cf5be	web: add session-cookie login for web UI - New pipekit/web/auth.py: itsdangerous-signed cookie, 8hr expiry, auto-generates signing secret in settings table on first use - GET/POST /login and POST /logout routes (public, no auth dependency) - All other web routes protected via router-level require_web_auth dep - Starlette middleware injects request.state.current_user for templates - Topbar shows logged-in username + logout button when session active - Reuses existing api_user/api_pass credentials and api_auth_enabled flag - Add itsdangerous>=2.1 to requirements.txt - Enable api_auth_enabled in config.yaml Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 13:21:50 -04:00
Paul Trowbridge	583dc16c9b	web: tighten module list column widths and fix form input overflow - Size all table columns to fit content (em-based) rather than loose percentages - Add white-space:nowrap to groups header and last-run cells - Sticky topbar and panel header so New Module button stays visible while scrolling - Scope min-width:0 to label.field inputs so they don't blow past two-col grid borders Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-04 10:52:23 -04:00
Paul Trowbridge	ba48b2ca2b	CLAUDE.md: document local time scheduling and JAVA_HOME deploy behavior Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 23:51:12 -04:00
Paul Trowbridge	a6cc8da83f	Merge branch 'localtime' - Convert all timestamps to local time for display and scheduling - deploy.sh: detect JAVA_HOME and inject into systemd unit at deploy time - repo: add duration_s to get_group_run query	2026-06-03 23:49:43 -04:00
Paul Trowbridge	c8b507cdc3	repo: add duration_s to get_group_run query The group run detail page was crashing because get_group_run returned no duration_s field, unlike the list queries. Fixes 500 on /group-runs/{id}. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 23:48:45 -04:00
Paul Trowbridge	ed3653f410	deploy.sh: detect JAVA_HOME and inject into systemd unit at deploy time The pipekit system user has no PATH to java. deploy.sh now detects JAVA_HOME by searching common locations and injects Environment= lines into the installed unit file, making deploys portable across machines. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 23:34:27 -04:00
Paul Trowbridge	a66488d1f2	Convert all timestamps to local time for display and scheduling Scheduler now evaluates cron expressions against local time instead of UTC, so schedules fire at the user's local clock time. All timestamp displays in templates use a new `localtime` Jinja filter that converts UTC strings from SQLite to the server's local timezone. Updated CLAUDE.md to reflect the systemd service setup. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 23:21:05 -04:00
Paul Trowbridge	95292bd3f8	deploy.sh: fix SQLite journal permissions; cli: upsert drivers by kind SQLite needs write access to the repo directory to create journal files alongside pipekit.db. Fixed by setting group pipekit + g+w on the directory itself only (not recursive). Driver registration now matches existing rows by kind before falling back to name, so re-deploys update the correct row regardless of what name was used at initial registration. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 21:42:55 -04:00
Paul Trowbridge	a3ff5337ee	deploy.sh: chown only pipekit.db, not the whole repo Avoids stripping write access from the developer. The service only needs to own pipekit.db (runtime writes) and .venv (created as pipekit). Source code stays owned by whoever ran deploy.sh. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 21:37:18 -04:00
Paul Trowbridge	f32706be01	Verbose deploy.sh, fix driver upsert, and fix pip cache warning deploy.sh now prints each step with what it's doing, adds the invoking user to the pipekit group automatically, uses --home-dir /nonexistent for the system user, and passes --no-cache-dir to pip to suppress the home directory warning. cli.py: removed the kind-based early-exit in drivers register that was short-circuiting before the upsert logic, so re-running deploy now correctly updates existing driver rows rather than printing "already registered". Also removed the now-unused --force flag. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 21:35:25 -04:00
Paul Trowbridge	c34fcb38ed	Add scheduling, harden deploy, and update docs Scheduling: cron-based group runs via a daemon thread (scheduler.py) started at API startup. Schedules managed inline on the group edit form. last_fired_at persisted before run to prevent double-fire on restart. Requires croniter (added to requirements.txt); DB migration adds last_fired_at column to schedule table. Deploy: deploy.sh now creates the pipekit system user, chowns the repo, builds the venv as pipekit, and installs/enables the systemd unit. systemd/pipekit.service is now a production-ready unit (User=pipekit uncommented). pipekit secrets set preserves existing file permissions instead of resetting to 0600. Driver registration is now idempotent (upsert via get_driver_by_name + update_driver). Docs: CLAUDE.md and SPEC.md updated to reflect groups, scheduling, scheduler-in-API-process architecture, TUI deferred (not dropped), stop-on-failure tradeoff, jrunner as prerequisite, and deploy flow. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 21:18:13 -04:00
Paul Trowbridge	31d670b4e6	Add group runs and fix wizard identifier sanitization for spaced column names Groups allow multiple modules to be run sequentially in a defined order. Adds full CRUD (repo, engine orchestrator, web routes, templates) for grp, group_member, and group_run tables that were previously schema-only. Module index now shows group membership badges per module. Wizard default dest name now sanitizes source column names with spaces or special characters to valid identifiers rather than failing at CREATE TABLE time. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-02 00:05:37 -04:00
Paul Trowbridge	70e4d79edf	track pipekit.db in git for backup purposes Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-30 22:48:10 -04:00
Paul Trowbridge	780f73021c	Run buttons stay on page with live status updates via HTMX polling Module detail and index Run/Dry run buttons no longer redirect to the run page. The status cell (index) and recent runs panel (detail) poll every 3s while running and stop automatically when idle. force_poll ensures polling starts immediately after clicking Run despite the race between the HTTP response and the background task setting running=1. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-22 12:31:14 -04:00
Paul Trowbridge	bb4f7712d2	Allow dots in identifiers and strip surrounding quotes in validate_identifier Supports iSeries schema names that contain dots (e.g. CMS.CUSLG). Strips surrounding double quotes on input so users don't need to worry about quoting — the driver's quote_identifier handles that when building SQL. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-21 00:43:10 -04:00
Paul Trowbridge	f39b1df75e	Fix missing run start time and add duration to run history create_run now sets started_at on INSERT. list_runs computes duration_s via julianday arithmetic. Both the module detail and runs page show duration formatted as Xs or Xm Ys. A Jinja2 filter handles formatting. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-21 00:23:31 -04:00
Paul Trowbridge	595024eb52	Unify incremental sync config: inline watermarks + editable source query Watermarks, merge strategy, merge key, and source query are now edited together in one form on both the module edit page and wizard step 3. A client-side placeholder warning fires when {name} tokens in the query don't match the watermark rows on the page. The wizard now shows an editable source query textarea pre-populated from column picks so WHERE clauses can be added before module creation. Watermarks submitted via wm_* arrays are processed by _save_inline_watermarks() in both module_update and wizard_create. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 21:31:55 -04:00
Paul Trowbridge	99f75490c4	Add live progress to module runs: async web POST + HTMX polling Web POST /modules/{id}/run now returns immediately (BackgroundTasks) instead of blocking until the run completes. jrunner.migrate() switches from subprocess.run to Popen so stdout lines are read as they arrive and appended to run_log.live_log via repo.append_run_live_log(). The run detail page embeds an HTMX fragment that polls /runs/{id}/live every 2s while status=running, showing current status, row count, and live output; polling stops automatically once the run finishes. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-08 14:02:36 -04:00
Paul Trowbridge	760d4e7fec	fix incremental watermark wiring and dry_run status - Wire watermark WHERE clause into GL20000 source query ({dex_row_id} placeholder was present but query had no WHERE clause) - Fix watermark resolver connection for GL20000 (was pointing at AS400, should be postgres dest) - Resolve watermarks live on dry runs and module detail page load instead of using defaults - Use status='dry_run' (not 'success') for dry runs so they can be filtered from recent runs UI - Add exclude_status param to repo.list_runs; module detail excludes dry_run rows - Expand run_log CHECK constraint to include 'dry_run'; backfill 16 historical records - Delete SPEC_v1_archive.md (obsolete v1 design doc) - Update SPEC.md and CLAUDE.md to reflect current engine flow and status values Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-07 19:03:38 -04:00
Paul Trowbridge	dfc76a96d8	Add CLAUDE.md for Claude Code guidance Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 21:06:37 -04:00
Paul Trowbridge	546242e11a	Wizard step 2: schema browser panel with datalist autocomplete Adds a /wizard/schemas JSON endpoint and a live-filtered schema picker panel on step 2. Clicking a row fills the schema input; the datalist also powers browser autocomplete. MSSQL refetches when database or linked_server qualifiers change. CSS fixes prevent picker tables and two-col grid items from overflowing their containers. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 20:56:18 -04:00
Paul Trowbridge	ff19ae9b81	Drivers: add list_schemas() to base, PG, DB2, MSSQL Base provides a no-op default; drivers opt in by overriding. MSSQL scopes the lookup to a linked server / database when those qualifiers are supplied. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 20:56:10 -04:00
Paul Trowbridge	f18ea55a12	Wizard: warn in-UI when default dest table already exists. Previously the existing-dest check fired on submit and surfaced as a raw JSON 400. Now step 3 introspects the default dest up front and renders a yellow banner listing existing columns; submit-time mismatches render wizard_error.html (409) with missing vs. existing side-by-side and a back link that re-plumbs the form qvals. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 13:27:23 -04:00
Paul Trowbridge	bb0b493d18	Wizard step 2: add jump-to-columns shortcut for known tables. New text input + "jump to columns" button skip the full table listing when you already know what you want. Typing "schema.table" and tabbing out auto-splits into the schema qualifier + table name. Jump button stays disabled until the table field has a value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 00:41:00 -04:00
Paul Trowbridge	fde4fa99b6	Wizard: don't clobber pre-existing dest tables. If the dest table already exists, introspect its columns and verify the wizard's picks line up. Missing columns surface a specific error message naming what's missing instead of the opaque "column X does not exist" from a failed COMMENT. On match, skip CREATE + COMMENT so existing schema and comments aren't touched; staging still gets provisioned. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 00:41:00 -04:00
Paul Trowbridge	4650a3cbc5	bin/pipekit auto-detects venv; stop rewriting it in deploy.sh. The tracked launcher now checks for .venv/bin/python3 under the repo and uses it if present, else falls back to system python3. Works pre-deploy (no venv) and post-deploy (venv exists) without being modified. Deploy no longer regenerates the file, so `git pull` on a deployed box won't conflict with the launcher. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 00:34:32 -04:00
Paul Trowbridge	c205b48be2	Honor api_host in config.yaml; ignore .venv/ created by deploy.sh. cmd_serve now reads api_host from Config with a 127.0.0.1 safe default, matching the existing api_port pattern. --host/--port CLI flags still override. Local config is bumped to bind 0.0.0.0:8200. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-23 00:33:56 -04:00
Paul Trowbridge	1c3586eb2f	deploy.sh: pass -H to sudo so pip doesn't warn about user cache. Without -H, sudo keeps HOME pointed at the invoking user, so pip running as root tries to write to /home/<user>/.cache/pip and disables caching with a warning. -H resets HOME to /root while -E preserves the rest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 23:59:23 -04:00
Paul Trowbridge	e6a615bf70	Add deploy.sh, systemd unit template, and `pipekit secrets` CLI. deploy.sh is the idempotent rollout path: venv + deps, launcher, /etc/pipekit/secrets.env skeleton (mode 0600), schema init, and auto-register of every JDBC driver shipped with jrunner. systemd unit is a template, not auto-installed — user copies it when ready to cut over. `pipekit secrets {list,set,unset}` manages /etc/pipekit/secrets.env with atomic 0600 writes so passwords don't need sudoedit. Prompted input by default; positional value allowed for scripting. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 22:34:38 -04:00
Paul Trowbridge	e27167a4a3	Add `pipekit drivers register` for seeding JDBC driver rows. Registers a driver-table row from the CLI. Kind is validated against the code-level driver registry; JDBC class names default from a built-in table (db2, pg, mssql). Refuses to double-register a kind unless --force is passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 22:01:59 -04:00
Paul Trowbridge	01bcba78b4	Snap staging DDL on module create/edit/run; allowlist benign jrunner exception. Staging table drift caused silent data loss when dest grew columns but staging kept the old shape. Fix on three fronts: - Runner now DROP+CREATEs staging each run instead of CREATE IF NOT EXISTS, so any drift self-heals. - Wizard create drop+creates staging right after dest is provisioned, surfacing DDL errors at create time. - Module edit drops the (old-name) staging table and re-applies COMMENT ON TABLE when dest_description changed. jrunner's query mode uses executeQuery() which raises "No results were returned by the query" after DDL/DML succeeds; the stack-trace detector now allowlists that exception so normal CREATE/TRUNCATE/INSERT runs aren't flagged as failures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 20:10:36 -04:00
Paul Trowbridge	2ef68d766c	Add module edit page + detect jrunner silent failures. Modules get a full edit form (name, connections, tables, source query, merge config, description, enabled); reachable via Edit button on the detail page and the source-query panel. jrunner catches SQLException and calls System.exit(0) at every failure site, so pipekit was marking runs success when the migrate phase had actually errored. query() and migrate() now scan stdout+stderr for a Java stack-trace signature and raise JrunnerError. runner.py also captures the failed jrunner output onto run_log so the stack trace is visible on the run detail page. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 11:02:45 -04:00
Paul Trowbridge	d952b48a4e	Add module delete + fail-fast on duplicate module name in wizard. Delete button lives in module-detail header, refuses to delete a running module, and clears run_log history first since it doesn't cascade from module. Wizard now returns 409 on duplicate name before touching the destination, so a failed resubmit doesn't redundantly rerun CREATE TABLE / COMMENT ON on the dest. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 08:28:51 -04:00
Paul Trowbridge	574ada5258	Initial commit: Pipekit rewrite. Orchestration layer around the jrunner Java JDBC CLI, replacing the previous shell-based sync system in .archive/pre-rewrite. Includes the FastAPI + Jinja web frontend, per-driver adapters (DB2, MSSQL, PG), wizard-driven module creation with editable dest types and source-sourced table/column descriptions, watermark/hook CRUD, and the engine that runs modules end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-22 00:38:26 -04:00
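
Commit e9ff37a3a0 describes collapsing jrunner's progress ticks so that only the latest bare-number line is kept in live_log. The following is a minimal sketch of that rule, not the repository's code: `append_live_line` and the in-memory list are hypothetical stand-ins for `append_run_live_log` and the `run_log.live_log` column, whose real signatures are not shown in the log. It assumes a tick is a line consisting only of digits.

```python
import re

# Assumption: a progress tick is a line made up of digits only.
_BARE_NUMBER = re.compile(r"^\d+$")


def append_live_line(lines, new_line):
    """Append new_line, or overwrite the last line when both are progress ticks.

    Hypothetical helper standing in for pipekit's append_run_live_log.
    """
    text = new_line.rstrip("\r\n")
    is_tick = bool(_BARE_NUMBER.match(text.strip()))
    last_is_tick = bool(lines) and bool(_BARE_NUMBER.match(lines[-1].strip()))
    if is_tick and last_is_tick:
        # Newer tick replaces the older one, so only the current count remains.
        lines[-1] = text
    else:
        # Real lines (headers, "N rows written", timestamps) are always kept.
        lines.append(text)
    return lines


log = []
for line in ["Loading table", "250", "500", "750", "750 rows written", "1000"]:
    append_live_line(log, line)
print(log)
```

In this sketch a real line that follows a tick is appended rather than overwriting it; whether pipekit drops the last tick at that point is not stated in the commit message.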

44 Commits
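
Commit ea90b3f56a mentions masking payload fields whose names look secret before the request body is logged. A rough sketch of that idea follows; the function name `mask_secrets`, the `"***"` placeholder, and the recursive handling of nested values are all assumptions made for illustration, and only the name fragments (password, pwd, secret, token) come from the commit message.

```python
import re

# Name fragments taken from the commit message; case-insensitive matching is assumed.
_SECRET_NAME = re.compile(r"password|pwd|secret|token", re.IGNORECASE)


def mask_secrets(payload):
    """Return a copy of payload with values of secret-looking keys replaced.

    Hypothetical helper; pipekit's actual masking code is not shown in the log.
    """
    if isinstance(payload, dict):
        return {
            key: "***" if _SECRET_NAME.search(str(key)) else mask_secrets(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [mask_secrets(item) for item in payload]
    # Scalars are returned unchanged.
    return payload


masked = mask_secrets(
    {"user": "etl", "db_password": "hunter2", "nested": {"api_token": "abc"}}
)
print(masked)
```

Matching on the field name rather than the value keeps the rest of the payload readable in the journal, at the cost of missing secrets stored under unexpected names.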