Commit Graph

38 Commits

Author SHA1 Message Date
7202b86210 web: add columns to an existing module's target table (Phase 0+1)
Modules' column listings were write-once at wizard time — no way to add a
column to an established sync (e.g. an RRN watermark column on a history
table) without hand-editing columns_json and ALTERing the dest by hand.

Phase 0 (groundwork):
- columns_json rows get a stable `id` (c1, c2, …) — the data-movement
  identity for future schema reconciliation (the load is positional).
- repo.update_module_columns to persist the listing.
- Driver.build_add_column_sql + Driver.column_inventory.
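A minimal sketch of the stable-id idea. It assumes columns_json is a list of dicts and that existing ids must never change. The helper name `assign_column_ids` is hypothetical and is not pipekit's actual code.

```python
def _id_number(col: dict) -> int:
    # "c12" -> 12; rows without a well-formed id count as 0.
    cid = str(col.get("id", ""))
    return int(cid[1:]) if cid.startswith("c") and cid[1:].isdigit() else 0


def assign_column_ids(columns: list[dict]) -> list[dict]:
    """Hypothetical: give each columns_json row a stable id (c1, c2, ...).

    Existing ids are kept as-is. New ids continue after the highest id
    currently in use, so appending columns never renumbers earlier ones.
    """
    next_n = max((_id_number(c) for c in columns), default=0) + 1
    result = []
    for col in columns:
        col = dict(col)  # copy: don't mutate the caller's rows
        if "id" not in col:
            col["id"] = f"c{next_n}"
            next_n += 1
        result.append(col)
    return result
```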

Phase 1 (append a column):
- "+ add column" on the module detail page -> column_form.html.
- POST /modules/{id}/columns: validates the name isn't already in the
  listing or on the table, runs ALTER TABLE … ADD COLUMN (appends at the
  tail, rows preserved), applies COMMENT ON COLUMN where supported, and
  appends to columns_json. Re-renders with an error on conflict/DDL failure.
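For a Postgres dest, the Phase 1 validation and DDL steps might look roughly like the sketch below. Several details are assumptions: `build_add_column_statements` is a hypothetical helper, names are compared case-insensitively, and identifiers are assumed not to contain double quotes. The caller would run the statements and then append the new row to columns_json.

```python
def build_add_column_statements(
    schema: str,
    table: str,
    name: str,
    dest_type: str,
    description: str,
    listing: list[dict],
    table_columns: list[str],
    supports_comments: bool = True,
) -> list[str]:
    """Hypothetical: validate a new column, then return the DDL to append it (Postgres syntax)."""
    wanted = name.lower()
    # Conflict checks: already in the module's columns_json, or already on the dest table.
    if any(c["name"].lower() == wanted for c in listing):
        raise ValueError(f"column {name!r} is already in the module's listing")
    if any(c.lower() == wanted for c in table_columns):
        raise ValueError(f"column {name!r} already exists on {schema}.{table}")

    target = f'"{schema}"."{table}"'
    # ADD COLUMN appends at the tail; existing rows are kept (NULL, since no DEFAULT is given).
    statements = [f'ALTER TABLE {target} ADD COLUMN "{name}" {dest_type}']
    if description and supports_comments:
        literal = description.replace("'", "''")  # escape for a SQL string literal
        statements.append(f'COMMENT ON COLUMN {target}."{name}" IS \'{literal}\'')
    return statements
```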

Append-only and non-destructive; reorder/retype/drop (which can require a
table rebuild) are out of scope for this phase. Verified end-to-end against
the live PG dest on a throwaway module.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 10:09:12 -04:00
da0396bff9 web: add SQL-entry path to the new-module wizard
The wizard previously required picking a single source table; modules whose
entry point is arbitrary SQL (CTEs, joins, computed columns) didn't fit. Add a
"write SQL" path alongside "browse a table":

- Driver.introspect_query_columns + _zero_row_wrap discover a query's result
  columns by running it wrapped so it returns no rows (at most one on DB2).
  The generic wrap is a derived table with WHERE 1=0; DB2 instead appends
  FETCH FIRST 1 ROW ONLY (DB2 for i forbids WITH inside a nested table
  expression).
- /wizard/sql + POST /wizard/sql/columns seed the column-mapping grid; dest
  types default to text (no result-set type metadata over jrunner CSV).
- wizard_step3.html grows a sql_mode branch (array-named inputs, query shown
  verbatim, no column unchecking); wizard_create branches on entry_mode.
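The two wrapping strategies can be sketched as follows. The function is a hypothetical stand-in for `_zero_row_wrap`, and the dialect strings are assumptions. This version does not handle a query that already ends in its own FETCH FIRST clause, or an ORDER BY that SQL Server would reject inside a derived table.

```python
def zero_row_wrap(sql: str, dialect: str) -> str:
    """Hypothetical stand-in for _zero_row_wrap: rewrite a query so it returns (almost) no rows."""
    body = sql.strip().rstrip(";").rstrip()
    if dialect == "db2":
        # DB2 for i rejects WITH inside a nested table expression, so keep the
        # query at top level (CTEs still work) and cap it at one row instead.
        return f"{body} FETCH FIRST 1 ROW ONLY"
    # Generic: wrap as a derived table with an always-false predicate. The
    # result has zero rows but still carries the column names and order.
    return f"SELECT * FROM ({body}) AS q WHERE 1=0"
```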

Verified end-to-end against a live DB2 for i connection, including a top-level
CTE query.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 09:05:39 -04:00
e7576926ae drivers: make staging DDL dialect-aware; stop tracking pipekit.db
Add dialect-aware DDL hooks to the Driver base (create_schema_sql,
drop_table_if_exists_sql, create_like_table_sql, check_dest_table) and
implement DB2/MSSQL overrides so they can serve as merge destinations,
not just Postgres. runner.py now dispatches staging table creation
through the dest driver instead of hardcoding PG syntax.
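Here is a rough sketch of the hook pattern, showing only `create_like_table_sql`. The `Driver` below is a simplified stand-in. `Db2Driver`, `MssqlDriver`, and `staging_create_sql` are hypothetical names, and identifiers are assumed to be pre-quoted. Postgres uses `CREATE TABLE ... (LIKE ...)` and DB2 uses `CREATE TABLE ... LIKE ...`. SQL Server has no LIKE form, so `SELECT ... INTO` with a false predicate is a common substitute.

```python
class Driver:
    """Hypothetical base: Postgres-flavoured defaults that dialects override."""

    def create_like_table_sql(self, new: str, template: str) -> str:
        # Postgres: copy column names and types from an existing table.
        return f"CREATE TABLE {new} (LIKE {template})"


class Db2Driver(Driver):
    def create_like_table_sql(self, new: str, template: str) -> str:
        return f"CREATE TABLE {new} LIKE {template}"


class MssqlDriver(Driver):
    def create_like_table_sql(self, new: str, template: str) -> str:
        # SQL Server has no CREATE TABLE ... LIKE; SELECT INTO with a false
        # predicate creates an empty table with the same columns.
        return f"SELECT * INTO {new} FROM {template} WHERE 1=0"


def staging_create_sql(dest_driver: Driver, dest: str, staging: str) -> str:
    # Runner side: ask the dest driver for the DDL instead of hardcoding PG syntax.
    return dest_driver.create_like_table_sql(staging, dest)
```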

Also untrack pipekit.db (runtime SQLite state) and add it to .gitignore.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 23:33:36 -04:00
f8490a2d4f deploy/cli: fix /etc/pipekit permissions so non-root group members can write secrets
- deploy.sh: set /etc/pipekit to root:pipekit 0775 and secrets.env to
  pipekit:pipekit 0640 so group members can run 'pipekit secrets set'
  without sudo
- cli.py secrets set: drop os.chown() on temp file — non-root users
  can't chown to the pipekit service user, and os.replace() preserves
  the target file's ownership anyway
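This is a sketch of an atomic write in the spirit of `pipekit secrets set`. It assumes the secrets file is plain text, and `write_secrets_atomically` is a hypothetical name. `os.replace()` renames the temp file over the target, so the replaced path ends up with the temp file's owner and mode bits. For that reason, the sketch copies the existing mode onto the temp file before the swap. Like the commit, it never calls `os.chown()`.

```python
import os
import stat
import tempfile


def write_secrets_atomically(path: str, text: str, default_mode: int = 0o600) -> None:
    """Hypothetical: replace `path` with `text` atomically, keeping its permission bits."""
    try:
        mode = stat.S_IMODE(os.stat(path).st_mode)
    except FileNotFoundError:
        mode = default_mode
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so os.replace() is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".secrets-")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp, mode)  # mkstemp creates 0600; restore the old mode explicitly
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```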

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-04 13:35:06 -04:00
31135cf5be web: add session-cookie login for web UI
- New pipekit/web/auth.py: itsdangerous-signed cookie, 8hr expiry,
  auto-generates signing secret in settings table on first use
- GET/POST /login and POST /logout routes (public, no auth dependency)
- All other web routes protected via router-level require_web_auth dep
- Starlette middleware injects request.state.current_user for templates
- Topbar shows logged-in username + logout button when session active
- Reuses existing api_user/api_pass credentials and api_auth_enabled flag
- Add itsdangerous>=2.1 to requirements.txt
- Enable api_auth_enabled in config.yaml
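The commit uses itsdangerous, whose `URLSafeTimedSerializer.loads()` accepts a `max_age`. The stdlib sketch below is a swapped-in illustration of the same mechanism: an HMAC-signed, timestamped payload that is rejected after 8 hours. The function names are hypothetical. In pipekit, the secret would come from the settings table.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_AGE_S = 8 * 3600  # 8-hour sessions, matching the commit


def _sign(payload: str, secret: bytes) -> str:
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def make_session_cookie(username: str, secret: bytes, now: Optional[float] = None) -> str:
    issued = int(time.time() if now is None else now)
    body = json.dumps({"u": username, "t": issued}).encode()
    payload = base64.urlsafe_b64encode(body).decode()
    return f"{payload}.{_sign(payload, secret)}"


def read_session_cookie(cookie: str, secret: bytes, now: Optional[float] = None) -> Optional[str]:
    """Return the username, or None if the cookie is malformed, forged, or expired."""
    payload, _, sig = cookie.rpartition(".")
    # compare_digest avoids leaking how many signature characters matched.
    if not payload or not hmac.compare_digest(sig.encode(), _sign(payload, secret).encode()):
        return None
    data = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if current - data["t"] > MAX_AGE_S:
        return None
    return data["u"]
```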

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-04 13:21:50 -04:00
583dc16c9b web: tighten module list column widths and fix form input overflow
- Size all table columns to fit content (em-based) rather than loose percentages
- Add white-space:nowrap to groups header and last-run cells
- Sticky topbar and panel header so New Module button stays visible while scrolling
- Scope min-width:0 to label.field inputs so they don't blow past two-col grid borders

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-04 10:52:23 -04:00
ba48b2ca2b CLAUDE.md: document local time scheduling and JAVA_HOME deploy behavior
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 23:51:12 -04:00
a6cc8da83f Merge branch 'localtime'
- Convert all timestamps to local time for display and scheduling
- deploy.sh: detect JAVA_HOME and inject into systemd unit at deploy time
- repo: add duration_s to get_group_run query
2026-06-03 23:49:43 -04:00
c8b507cdc3 repo: add duration_s to get_group_run query
The group run detail page was crashing because get_group_run returned no
duration_s field, unlike the list queries. Fixes 500 on /group-runs/{id}.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 23:48:45 -04:00
ed3653f410 deploy.sh: detect JAVA_HOME and inject into systemd unit at deploy time
The pipekit system user has no java on its PATH. deploy.sh now detects
JAVA_HOME by searching common locations and injects Environment= lines
into the installed unit file, making deploys portable across machines.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 23:34:27 -04:00
a66488d1f2 Convert all timestamps to local time for display and scheduling
Scheduler now evaluates cron expressions against local time instead of
UTC, so schedules fire at the user's local clock time. All timestamp
displays in templates use a new `localtime` Jinja filter that converts
UTC strings from SQLite to the server's local timezone. Updated CLAUDE.md
to reflect the systemd service setup.
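A `localtime` filter along these lines would do the conversion. This is a sketch that assumes timestamps are stored in SQLite's default UTC text format (`YYYY-MM-DD HH:MM:SS`). The optional `tz` parameter is an addition for testability. Registration would be something like `env.filters["localtime"] = localtime`.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def localtime(value: Optional[str], tz: Optional[tzinfo] = None, fmt: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Jinja filter sketch: SQLite UTC text -> server-local display string."""
    if not value:
        return ""
    # CURRENT_TIMESTAMP / datetime('now') yield 'YYYY-MM-DD HH:MM:SS' in UTC;
    # [:19] drops any fractional seconds.
    utc = datetime.strptime(value[:19], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # astimezone(None) converts to the system's local zone.
    return utc.astimezone(tz).strftime(fmt)
```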

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 23:21:05 -04:00
95292bd3f8 deploy.sh: fix SQLite journal permissions; cli: upsert drivers by kind
SQLite needs write access to the repo directory to create journal files
alongside pipekit.db. Fixed by setting group pipekit + g+w on the
directory itself only (not recursive).

Driver registration now matches existing rows by kind before falling
back to name, so re-deploys update the correct row regardless of what
name was used at initial registration.
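A sketch of the kind-then-name lookup against SQLite follows. The `driver` table's column names (`kind`, `name`, `jdbc_class`, `jar_path`) and the function name are assumptions. This version leaves the originally registered name untouched.

```python
import sqlite3


def upsert_driver(conn: sqlite3.Connection, kind: str, name: str, jdbc_class: str, jar_path: str) -> int:
    """Hypothetical: match an existing row by kind first, then by name; update it, else insert."""
    row = conn.execute("SELECT id FROM driver WHERE kind = ?", (kind,)).fetchone()
    if row is None:
        row = conn.execute("SELECT id FROM driver WHERE name = ?", (name,)).fetchone()
    if row is not None:
        # Re-deploy: refresh class and jar on the matched row, whatever it was named.
        conn.execute(
            "UPDATE driver SET kind = ?, jdbc_class = ?, jar_path = ? WHERE id = ?",
            (kind, jdbc_class, jar_path, row[0]),
        )
        return row[0]
    cur = conn.execute(
        "INSERT INTO driver (kind, name, jdbc_class, jar_path) VALUES (?, ?, ?, ?)",
        (kind, name, jdbc_class, jar_path),
    )
    return cur.lastrowid
```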

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 21:42:55 -04:00
a3ff5337ee deploy.sh: chown only pipekit.db, not the whole repo
Avoids stripping write access from the developer. The service only needs
to own pipekit.db (runtime writes) and .venv (created as pipekit).
Source code stays owned by whoever ran deploy.sh.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 21:37:18 -04:00
f32706be01 Verbose deploy.sh, fix driver upsert, and fix pip cache warning
deploy.sh now prints each step with what it's doing, adds the invoking
user to the pipekit group automatically, uses --home-dir /nonexistent
for the system user, and passes --no-cache-dir to pip to suppress the
home directory warning.

cli.py: removed the kind-based early-exit in drivers register that was
short-circuiting before the upsert logic, so re-running deploy now
correctly updates existing driver rows rather than printing "already
registered". Also removed the now-unused --force flag.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 21:35:25 -04:00
c34fcb38ed Add scheduling, harden deploy, and update docs
Scheduling: cron-based group runs via a daemon thread (scheduler.py)
started at API startup. Schedules managed inline on the group edit form.
last_fired_at persisted before run to prevent double-fire on restart.
Requires croniter (added to requirements.txt); DB migration adds
last_fired_at column to schedule table.
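The ordering that prevents a double fire can be sketched as a single scheduler tick. All names here are hypothetical. The cron evaluation is injected as `next_fire_after`; in production it could be `lambda expr, base: croniter(expr, base).get_next(datetime)`. `now` would be a local `datetime.now()`, per the later local-time change.

```python
from datetime import datetime
from typing import Callable, Iterable


def fire_due_schedules(
    schedules: Iterable[dict],
    now: datetime,
    next_fire_after: Callable[[str, datetime], datetime],
    persist_last_fired: Callable[[int, datetime], None],
    start_group_run: Callable[[int], None],
) -> list[int]:
    """Hypothetical tick: start every schedule whose next cron time has passed."""
    fired = []
    for sched in schedules:
        base = sched.get("last_fired_at") or sched["created_at"]
        if next_fire_after(sched["cron"], base) > now:
            continue
        # Record the firing *before* starting the run: if the process dies
        # mid-run, the restarted scheduler sees last_fired_at and won't re-fire.
        persist_last_fired(sched["id"], now)
        start_group_run(sched["group_id"])
        fired.append(sched["id"])
    return fired
```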

Deploy: deploy.sh now creates the pipekit system user, chowns the repo,
builds the venv as pipekit, and installs/enables the systemd unit.
systemd/pipekit.service is now a production-ready unit (User=pipekit
uncommented). pipekit secrets set preserves existing file permissions
instead of resetting to 0600. Driver registration is now idempotent
(upsert via get_driver_by_name + update_driver).

Docs: CLAUDE.md and SPEC.md updated to reflect groups, scheduling,
scheduler-in-API-process architecture, TUI deferred (not dropped),
stop-on-failure tradeoff, jrunner as prerequisite, and deploy flow.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 21:18:13 -04:00
31d670b4e6 Add group runs and fix wizard identifier sanitization for spaced column names
Groups allow multiple modules to be run sequentially in a defined order.
Adds full CRUD (repo, engine orchestrator, web routes, templates) for grp,
group_member, and group_run tables that were previously schema-only. Module
index now shows group membership badges per module. The wizard's default
dest column names are now sanitized from source names containing spaces or
special characters into valid identifiers, rather than failing at CREATE
TABLE time.
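One way to do that sanitization is sketched below. The exact rules are assumptions: runs of anything other than letters, digits, and underscores collapse to one underscore, names are lowercased (Postgres folds unquoted names), and a leading digit gets an underscore prefix. `sanitize_identifier` is a hypothetical name.

```python
import re


def sanitize_identifier(name: str, fallback: str = "col") -> str:
    """Hypothetical: map a source column name to a safe unquoted dest identifier."""
    # Each run of disallowed characters becomes a single underscore.
    ident = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_").lower()
    if not ident:
        ident = fallback  # name had no usable characters at all
    if ident[0].isdigit():
        ident = f"_{ident}"  # identifiers can't start with a digit
    return ident
```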

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-02 00:05:37 -04:00
70e4d79edf track pipekit.db in git for backup purposes
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-30 22:48:10 -04:00
780f73021c Run buttons stay on page with live status updates via HTMX polling
Module detail and index Run/Dry run buttons no longer redirect to the
run page. The status cell (index) and recent runs panel (detail) poll
every 3s while running and stop automatically when idle. force_poll
ensures polling starts immediately after clicking Run despite the race
between the HTTP response and the background task setting running=1.
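The self-stopping poll can be sketched as a fragment renderer. The route `/modules/{id}/status` and the function name are hypothetical; `hx-get`, `hx-trigger="every 3s"`, and `hx-swap` are standard HTMX attributes. With `outerHTML` swapping, a response rendered without the polling attributes replaces the element, which ends the polling.

```python
from html import escape


def status_cell(module_id: int, status: str, running: bool, force_poll: bool = False) -> str:
    """Hypothetical: render the index status cell, polling only while a run is live."""
    attrs = ""
    if running or force_poll:
        # force_poll covers the gap before the background task sets running=1.
        attrs = (
            f' hx-get="/modules/{module_id}/status"'
            ' hx-trigger="every 3s" hx-swap="outerHTML"'
        )
    return f'<td id="status-{module_id}"{attrs}>{escape(status)}</td>'
```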

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 12:31:14 -04:00
bb4f7712d2 Allow dots in identifiers and strip surrounding quotes in validate_identifier
Supports iSeries schema names that contain dots (e.g. CMS.CUSLG).
Strips surrounding double quotes on input so users don't need to worry
about quoting — the driver's quote_identifier handles that when building SQL.
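A sketch of the relaxed validation follows. The allowed character set, including `#`, `$`, and `@` as seen in IBM i names, is an assumption. `quote_identifier` here shows standard double-quote quoting as used by PG and DB2. Quoting keeps `CMS.CUSLG` as one identifier rather than a schema-qualified pair.

```python
import re

# Assumed rule: leading letter/underscore/#/$/@, then those plus digits and dots.
_IDENT = re.compile(r"[A-Za-z_#$@][A-Za-z0-9_#$@.]*")


def validate_identifier(value: str) -> str:
    """Sketch: strip one pair of surrounding double quotes, then allow dotted names."""
    v = value.strip()
    if len(v) >= 2 and v[0] == '"' and v[-1] == '"':
        v = v[1:-1]
    if not _IDENT.fullmatch(v):
        raise ValueError(f"invalid identifier: {value!r}")
    return v


def quote_identifier(name: str) -> str:
    # Standard SQL quoting: wrap in double quotes, doubling any embedded ones.
    return '"' + name.replace('"', '""') + '"'
```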

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 00:43:10 -04:00
f39b1df75e Fix missing run start time and add duration to run history
create_run now sets started_at on INSERT. list_runs computes duration_s
via julianday arithmetic. Both the module detail and runs page show
duration formatted as Xs or Xm Ys. A Jinja2 filter handles formatting.
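The julianday arithmetic and the display filter can be sketched as follows. The same expression would supply the `duration_s` that `get_group_run` later needed. `finished_at` and the other names are assumptions. `julianday()` returns fractional days, so multiplying by 86400 gives seconds, and a NULL end time yields a NULL duration.

```python
import sqlite3

DURATION_SQL = """
SELECT id,
       CAST(ROUND((julianday(finished_at) - julianday(started_at)) * 86400) AS INTEGER) AS duration_s
FROM run_log
ORDER BY id
"""


def format_duration(seconds) -> str:
    """Jinja filter sketch: 42 -> '42s', 125 -> '2m 5s', None -> ''."""
    if seconds is None:
        return ""
    seconds = int(seconds)
    if seconds < 60:
        return f"{seconds}s"
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}m {secs}s"
```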

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-21 00:23:31 -04:00
595024eb52 Unify incremental sync config: inline watermarks + editable source query
Watermarks, merge strategy, merge key, and source query are now edited
together in one form on both the module edit page and wizard step 3.
A client-side placeholder warning fires when {name} tokens in the query
don't match the watermark rows on the page. The wizard now shows an
editable source query textarea pre-populated from column picks so WHERE
clauses can be added before module creation. Watermarks submitted via
wm_* arrays are processed by _save_inline_watermarks() in both
module_update and wizard_create.
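The warning itself runs client-side; the Python mirror below only illustrates the comparison it makes. The `{name}` token pattern and the function name are assumptions.

```python
import re

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def placeholder_mismatch(query: str, watermark_names) -> tuple[set, set]:
    """Hypothetical: return (tokens with no watermark row, watermark rows never referenced)."""
    tokens = set(_PLACEHOLDER.findall(query))
    names = set(watermark_names)
    return tokens - names, names - tokens
```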

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-18 21:31:55 -04:00
99f75490c4 Add live progress to module runs: async web POST + HTMX polling
Web POST /modules/{id}/run now returns immediately (BackgroundTasks)
instead of blocking until the run completes. jrunner.migrate() switches
from subprocess.run to Popen so stdout lines are read as they arrive and
appended to run_log.live_log via repo.append_run_live_log(). The run
detail page embeds an HTMX fragment that polls /runs/{id}/live every 2s
while status=running, showing current status, row count, and live output;
polling stops automatically once the run finishes.
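The switch from `subprocess.run` to `Popen` can be sketched as below. `run_streaming` is a hypothetical name. In pipekit, `on_line` would append to run_log.live_log, for example through `repo.append_run_live_log()`, whose exact signature is not shown here.

```python
import subprocess
from typing import Callable, Sequence


def run_streaming(cmd: Sequence[str], on_line: Callable[[str], None]) -> int:
    """Hypothetical: run cmd, handing each output line to on_line as it arrives."""
    # stderr is merged into stdout so stack traces show up in the live log too.
    with subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set.
    return proc.returncode
```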

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-08 14:02:36 -04:00
760d4e7fec fix incremental watermark wiring and dry_run status
- Wire watermark WHERE clause into GL20000 source query ({dex_row_id} placeholder was present but query had no WHERE clause)
- Fix watermark resolver connection for GL20000 (was pointing at AS400, should be postgres dest)
- Resolve watermarks live on dry runs and module detail page load instead of using defaults
- Use status='dry_run' (not 'success') for dry runs so they can be filtered from recent runs UI
- Add exclude_status param to repo.list_runs; module detail excludes dry_run rows
- Expand run_log CHECK constraint to include 'dry_run'; backfill 16 historical records
- Delete SPEC_v1_archive.md (obsolete v1 design doc)
- Update SPEC.md and CLAUDE.md to reflect current engine flow and status values

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 19:03:38 -04:00
dfc76a96d8 Add CLAUDE.md for Claude Code guidance
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 21:06:37 -04:00
546242e11a Wizard step 2: schema browser panel with datalist autocomplete
Adds a /wizard/schemas JSON endpoint and a live-filtered schema picker
panel on step 2. Clicking a row fills the schema input; the datalist
also powers browser autocomplete. MSSQL refetches when database or
linked_server qualifiers change. CSS fixes prevent picker tables and
two-col grid items from overflowing their containers.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 20:56:18 -04:00
ff19ae9b81 Drivers: add list_schemas() to base, PG, DB2, MSSQL
Base provides a no-op default; drivers opt in by overriding. MSSQL
scopes the lookup to a linked server / database when those qualifiers
are supplied.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 20:56:10 -04:00
f18ea55a12 Wizard: warn in-UI when default dest table already exists.
Previously the existing-dest check fired on submit and surfaced as a raw
JSON 400. Now step 3 introspects the default dest up front and renders a
yellow banner listing existing columns; submit-time mismatches render
wizard_error.html (409) with missing vs. existing side-by-side and a back
link that re-plumbs the form qvals.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 13:27:23 -04:00
bb0b493d18 Wizard step 2: add jump-to-columns shortcut for known tables.
New text input + "jump to columns" button skip the full table listing
when you already know what you want. Typing "schema.table" and tabbing
out auto-splits into the schema qualifier + table name. Jump button
stays disabled until the table field has a value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 00:41:00 -04:00
fde4fa99b6 Wizard: don't clobber pre-existing dest tables.
If the dest table already exists, introspect its columns and verify the
wizard's picks line up. Missing columns surface a specific error message
naming what's missing instead of the opaque "column X does not exist"
from a failed COMMENT. On match, skip CREATE + COMMENT so existing
schema and comments aren't touched; staging still gets provisioned.
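The match check reduces to a set difference. This sketch assumes a case-insensitive comparison, which the commit does not specify, and `missing_dest_columns` is a hypothetical name. An empty result means CREATE and COMMENT can be skipped; a non-empty one names exactly what is missing.

```python
def missing_dest_columns(picked: list[str], existing: list[str]) -> list[str]:
    """Hypothetical: picked dest columns absent from the existing table, in pick order."""
    have = {c.lower() for c in existing}
    return [c for c in picked if c.lower() not in have]
```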

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 00:41:00 -04:00
4650a3cbc5 bin/pipekit auto-detects venv; stop rewriting it in deploy.sh.
The tracked launcher now checks for .venv/bin/python3 under the repo and
uses it if present, else falls back to system python3. Works pre-deploy
(no venv) and post-deploy (venv exists) without being modified. Deploy
no longer regenerates the file, so `git pull` on a deployed box won't
conflict with the launcher.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 00:34:32 -04:00
c205b48be2 Honor api_host in config.yaml; ignore .venv/ created by deploy.sh.
cmd_serve now reads api_host from Config with a 127.0.0.1 safe default,
matching the existing api_port pattern. --host/--port CLI flags still
override. Local config is bumped to bind 0.0.0.0:8200.
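The precedence described above (CLI flag, then config.yaml, then the safe default) fits in a few lines. `resolve_host` is a hypothetical helper, and `config` is assumed to be a plain dict.

```python
from typing import Optional


def resolve_host(cli_host: Optional[str], config: dict) -> str:
    """Hypothetical: --host beats config.yaml's api_host, which beats 127.0.0.1."""
    if cli_host is not None:
        return cli_host
    return config.get("api_host") or "127.0.0.1"
```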

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 00:33:56 -04:00
1c3586eb2f deploy.sh: pass -H to sudo so pip doesn't warn about user cache.
Without -H, sudo keeps HOME pointed at the invoking user, so pip running
as root tries to write to /home/<user>/.cache/pip and disables caching
with a warning. -H resets HOME to /root while -E preserves the rest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 23:59:23 -04:00
e6a615bf70 Add deploy.sh, systemd unit template, and pipekit secrets CLI.
deploy.sh is the idempotent rollout path: venv + deps, launcher,
/etc/pipekit/secrets.env skeleton (mode 0600), schema init, and
auto-register of every JDBC driver shipped with jrunner. systemd
unit is a template, not auto-installed — user copies it when ready
to cut over.

`pipekit secrets {list,set,unset}` manages /etc/pipekit/secrets.env
with atomic 0600 writes so passwords don't need sudoedit. Prompted
input by default; positional value allowed for scripting.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 22:34:38 -04:00
e27167a4a3 Add pipekit drivers register for seeding JDBC driver rows.
Registers a driver-table row from the CLI. Kind is validated against
the code-level driver registry; JDBC class names default from a
built-in table (db2, pg, mssql). Refuses to double-register a kind
unless --force is passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 22:01:59 -04:00
01bcba78b4 Snap staging DDL on module create/edit/run; allowlist benign jrunner exception.
Staging table drift caused silent data loss when dest grew columns but
staging kept the old shape. Fix on three fronts:

- Runner now DROP+CREATEs staging each run instead of CREATE IF NOT
  EXISTS, so any drift self-heals.
- Wizard create drop+creates staging right after dest is provisioned,
  surfacing DDL errors at create time.
- Module edit drops the (old-name) staging table and re-applies
  COMMENT ON TABLE when dest_description changed.

jrunner's query mode uses executeQuery(), which raises
"No results were returned by the query" after DDL/DML succeeds; the
stack-trace detector now allowlists that exception so normal
CREATE/TRUNCATE/INSERT runs aren't flagged as failures.
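The stack-trace signature pipekit actually matches is not shown. The sketch below assumes a reasonable one: an exception-class line immediately followed by an `at pkg.Class.method(` frame. The caller would raise JrunnerError when `find_java_failure` returns a line. All names here are hypothetical.

```python
import re
from typing import Optional

# Exception line, optionally prefixed by 'Exception in thread "main" '.
_EXC_LINE = re.compile(r'^(?:Exception in thread "[^"]*" )?([\w$.]+(?:Exception|Error))(?::\s*(.*))?$')
# A stack frame such as "\tat org.postgresql.jdbc.PgStatement.executeQuery(".
_FRAME = re.compile(r"^\s+at [\w$.<>]+\(")

# Messages that follow a successful statement and must not fail the run.
BENIGN = ("No results were returned by the query",)


def find_java_failure(output: str) -> Optional[str]:
    """Hypothetical: return the first non-allowlisted exception line, or None."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        m = _EXC_LINE.match(line.strip())
        if not m:
            continue
        # Require a frame on the next line so ordinary log text isn't flagged.
        if i + 1 >= len(lines) or not _FRAME.match(lines[i + 1]):
            continue
        message = m.group(2) or ""
        if any(b in message for b in BENIGN):
            continue
        return line.strip()
    return None
```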

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 20:10:36 -04:00
2ef68d766c Add module edit page + detect jrunner silent failures.
Modules get a full edit form (name, connections, tables, source query,
merge config, description, enabled); reachable via Edit button on the
detail page and the source-query panel.

jrunner catches SQLException and calls System.exit(0) at every failure
site, so pipekit was marking runs success when the migrate phase had
actually errored. query() and migrate() now scan stdout+stderr for a
Java stack-trace signature and raise JrunnerError. runner.py also
captures the failed jrunner output onto run_log so the stack trace is
visible on the run detail page.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 11:02:45 -04:00
d952b48a4e Add module delete + fail-fast on duplicate module name in wizard.
Delete button lives in module-detail header, refuses to delete a
running module, and clears run_log history first since it doesn't
cascade from module. Wizard now returns 409 on duplicate name before
touching the destination, so a failed resubmit doesn't redundantly
rerun CREATE TABLE / COMMENT ON on the dest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:28:51 -04:00
574ada5258 Initial commit: Pipekit rewrite.
Orchestration layer around the jrunner Java JDBC CLI, replacing the
previous shell-based sync system in .archive/pre-rewrite. Includes
the FastAPI + Jinja web frontend, per-driver adapters (DB2, MSSQL,
PG), wizard-driven module creation with editable dest types and
table/column descriptions pulled from the source, watermark/hook CRUD,
and the engine that runs modules end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 00:38:26 -04:00