pt/pipekit

Paul Trowbridge dfc76a96d8 Add CLAUDE.md for Claude Code guidance

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

2026-05-02 21:06:37 -04:00

4.9 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Running the Server

pipekit serve                          # default host/port from config.yaml
pipekit serve --host 0.0.0.0 --port 8080 --reload   # dev mode with auto-reload

The user runs the server themselves in their own terminal — do not nohup or background it.

Other CLI Commands

pipekit init                           # create/upgrade SQLite schema
pipekit doctor                         # health check (config, jrunner, DB)
pipekit run <module_name>              # run a module synchronously (manual test)
pipekit set-password <username>        # set HTTP Basic Auth credentials
pipekit secrets set KEY [VALUE]        # add/update a secret (prompts if value omitted)
pipekit drivers list                   # show registered driver kinds
./deploy.sh                            # idempotent: venv, deps, launcher, driver registration

Architecture

Pipekit is a database sync tool. A module defines a sync job from a source query to a destination table. The engine runs a module in four steps: resolve watermarks, materialize the source SQL, stage the data via jrunner, and merge the staged rows into the destination.

Layers (bottom to top):

SQLite (pipekit.db) — single file, all state
repo.py — all CRUD for every table; ~1,900 LOC, the only layer that touches the DB
engine/ — orchestrates a module run: lock acquisition, watermark resolution, jrunner calls, merge SQL, post-run hooks, run_log write, lock release
jrunner — external Java CLI; handles all JDBC access. Python never talks to remote DBs directly; it shells out to jrunner
api/ — FastAPI REST endpoints under /api/*, HTTP Basic Auth (except /health)
web/ — HTML pages (Jinja2 templates) at /; HTMX + Alpine.js for interactivity

Key Data Model

driver — JDBC driver registration (kind, jar, class, url_template)
connection — named DB connection (jdbc_url, username, password as $ENV_VAR reference)
module — sync job (source/dest connection, source query, merge strategy, merge key, enabled, running lock)
watermark — named placeholder with resolver SQL; first column of first row used as opaque string; replaces {watermark_name} in source query
hook — post-merge SQL (run_order, run_on: success/failure/always)
run_log — immutable history record with resolved SQL, merge SQL, watermark values, stdout/stderr, timing
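The watermark entry above describes plain placeholder substitution. A minimal sketch of that step, assuming the resolved values arrive as a dict of name to string; the function name materialize_query is hypothetical and not taken from the pipekit codebase:

```python
def materialize_query(source_query: str, watermarks: dict[str, str]) -> str:
    """Replace each {name} placeholder with its resolved watermark value.

    Values are treated as opaque strings: no quoting or escaping is applied,
    so the source query must supply any quotes it needs around a placeholder.
    """
    sql = source_query
    for name, value in watermarks.items():
        sql = sql.replace("{" + name + "}", value)
    return sql


query = "SELECT * FROM orders WHERE updated_at > '{last_sync}'"
print(materialize_query(query, {"last_sync": "2026-01-01 00:00:00"}))
```

Because the value is inserted verbatim, a resolver that returns a timestamp needs the quotes written into the source query, as in the example.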

Engine Flow

run_module(module_id)
  → atomic UPDATE module SET running=1 WHERE running=0  # fail if already locked
  → for each watermark: run resolver SQL via jrunner → capture first cell
  → materialize source query (simple string replace {name} → value)
  → jrunner create staging table (pipekit_staging.{module_name})
  → jrunner migrate source → staging
  → build merge SQL (engine/merge.py: full=truncate+insert, incremental=delete by key+insert, append=insert)
  → run merge SQL via jrunner
  → run hooks in order
  → write run_log entry
  → UPDATE module SET running=0 (in finally)
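The first and last steps of the flow above form a lock around the run. A minimal sketch of that pattern with the stdlib sqlite3 module, assuming the module table has an integer id column; the helper names try_lock, unlock, and run_locked are hypothetical:

```python
import sqlite3


def try_lock(conn: sqlite3.Connection, module_id: int) -> bool:
    """Atomically flip running from 0 to 1; True means this caller holds the lock."""
    cur = conn.execute(
        "UPDATE module SET running = 1 WHERE id = ? AND running = 0",
        (module_id,),
    )
    conn.commit()
    # rowcount is 0 when another run already set running = 1.
    return cur.rowcount == 1


def unlock(conn: sqlite3.Connection, module_id: int) -> None:
    conn.execute("UPDATE module SET running = 0 WHERE id = ?", (module_id,))
    conn.commit()


def run_locked(conn: sqlite3.Connection, module_id: int, work):
    """Run work() under the module lock, releasing it even if work() raises."""
    if not try_lock(conn, module_id):
        raise RuntimeError(f"module {module_id} is already running")
    try:
        return work()
    finally:
        unlock(conn, module_id)


# Demo against an in-memory database with a stand-in module table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE module (id INTEGER PRIMARY KEY, running INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO module (id) VALUES (1)")
conn.commit()
```

The single UPDATE with running = 0 in its WHERE clause is what makes acquisition atomic: two concurrent callers cannot both see a row count of 1.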

Credentials Pattern

Connection passwords are stored as $VAR_NAME references. At run time they are resolved from /etc/pipekit/secrets.env (override the path with the PIPEKIT_SECRETS environment variable). The config file path can be overridden the same way with PIPEKIT_CONFIG.
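A minimal sketch of how such a reference could be resolved. It assumes secrets.env holds one KEY=VALUE pair per line with # comments, which the text above does not confirm; load_secrets and resolve_password are hypothetical names:

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_SECRETS_PATH = "/etc/pipekit/secrets.env"


def load_secrets(path: Optional[str] = None) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments (assumed format)."""
    secrets_path = Path(path or os.environ.get("PIPEKIT_SECRETS", DEFAULT_SECRETS_PATH))
    secrets: dict[str, str] = {}
    for line in secrets_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        secrets[key.strip()] = value.strip()
    return secrets


def resolve_password(stored: str, secrets: dict[str, str]) -> str:
    """Resolve a $VAR_NAME reference; raise if the secret is missing."""
    if not stored.startswith("$"):
        # Assumption: a value without a leading $ is passed through unchanged.
        return stored
    name = stored[1:]
    if name not in secrets:
        raise KeyError(f"secret {name!r} is not defined")
    return secrets[name]
```

Failing loudly on a missing secret keeps a typo in a connection definition from silently turning into an empty password.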

Driver Abstraction

Each driver in pipekit/drivers/ inherits from base.py::Driver and implements: browse_fields, list_tables, list_schemas, get_columns, map_type, default_expression, quote_identifier. The wizard UI calls /api/introspect/* which dispatches to the appropriate driver.
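The shape of this abstraction can be sketched with an abstract base class. Only the method names come from the text above; the signatures, the DemoDriver class, the type mapping, and the DRIVERS registry are assumptions for illustration, and only three of the seven methods are shown:

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Stand-in for base.py::Driver; signatures here are guesses."""

    @abstractmethod
    def list_tables(self, schema: str) -> list[str]: ...

    @abstractmethod
    def map_type(self, source_type: str) -> str: ...

    @abstractmethod
    def quote_identifier(self, name: str) -> str: ...


class DemoDriver(Driver):
    """Hypothetical driver with hard-coded metadata, for illustration only."""

    TYPE_MAP = {"VARCHAR2": "text", "NUMBER": "numeric"}  # illustrative mapping

    def list_tables(self, schema: str) -> list[str]:
        return ["orders", "customers"] if schema == "public" else []

    def map_type(self, source_type: str) -> str:
        return self.TYPE_MAP.get(source_type.upper(), "text")

    def quote_identifier(self, name: str) -> str:
        # ANSI-style quoting: wrap in double quotes, doubling embedded quotes.
        return '"' + name.replace('"', '""') + '"'


# Hypothetical registry keyed by driver kind, mimicking the dispatch step.
DRIVERS: dict[str, Driver] = {"demo": DemoDriver()}


def introspect_tables(kind: str, schema: str) -> list[str]:
    """Pick the driver for a connection's kind and delegate to it."""
    return DRIVERS[kind].list_tables(schema)
```

Keeping dialect differences such as quoting and type mapping behind one interface is what lets the introspection endpoints stay driver-agnostic.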

Module Columns

Modules store their column mapping as columns_json — a JSON list of dicts with keys source_name, source_type, dest_name, dest_type. The engine uses this to build the staging CREATE TABLE and the merge INSERT column lists.

Merge Key

merge_key is stored as a comma-separated string (e.g., "col1, col2"). The engine parses it and generates a multi-column DELETE predicate for incremental strategy.
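A minimal sketch of the parsing and the resulting incremental merge, assuming a DELETE ... WHERE EXISTS form and unquoted identifiers. The SQL pipekit actually generates in engine/merge.py may differ, and both function names are hypothetical:

```python
def parse_merge_key(merge_key: str) -> list[str]:
    """Split "col1, col2" into ["col1", "col2"], dropping empty entries."""
    return [part.strip() for part in merge_key.split(",") if part.strip()]


def incremental_merge_sql(
    dest: str, staging: str, merge_key: str, columns: list[str]
) -> list[str]:
    """Build delete-by-key and insert statements for the incremental strategy."""
    keys = parse_merge_key(merge_key)
    if not keys:
        raise ValueError("incremental strategy needs at least one merge key column")
    # One equality per key column, joined with AND for composite keys.
    predicate = " AND ".join(f"s.{k} = {dest}.{k}" for k in keys)
    col_list = ", ".join(columns)
    return [
        f"DELETE FROM {dest} WHERE EXISTS (SELECT 1 FROM {staging} s WHERE {predicate})",
        f"INSERT INTO {dest} ({col_list}) SELECT {col_list} FROM {staging}",
    ]


for stmt in incremental_merge_sql(
    "public.orders", "pipekit_staging.orders", "id, region", ["id", "region", "total"]
):
    print(stmt)
```

Deleting every destination row whose key appears in staging, then inserting all staged rows, gives upsert behavior without relying on a dialect-specific MERGE or ON CONFLICT clause.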

Staging Table

The staging table is recreated on every run as pipekit_staging.{module_name} (DROP + CREATE, not CREATE IF NOT EXISTS). It is ephemeral and exists only during the run.
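Combining the columns_json mapping with the drop-and-create rule, the staging DDL could be generated roughly as follows. This is a sketch: it assumes the staging table uses dest_name and dest_type, uses ANSI double-quote quoting in place of the driver's quote_identifier, and adds IF EXISTS to the DROP so a first run does not fail; none of these details are confirmed by the text above:

```python
import json


def quote(name: str) -> str:
    """ANSI-style identifier quoting, standing in for the driver's own method."""
    return '"' + name.replace('"', '""') + '"'


def staging_ddl(module_name: str, columns_json: str) -> list[str]:
    """Return DROP and CREATE statements for pipekit_staging.{module_name}."""
    columns = json.loads(columns_json)
    table = f"pipekit_staging.{quote(module_name)}"
    col_defs = ", ".join(f"{quote(c['dest_name'])} {c['dest_type']}" for c in columns)
    return [
        f"DROP TABLE IF EXISTS {table}",
        f"CREATE TABLE {table} ({col_defs})",
    ]


def insert_column_list(columns_json: str) -> str:
    """Column list shared by the merge INSERT and its SELECT from staging."""
    return ", ".join(quote(c["dest_name"]) for c in json.loads(columns_json))


example_columns = json.dumps([
    {"source_name": "ID", "source_type": "NUMBER", "dest_name": "id", "dest_type": "bigint"},
    {"source_name": "NAME", "source_type": "VARCHAR2", "dest_name": "name", "dest_type": "text"},
])
for stmt in staging_ddl("orders", example_columns):
    print(stmt)
```

Dropping and recreating on every run means a change to the column mapping takes effect on the next run with no migration step for the staging table.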

API vs. Web

/api/* — JSON REST, HTTP Basic Auth, consumed by HTMX fragments and external callers
/ and other bare paths — full HTML pages (Jinja2), no auth currently
POST /modules/{id}/run returns {run_id} immediately; run is async; poll /runs/{id} for status
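The start-then-poll pattern can be sketched independently of any HTTP client by injecting the two calls. The status field and the "pending"/"running" values are assumptions about the /runs/{id} response, which the text above does not specify; wait_for_run is a hypothetical helper:

```python
import time
from typing import Any, Callable


def wait_for_run(
    start_run: Callable[[], dict],
    get_run: Callable[[Any], dict],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 600.0,
) -> dict:
    """Start a run, then poll until its status leaves the in-progress states.

    start_run wraps POST /modules/{id}/run and returns a dict with "run_id".
    get_run wraps GET /runs/{id} and returns the run's JSON body.
    """
    run_id = start_run()["run_id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = get_run(run_id)
        # Assumed in-progress status values; adjust to the real API.
        if run.get("status") not in ("pending", "running"):
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish in {timeout_seconds}s")
        time.sleep(poll_seconds)
```

In real use, start_run and get_run would be thin wrappers around an HTTP client that sends the Basic Auth credentials; injecting them keeps the polling loop testable without a running server.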

Tech Stack

Python 3.10+, FastAPI, Uvicorn, Jinja2, PyYAML, SQLite3 (stdlib)
python-multipart required for HTML form POSTs (not auto-installed as a FastAPI transitive dep)
Frontend: HTMX + Alpine.js (CDN), no build step
jrunner: separate Java tool, must be on PATH

Full Spec

/opt/pipekit/SPEC.md is the authoritative design document. Read it for deep rationale on any architectural decision.
