jrunner: use bulk copy (-b) for Postgres dests too (COPY)

Extend the -b wiring to jdbc:postgresql: dests, so DB2->PG (and any PG-dest)
loads use jrunner's COPY FROM STDIN path instead of batched INSERTs. SQL
Server already used -b (SQLServerBulkCopy); DB2 dests stay on INSERT. Update
the CLAUDE.md bulk section accordingly.

Validated DB2->PG COPY with real types (dates -> date col, decimals ->
numeric, char) and null/empty-string fidelity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Paul Trowbridge 2026-06-18 23:14:15 -04:00
parent f4d8cd005d
commit 0ddb636f14
2 changed files with 7 additions and 6 deletions

View File

@ -122,9 +122,9 @@ Watermarks are managed inline on both the module edit form and wizard step 3 (no
Recreated on every run as `pipekit_staging.{module_name}` (DROP + CREATE, not IF NOT EXISTS). Ephemeral — exists only during the run. Recreated on every run as `pipekit_staging.{module_name}` (DROP + CREATE, not IF NOT EXISTS). Ephemeral — exists only during the run.
## Bulk Copy (SQL Server dest) ## Bulk Copy
`jrunner.migrate` passes jrunner's `-b` flag when the dest JDBC URL starts with `jdbc:sqlserver:`, so SQL Server loads stream via `SQLServerBulkCopy` (TDS bulk-load) instead of batched INSERTs — dramatically faster on large/wide tables (a 1.27M-row load went ~111 min → ~4 min). DB2/PG dests keep the INSERT path. This is automatic per-dest; no module config. (Requires jrunner with `-b` support.) Note: jrunner only streams the **Postgres source** without buffering it all into memory because it sets `autoCommit(false)` on the source connection in migration mode — a PG-driver requirement for `setFetchSize` to take effect. `jrunner.migrate` passes jrunner's `-b` flag when the dest is SQL Server (`jdbc:sqlserver:`) or Postgres (`jdbc:postgresql:`), so loads use the dest's native bulk path instead of batched `INSERT…VALUES`**SQL Server** via `SQLServerBulkCopy` (TDS bulk-load), **Postgres** via `COPY … FROM STDIN`. Dramatically faster on large/wide tables (a 1.27M-row SQL Server load went ~111 min → ~4 min). DB2 dests keep the INSERT path. Automatic per-dest; no module config. (Requires jrunner with `-b` support.) Note: jrunner only streams the **Postgres source** without buffering it all into memory because it sets `autoCommit(false)` on the source connection in migration mode — a PG-driver requirement for `setFetchSize` to take effect.
## Scheduler ## Scheduler

View File

@ -171,10 +171,11 @@ def migrate(
argv.append("-t") argv.append("-t")
if clear: if clear:
argv.append("-c") argv.append("-c")
# SQL Server dest: stream via TDS bulk copy instead of INSERT...VALUES # Use jrunner's native bulk load instead of INSERT...VALUES round trips
# round trips (much faster on wide/large tables). jrunner -b is a no-op # (much faster on wide/large tables): SQL Server -> SQLServerBulkCopy,
# for non-SQL-Server dests, but only pass it where it applies. # Postgres -> COPY FROM STDIN. Only pass -b where jrunner supports it.
if (dest_conn.get("jdbc_url") or "").lower().startswith("jdbc:sqlserver:"): _durl = (dest_conn.get("jdbc_url") or "").lower()
if _durl.startswith("jdbc:sqlserver:") or _durl.startswith("jdbc:postgresql:"):
argv.append("-b") argv.append("-b")
proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,