docs: -b now covers Postgres (COPY) as well as SQL Server

Update readme + CLAUDE: -b is no longer SQL-Server-only. Describe the Postgres
COPY FROM STDIN path (CopyManager, text-based, CSV-quoted, empty vs NULL) next
to the existing SQL Server SQLServerBulkCopy path; DB2 still falls back to INSERT.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Paul Trowbridge 2026-06-18 23:16:00 -04:00
parent 6fe2bea089
commit 2ced7810d9
2 changed files with 28 additions and 18 deletions

View File

@@ -61,17 +61,26 @@ The tool operates in two modes:
 - Reads from source, writes to destination with batched INSERTs (or bulk copy with `-b`)
 - Shows progress counters and timing information

-**Bulk Copy** (migration mode, `-b`, SQL Server dest only):
-- Streams the source ResultSet into SQL Server over the TDS bulk-load protocol
-via `SQLServerBulkCopy` — no per-batch INSERT round trips. Far faster on
-large/wide tables (a 1.27M-row, ~298-col load went ~111 min → ~4 min).
-- A `BulkSource` adapter (`ISQLServerBulkData`) maps source type names to JDBC
-types we control. String-ish types (text/varchar/char/bpchar/json/jsonb/uuid
-**and numeric**) are declared NVARCHAR and read via `getString` so SQL Server
-converts losslessly — numeric goes this route because PG reports unconstrained
-numeric as scale 0, which a typed DECIMAL path would round (123.45 → 123).
-- Emits a `\r`-counter every 10k rows for live progress, and prints the final
-row count. Falls back to the INSERT path for non-SQL-Server dests.
+**Bulk Copy** (migration mode, `-b`) — uses the dest's native bulk path; falls
+back to the INSERT path for any other dest (e.g. DB2):
+
+*SQL Server dest* — streams the source ResultSet over the TDS bulk-load
+protocol via `SQLServerBulkCopy` (no per-batch INSERT round trips; a 1.27M-row,
+~298-col load went ~111 min → ~4 min). A `BulkSource` adapter
+(`ISQLServerBulkData`) maps source type names to JDBC types we control:
+string-ish types (text/varchar/char/bpchar/json/jsonb/uuid **and numeric**) are
+declared NVARCHAR and read via `getString` so SQL Server converts losslessly —
+numeric goes this route because PG reports unconstrained numeric as scale 0,
+which a typed DECIMAL path would round (123.45 → 123).
+
+*Postgres dest* — streams via `COPY <table> FROM STDIN WITH (FORMAT csv)` using
+the JDBC `CopyManager`. COPY is text-based, so the server parses each field into
+the column type — no per-type handling. Every non-null value is CSV-quoted
+(empty string stays distinct from NULL, which is an empty unquoted field); rows
+flush in 1000-row buffers.
+
+Both emit a `\r`-counter every 10k rows for live progress and print the final
+row count.

 ### Data Flow
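
The CSV convention described for the Postgres path (every non-null value quoted, NULL sent as an empty unquoted field) can be sketched roughly as below. This is an illustration only, not the tool's code: the class `CopyCsvRow` and its `encode` method are hypothetical names, and the sketch assumes values have already been read from the source as strings. In a real load the encoded text would presumably be handed to the pgJDBC `CopyManager`; that step is omitted so the example runs without a database.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the tool): encodes one row as a line of
// input for COPY ... FROM STDIN WITH (FORMAT csv).
final class CopyCsvRow {

    // Non-null values are always wrapped in double quotes, with embedded
    // quotes doubled. A null becomes an empty, unquoted field, which the
    // server reads as NULL, so it stays distinct from the empty string "".
    static String encode(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String value = fields.get(i);
            if (value != null) {
                sb.append('"').append(value.replace("\"", "\"\"")).append('"');
            }
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Prints: "a,b","",,"say ""hi"""
        System.out.print(encode(Arrays.asList("a,b", "", null, "say \"hi\"")));
    }
}
```

Quoting unconditionally costs a few bytes per field but avoids having to decide per value whether a comma, quote, or newline makes quoting necessary.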
@@ -89,9 +98,9 @@ The tool operates in two modes:
 3. Connect to source and destination databases via JDBC
 4. Execute source query and fetch results (fetch size: 10,000 rows)
 5. Optionally clear target table before insert if -c flag is set
-6. With `-b` (SQL Server dest): bulk-copy the ResultSet via `SQLServerBulkCopy`.
-Otherwise: build batched INSERT statements (250 rows per batch) and execute
-them against the destination table specified by -dt
+6. With `-b`: bulk-load via the dest's native path (SQL Server → `SQLServerBulkCopy`,
+Postgres → `COPY FROM STDIN`). Otherwise: build batched INSERT statements
+(250 rows per batch) and execute them against the destination table (-dt)

 ### Type Handling
 The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.
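
The quote escaping and optional trimming mentioned under Type Handling might look something like the following on the INSERT path. This is a minimal sketch under stated assumptions: `SqlLiteral.quote` is a hypothetical name, trimming is assumed to behave like `String.trim()`, and rendering a Java `null` as the bare keyword `NULL` is also an assumption, since the docs do not say how nulls are written.

```java
// Hypothetical helper (not from the tool): renders a string value as a
// single-quoted SQL literal for a generated INSERT statement.
final class SqlLiteral {

    static String quote(String value, boolean trim) {
        if (value == null) {
            return "NULL"; // assumption: nulls become the unquoted keyword
        }
        String v = trim ? value.trim() : value;
        // Double each single quote so it cannot terminate the literal early.
        return "'" + v.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("  O'Brien ", true));  // prints 'O''Brien'
        System.out.println(quote("  O'Brien ", false)); // prints '  O''Brien '
    }
}
```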
@@ -122,7 +131,7 @@ Command-line flags:
 - `-dt` - fully qualified destination table name (migration mode only)
 - `-t` - trim text fields (default: true)
 - `-c` - clear target table before insert (default: true, migration mode only)
-- `-b` - bulk copy into dest via SQLServerBulkCopy (migration mode, SQL Server dest only)
+- `-b` - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
 - `-f` - output format: csv, tsv (query mode only, default: csv)

 ## Key Implementation Details
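
The `-b` behaviour in the flag list amounts to a three-way choice of load path. The sketch below shows one way such a choice could be made. It assumes the destination is recognised by its JDBC URL prefix, which the docs do not state, and the names `BulkPathChooser`, `LoadPath`, and `choose` are all hypothetical.

```java
// Hypothetical sketch (not the tool's code): picks the load path from the
// -b flag and the destination JDBC URL.
final class BulkPathChooser {

    enum LoadPath { SQLSERVER_BULK_COPY, POSTGRES_COPY, BATCHED_INSERT }

    static LoadPath choose(String destJdbcUrl, boolean bulkFlag) {
        if (!bulkFlag) {
            return LoadPath.BATCHED_INSERT;
        }
        if (destJdbcUrl.startsWith("jdbc:sqlserver:")) {
            return LoadPath.SQLSERVER_BULK_COPY;
        }
        if (destJdbcUrl.startsWith("jdbc:postgresql:")) {
            return LoadPath.POSTGRES_COPY;
        }
        // Any other destination (e.g. DB2) falls back to batched INSERTs.
        return LoadPath.BATCHED_INSERT;
    }

    public static void main(String[] args) {
        System.out.println(choose("jdbc:postgresql://dest:5432/destdb", true));
        System.out.println(choose("jdbc:db2://dest:50000/destdb", true));
    }
}
```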

View File

@@ -187,7 +187,8 @@ jrunner -scu jdbc:postgresql://source:5432/sourcedb \
 **Options:**
 - `-t` - trim text fields (default: true)
 - `-c` - clear target table before insert (default: true)
-- `-b` - bulk copy into the destination (SQL Server dest only); streams via the
-TDS bulk-load protocol instead of batched INSERTs — far faster on large/wide
-tables. No-op for non-SQL-Server dests. (migration mode only)
+- `-b` - bulk load into the destination instead of batched INSERTs — far faster
+on large/wide tables. SQL Server: TDS bulk-load via SQLServerBulkCopy.
+Postgres: COPY FROM STDIN. Other dests (e.g. DB2) fall back to INSERT.
+(migration mode only)
 - `-f` - output format: csv, tsv (query mode only, default: csv)
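
As background to the SQL Server notes in this commit, which send numeric values as strings because an unconstrained Postgres numeric is reported with scale 0, the loss on a typed route can be reproduced in isolation. The sketch below is only an illustration: `NumericScaleDemo` is a hypothetical name, and `RoundingMode.HALF_UP` is an assumption, since the docs do not say which rounding the driver applies.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo (not the tool's code) of why a scale-0 DECIMAL route
// loses digits while a string route does not.
final class NumericScaleDemo {

    // Typed route: the value is forced to the reported scale of 0.
    static String viaDecimalScaleZero(String raw) {
        return new BigDecimal(raw).setScale(0, RoundingMode.HALF_UP).toPlainString();
    }

    // String route: the text passes through unchanged, leaving the
    // conversion to the destination column's own precision and scale.
    static String viaString(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        System.out.println(viaDecimalScaleZero("123.45")); // prints 123
        System.out.println(viaString("123.45"));           // prints 123.45
    }
}
```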