diff --git a/CLAUDE.md b/CLAUDE.md
index 1509fd2..0bce228 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -61,17 +61,26 @@ The tool operates in two modes:
 - Reads from source, writes to destination with batched INSERTs (or bulk copy with `-b`)
 - Shows progress counters and timing information
 
-**Bulk Copy** (migration mode, `-b`, SQL Server dest only):
-- Streams the source ResultSet into SQL Server over the TDS bulk-load protocol
-  via `SQLServerBulkCopy` — no per-batch INSERT round trips. Far faster on
-  large/wide tables (a 1.27M-row, ~298-col load went ~111 min → ~4 min).
-- A `BulkSource` adapter (`ISQLServerBulkData`) maps source type names to JDBC
-  types we control. String-ish types (text/varchar/char/bpchar/json/jsonb/uuid
-  **and numeric**) are declared NVARCHAR and read via `getString` so SQL Server
-  converts losslessly — numeric goes this route because PG reports unconstrained
-  numeric as scale 0, which a typed DECIMAL path would round (123.45 → 123).
-- Emits a `\r`-counter every 10k rows for live progress, and prints the final
-  row count. Falls back to the INSERT path for non-SQL-Server dests.
+**Bulk Copy** (migration mode, `-b`) — uses the dest's native bulk path; falls
+back to the INSERT path for any other dest (e.g. DB2):
+
+*SQL Server dest* — streams the source ResultSet over the TDS bulk-load
+protocol via `SQLServerBulkCopy` (no per-batch INSERT round trips; a 1.27M-row,
+~298-col load went ~111 min → ~4 min). A `BulkSource` adapter
+(`ISQLServerBulkData`) maps source type names to JDBC types we control:
+string-ish types (text/varchar/char/bpchar/json/jsonb/uuid **and numeric**) are
+declared NVARCHAR and read via `getString` so SQL Server converts losslessly —
+numeric goes this route because PG reports unconstrained numeric as scale 0,
+which a typed DECIMAL path would round (123.45 → 123).
+
+*Postgres dest* — streams via `COPY FROM STDIN WITH (FORMAT csv)` using
+the JDBC `CopyManager`. COPY is text-based, so the server parses each field into
+the column type — no per-type handling. Every non-null value is CSV-quoted
+(empty string stays distinct from NULL, which is an empty unquoted field); rows
+flush in 1000-row buffers.
+
+Both emit a `\r`-counter every 10k rows for live progress and print the final
+row count.
 
 ### Data Flow
 
@@ -89,9 +98,9 @@ The tool operates in two modes:
 3. Connect to source and destination databases via JDBC
 4. Execute source query and fetch results (fetch size: 10,000 rows)
 5. Optionally clear target table before insert if -c flag is set
-6. With `-b` (SQL Server dest): bulk-copy the ResultSet via `SQLServerBulkCopy`.
-   Otherwise: build batched INSERT statements (250 rows per batch) and execute
-   them against the destination table specified by -dt
+6. With `-b`: bulk-load via the dest's native path (SQL Server → `SQLServerBulkCopy`,
+   Postgres → `COPY FROM STDIN`). Otherwise: build batched INSERT statements
+   (250 rows per batch) and execute them against the destination table (-dt)
 
 ### Type Handling
 The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.
@@ -122,7 +131,7 @@ Command-line flags:
 - `-dt` - fully qualified destination table name (migration mode only)
 - `-t` - trim text fields (default: true)
 - `-c` - clear target table before insert (default: true, migration mode only)
-- `-b` - bulk copy into dest via SQLServerBulkCopy (migration mode, SQL Server dest only)
+- `-b` - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
 - `-f` - output format: csv, tsv (query mode only, default: csv)
 
 ## Key Implementation Details
diff --git a/readme.md b/readme.md
index 9a6d8a0..5e03528 100644
--- a/readme.md
+++ b/readme.md
@@ -187,7 +187,8 @@ jrunner -scu jdbc:postgresql://source:5432/sourcedb \
 **Options:**
 - `-t` - trim text fields (default: true)
 - `-c` - clear target table before insert (default: true)
-- `-b` - bulk copy into the destination (SQL Server dest only); streams via the
-  TDS bulk-load protocol instead of batched INSERTs — far faster on large/wide
-  tables. No-op for non-SQL-Server dests. (migration mode only)
+- `-b` - bulk load into the destination instead of batched INSERTs — far faster
+  on large/wide tables. SQL Server: TDS bulk-load via SQLServerBulkCopy.
+  Postgres: COPY FROM STDIN. Other dests (e.g. DB2) fall back to INSERT.
+  (migration mode only)
 - `-f` - output format: csv, tsv (query mode only, default: csv)
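
Note on the Postgres path documented in the CLAUDE.md hunk: the quoting rule ("every non-null value is CSV-quoted", NULL as an empty unquoted field) may be easier to review with a concrete sketch. The class and method names below (`CopyCsvSketch`, `encodeField`, `encodeRow`) are hypothetical and not taken from the tool's source; this only illustrates the rule as the docs describe it, under the assumption that the default CSV quote character `"` is used and embedded quotes are doubled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the CSV encoding described for the Postgres COPY path.
// Not the tool's actual code; names are illustrative only.
class CopyCsvSketch {

    // NULL becomes an empty, unquoted field. Any non-null value is always
    // wrapped in double quotes, with embedded double quotes doubled, so an
    // empty string ("") stays distinct from NULL.
    static String encodeField(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the encoded fields with commas and terminates the row with a newline.
    static String encodeRow(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(encodeField(values.get(i)));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // A quoted value, an empty string, a NULL, and a numeric rendered as text.
        System.out.print(encodeRow(Arrays.asList("a\"b", "", null, "123.45")));
        // prints: "a""b","",,"123.45"
    }
}
```

Rows encoded this way would presumably be accumulated (the docs mention 1000-row buffers) and handed to the driver's `CopyManager`; because every value travels as text, the server does the conversion to the column type, which matches the "no per-type handling" claim.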