docs: update CLAUDE.md for PG streaming and new quoted types

Reflect the two behavior changes: (1) migration mode sets the source
connection to autoCommit=false so PostgreSQL's setFetchSize actually streams
(it's ignored otherwise) — and why query mode is excluded; (2) json/jsonb/
bpchar/uuid are now quoted, plus document the default-emits-unquoted gotcha
for future type additions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Paul Trowbridge 2026-06-17 21:39:47 -04:00
parent 78c832eb1f
commit d9fd651c72

View File

@ -81,7 +81,9 @@ The tool operates in two modes:
7. Optionally clear target table before insert if -c flag is set
### Type Handling
The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.
**Caveat — the `default` case emits values UNQUOTED** (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any *string-typed* column whose JDBC type name isn't in the switch falls here and breaks the generated INSERT with a syntax error (e.g. PostgreSQL `bool``'t'`/`'f'` is currently unhandled). When adding a new source type, decide: numeric → leave to default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.
### Database Drivers
JDBC drivers are configured in `jrunner/build.gradle`:
@ -140,10 +142,10 @@ Both modes use a streaming architecture with no array storage of result rows:
- Only holds up to 250 rows worth of SQL text in memory at once
**JDBC Fetch Size:**
- Both modes set `stmt.setFetchSize(10000)` (line 190)
- This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
- The driver maintains this internal buffer for network efficiency
- The application code never sees or stores all 10,000 rows - it processes them one at a time via `rs.next()`
- Both modes set `stmt.setFetchSize(10000)` — a hint to fetch 10,000 rows at a time
- The application processes rows one at a time via `rs.next()`; the only buffer is the driver's fetch window
**⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect.** The PG JDBC driver IGNORES `setFetchSize` while autoCommit is true and instead loads the ENTIRE result set into memory (OOMs / GC-thrashes on large source tables). So in **migration mode** the source connection is set to `setAutoCommit(false)` right after connecting, which enables a server-side cursor and makes streaming actually stream. This is done **only in migration mode** — query mode leaves autoCommit at its default because callers run committed DDL/DML through query mode (e.g. external tools), and autoCommit=false would roll those statements back on connection close. (jt400/MSSQL drivers stream regardless, so only PG is affected.)
### Batch Size (Migration Mode)
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared.