docs: update CLAUDE.md for PG streaming and new quoted types

Reflect the two behavior changes: (1) migration mode sets the source connection to autoCommit=false so PostgreSQL's setFetchSize actually streams (it is ignored otherwise), and explain why query mode is excluded; (2) json/jsonb/bpchar/uuid are now quoted. Also document the default-emits-unquoted gotcha for future type additions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 21:39:47 -04:00 · 2026-06-17 21:39:47 -04:00 · d9fd651c72
commit d9fd651c72
parent 78c832eb1f
1 changed file with 7 additions and 5 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -81,7 +81,9 @@ The tool operates in two modes:
 7. Optionally clear target table before insert if -c flag is set

 ### Type Handling
-The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
+The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.
+
+**Caveat — the `default` case emits values UNQUOTED** (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any *string-typed* column whose JDBC type name isn't in the switch falls here and breaks the generated INSERT with a syntax error (e.g. PostgreSQL `bool` → `'t'`/`'f'` is currently unhandled). When adding a new source type, decide: numeric → leave to default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.

 ### Database Drivers
 JDBC drivers are configured in `jrunner/build.gradle`:
@@ -140,10 +142,10 @@ Both modes use a streaming architecture with no array storage of result rows:
 - Only holds up to 250 rows worth of SQL text in memory at once

 **JDBC Fetch Size:**
- Both modes set `stmt.setFetchSize(10000)` (line 190)
- This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
- The driver maintains this internal buffer for network efficiency
- The application code never sees or stores all 10,000 rows - it processes them one at a time via `rs.next()`
+- Both modes set `stmt.setFetchSize(10000)` — a hint to fetch 10,000 rows at a time
+- The application processes rows one at a time via `rs.next()`; the only buffer is the driver's fetch window
+
+**⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect.** The PG JDBC driver IGNORES `setFetchSize` while autoCommit is true and instead loads the ENTIRE result set into memory (OOMs / GC-thrashes on large source tables). So in **migration mode** the source connection is set to `setAutoCommit(false)` right after connecting, which enables a server-side cursor and makes streaming actually stream. This is done **only in migration mode** — query mode leaves autoCommit at its default because callers run committed DDL/DML through query mode (e.g. external tools), and autoCommit=false would roll those statements back on connection close. (jt400/MSSQL drivers stream regardless, so only PG is affected.)

 ### Batch Size (Migration Mode)
 INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared.
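
Reviewer note: the quoting rule and the 250-row batching described in the updated CLAUDE.md can be sketched roughly as below. This is a minimal illustration, not the tool's actual code. The class `InsertBatchSketch`, its methods, the abbreviated `QUOTED` set, and the `NULL` handling are all assumptions made for the example, and the SQL is handed to a callback instead of a real JDBC statement so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: every name here is illustrative, not taken from the tool.
final class InsertBatchSketch {
    // Type names emitted as quoted literals (abbreviated from the doc's list).
    static final Set<String> QUOTED = Set.of(
        "VARCHAR", "NVARCHAR", "TEXT", "NTEXT", "CHAR", "NCHAR", "CLOB", "NCLOB",
        "JSON", "JSONB", "BPCHAR", "UUID", "DATE", "TIME", "TIMESTAMP", "DATETIME");

    // Quote and escape known string-like types; everything else falls through unquoted.
    static String formatValue(String typeName, String value) {
        if (value == null) {
            return "NULL"; // assumption: the source text does not describe NULL handling
        }
        if (QUOTED.contains(typeName.toUpperCase(Locale.ROOT))) {
            return "'" + value.replace("'", "''") + "'";
        }
        return value; // the "default" case: correct for numerics, wrong for unlisted string types
    }

    private final String table;
    private final int batchSize;
    private final Consumer<String> executor; // stands in for executing SQL on the target
    private final StringBuilder values = new StringBuilder();
    private int rows = 0;

    InsertBatchSketch(String table, int batchSize, Consumer<String> executor) {
        this.table = table;
        this.batchSize = batchSize;
        this.executor = executor;
    }

    // Append one row as a VALUES clause; flush once batchSize rows have accumulated.
    void addRow(String[] typeNames, String[] rowValues) {
        if (rows > 0) {
            values.append(", ");
        }
        values.append('(');
        for (int i = 0; i < rowValues.length; i++) {
            if (i > 0) {
                values.append(", ");
            }
            values.append(formatValue(typeNames[i], rowValues[i]));
        }
        values.append(')');
        rows++;
        if (rows == batchSize) {
            flush();
        }
    }

    // Prepend the INSERT header, hand the statement off, then clear the buffer.
    void flush() {
        if (rows == 0) {
            return;
        }
        executor.accept("INSERT INTO " + table + " VALUES " + values);
        values.setLength(0);
        rows = 0;
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        // Batch size 2 keeps the demo short; the doc describes a hardcoded 250.
        InsertBatchSketch batch = new InsertBatchSketch("target_table", 2, executed::add);
        String[] types = {"int4", "varchar", "bool"};
        batch.addRow(types, new String[] {"1", "O'Brien", "t"});
        batch.addRow(types, new String[] {"2", "Ann", "f"});
        batch.addRow(types, new String[] {"3", null, "t"});
        batch.flush(); // emit the final partial batch
        executed.forEach(System.out::println);
    }
}
```

In the demo, the `bool` column takes the unquoted default path, so its `t`/`f` values land in the statement as bare tokens. That is the failure mode the new caveat warns about for string-like types missing from the switch.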