Compare commits
No commits in common. "master" and "passfile" have entirely different histories.
43
CLAUDE.md
@@ -58,30 +58,9 @@ The tool operates in two modes:

**Migration Mode** (original functionality):
- Activates when destination flags are provided
- Reads from source, writes to destination with batched INSERTs (or bulk copy with `-b`)
- Reads from source, writes to destination with batched INSERTs
- Shows progress counters and timing information

**Bulk Copy** (migration mode, `-b`): uses the dest's native bulk path and falls
back to the INSERT path for any other dest (e.g. DB2):

*SQL Server dest*: streams the source ResultSet over the TDS bulk-load
protocol via `SQLServerBulkCopy` (no per-batch INSERT round trips; a 1.27M-row,
~298-col load went from ~111 min to ~4 min). A `BulkSource` adapter
(`ISQLServerBulkData`) maps source type names to JDBC types we control:
string-ish types (text/varchar/char/bpchar/json/jsonb/uuid **and numeric**) are
declared NVARCHAR and read via `getString` so SQL Server converts them losslessly.
Numeric goes this route because PG reports unconstrained numeric as scale 0,
which a typed DECIMAL path would round (123.45 → 123).
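A stdlib-only Java sketch of the rounding problem described above (no driver involved). It assumes a typed DECIMAL path coerces each value to the scale the metadata reports, using `RoundingMode.HALF_UP`; the rounding mode the driver actually applies is not stated here, and `NumericScaleDemo` and its methods are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical illustration, not jrunner code: contrasts a typed DECIMAL path
// that honors the reported scale with the string path BulkSource uses for numeric.
final class NumericScaleDemo {

    // Typed path: coerce to the scale the metadata reports.
    // HALF_UP is an assumption made for this sketch.
    static BigDecimal viaTypedDecimal(String pgText, int reportedScale) {
        return new BigDecimal(pgText).setScale(reportedScale, RoundingMode.HALF_UP);
    }

    // String path: the exact source text is handed over unchanged and the
    // destination server converts it into the column type.
    static String viaString(String pgText) {
        return pgText;
    }

    public static void main(String[] args) {
        String src = "123.45"; // unconstrained PG numeric, reported with scale 0
        System.out.println("typed DECIMAL, scale 0: " + viaTypedDecimal(src, 0).toPlainString());
        System.out.println("NVARCHAR text:          " + viaString(src));
    }
}
```

The string route costs a server-side parse per value, but it is the only one of the two that cannot silently drop digits when the reported scale is wrong.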

*Postgres dest*: streams via `COPY <table> FROM STDIN WITH (FORMAT csv)` using
the JDBC `CopyManager`. COPY is text-based, so the server parses each field into
the column type and no per-type handling is needed. Every non-null value is
CSV-quoted (so an empty string stays distinct from NULL, which is sent as an
empty unquoted field); rows are flushed in 1000-row buffers.
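The quoting rule for the COPY path can be sketched in plain Java. It mirrors the loop in the jrunner diff (quote every non-null value, double any embedded `"`, leave NULL as an empty unquoted field, optional trim), but `CopyCsvRow`, `field`, and `row` are hypothetical names, not jrunner code.

```java
// Hypothetical sketch, not jrunner code: the CSV encoding rule the COPY path
// relies on for COPY ... FROM STDIN WITH (FORMAT csv).
final class CopyCsvRow {

    // NULL -> empty unquoted field. Non-null -> always double-quoted with any
    // embedded quote doubled, so an empty string is sent as "" and stays distinct.
    static String field(String value, boolean trim) {
        if (value == null) {
            return "";
        }
        String v = trim ? value.trim() : value;
        return "\"" + v.replace("\"", "\"\"") + "\"";
    }

    // Comma-joins one row's fields and terminates the record with a newline.
    static String row(boolean trim, String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(field(values[i], trim));
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Prints: "a",,"","say ""hi""","padded"
        System.out.print(row(true, "a", null, "", "say \"hi\"", "  padded  "));
    }
}
```

Quoting unconditionally is simpler than deciding per value whether quoting is needed, and it is what keeps `""` (empty string) and an empty field (NULL) apart in CSV mode.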

Both emit a `\r`-counter every 10k rows for live progress and print the final
row count.

### Data Flow

**Query Mode:**
@@ -97,15 +76,12 @@ row count.
2. Read SQL query from file specified by -sq flag
3. Connect to source and destination databases via JDBC
4. Execute source query and fetch results (fetch size: 10,000 rows)
5. Optionally clear target table before insert if -c flag is set
6. With `-b`: bulk-load via the dest's native path (SQL Server → `SQLServerBulkCopy`,
   Postgres → `COPY FROM STDIN`). Otherwise: build batched INSERT statements
   (250 rows per batch) and execute them against the destination table (-dt)
5. Build batched INSERT statements (250 rows per batch)
6. Execute batches against destination table specified by -dt flag
7. Optionally clear target table before insert if -c flag is set
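Step 6's choice of load path keys off the destination URL prefix, as in the `else if` chain in the jrunner diff. A hypothetical sketch of that dispatch follows; `LoadPathChooser` and the `LoadPath` enum are illustrative names, not jrunner code.

```java
import java.util.Locale;

// Hypothetical sketch of step 6's dispatch: the destination JDBC URL prefix
// picks the load path; anything unrecognized falls back to batched INSERTs.
final class LoadPathChooser {

    enum LoadPath { SQLSERVER_BULK_COPY, POSTGRES_COPY, BATCHED_INSERT }

    static LoadPath choose(String destUrl, boolean bulk) {
        String u = destUrl.toLowerCase(Locale.ROOT);
        if (bulk && u.startsWith("jdbc:sqlserver:")) {
            return LoadPath.SQLSERVER_BULK_COPY;
        }
        if (bulk && u.startsWith("jdbc:postgresql:")) {
            return LoadPath.POSTGRES_COPY;
        }
        // -b not given, or a dest without a native bulk path (e.g. DB2)
        return LoadPath.BATCHED_INSERT;
    }

    public static void main(String[] args) {
        System.out.println(choose("jdbc:sqlserver://host;databaseName=db", true));
        System.out.println(choose("jdbc:postgresql://host:5432/db", true));
        System.out.println(choose("jdbc:as400://host", true));
        System.out.println(choose("jdbc:postgresql://host:5432/db", false));
    }
}
```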

### Type Handling
The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.

**Caveat: the `default` case emits values UNQUOTED** (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any *string-typed* column whose JDBC type name isn't in the switch falls into that case and breaks the generated INSERT with a syntax error (e.g. PostgreSQL `bool` → `'t'`/`'f'` is currently unhandled). When adding a new source type, decide: numeric → leave it to the default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.
The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
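The suggested future fix (quote by default, emit only allowlisted numeric types bare) might look like the sketch below. The allowlist contents, `SqlLiteral`, and `literal` are assumptions for illustration; a real version would need every numeric type name the supported source drivers report.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the "flip the default" fix: only allowlisted numeric
// type names are emitted bare; everything else is quoted as a string.
final class SqlLiteral {

    // Illustrative allowlist only; not the set jrunner uses.
    private static final Set<String> NUMERIC = Set.of(
            "INT2", "INT4", "INT8", "SMALLINT", "INT", "INTEGER", "BIGINT",
            "NUMERIC", "DECIMAL", "FLOAT4", "FLOAT8", "REAL", "DOUBLE");

    static String literal(String typeName, String value, boolean trim) {
        if (value == null) {
            return "NULL";
        }
        String t = typeName == null ? "" : typeName.toUpperCase(Locale.ROOT);
        if (NUMERIC.contains(t)) {
            return value; // known numeric: safe to emit unquoted
        }
        // Unknown or string-like (bool, uuid, dates, ...): quote and escape '
        String v = trim ? value.trim() : value;
        return "'" + v.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(literal("int4", "42", true));
        System.out.println(literal("bool", "t", true));
        System.out.println(literal("text", " O'Brien ", true));
        System.out.println(literal("uuid", null, true));
    }
}
```

With this shape, a forgotten type degrades to a quoted literal that the destination may still coerce, rather than to a syntax error.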

### Database Drivers
JDBC drivers are configured in `jrunner/build.gradle`:
@@ -131,7 +107,6 @@ Command-line flags:
- `-dt` - fully qualified destination table name (migration mode only)
- `-t` - trim text fields (default: true)
- `-c` - clear target table before insert (default: true, migration mode only)
- `-b` - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
- `-f` - output format: csv, tsv (query mode only, default: csv)

## Key Implementation Details
@@ -165,10 +140,10 @@ Both modes use a streaming architecture with no array storage of result rows:
- Only holds up to 250 rows' worth of SQL text in memory at once

**JDBC Fetch Size:**
- Both modes set `stmt.setFetchSize(10000)`, a hint to fetch 10,000 rows at a time
- The application processes rows one at a time via `rs.next()`; the only buffer is the driver's fetch window

**⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect.** The PG JDBC driver IGNORES `setFetchSize` while autoCommit is true and instead loads the ENTIRE result set into memory (OOMs / GC-thrashes on large source tables). So in **migration mode** the source connection is set to `setAutoCommit(false)` right after connecting, which enables a server-side cursor and makes streaming actually stream. This is done **only in migration mode**: query mode leaves autoCommit at its default because callers run committed DDL/DML through query mode (e.g. external tools), and autoCommit=false would roll those statements back on connection close. (The jt400/MSSQL drivers stream regardless, so only PG is affected.)
- Both modes set `stmt.setFetchSize(10000)` (line 190)
- This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
- The driver maintains this internal buffer for network efficiency
- The application code never sees or stores all 10,000 rows; it processes them one at a time via `rs.next()`
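A minimal sketch of the migration-mode source setup, assuming a plain `java.sql.Connection`; `StreamingSource` and `openStreaming` are hypothetical names. The point it makes is ordering: `setAutoCommit(false)` runs before the query executes, and the fetch size is set on the statement.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not jrunner code: opens a source query so the PG driver
// streams it through a cursor instead of buffering every row.
final class StreamingSource {

    static ResultSet openStreaming(Connection src, String sql, int fetchSize) throws SQLException {
        // Must run before executeQuery: while autoCommit is true, pgjdbc
        // ignores the fetch size and loads the whole result set into memory.
        src.setAutoCommit(false);
        Statement stmt = src.createStatement();
        stmt.setFetchSize(fetchSize); // a hint; drivers that stream anyway may ignore it
        return stmt.executeQuery(sql);
    }

    public static void main(String[] args) throws SQLException {
        // Usage: java StreamingSource.java <jdbc-url> <query>  (driver jar on the classpath)
        try (Connection con = DriverManager.getConnection(args[0])) {
            ResultSet rs = openStreaming(con, args[1], 10000);
            long n = 0;
            while (rs.next()) {
                n++; // one row at a time; the only buffer is the driver's fetch window
            }
            System.out.println(n + " rows");
        } // closing without commit rolls back, which is harmless for a SELECT
    }
}
```

Because the transaction is never committed, this pattern only suits read-only source queries, which is the same reason query mode leaves autoCommit alone.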

### Batch Size (Migration Mode)
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows have accumulated in the buffer, it is prepended with "INSERT INTO {table} VALUES" and executed, then the buffer is cleared.
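The 250-row batching can be sketched as a small string accumulator. It assumes VALUES tuples are comma-separated and that a final partial batch is flushed after the last row; `InsertBatcher` and its `Consumer<String>` executor (standing in for the JDBC `Statement`) are hypothetical.

```java
import java.util.function.Consumer;

// Hypothetical sketch, not jrunner code: accumulates VALUES tuples and emits
// one INSERT statement per full batch.
final class InsertBatcher {
    private final String table;
    private final int batchSize;
    private final Consumer<String> executor; // stands in for Statement.execute
    private final StringBuilder values = new StringBuilder();
    private int pending = 0;

    InsertBatcher(String table, int batchSize, Consumer<String> executor) {
        this.table = table;
        this.batchSize = batchSize;
        this.executor = executor;
    }

    // Append one already-rendered VALUES tuple such as "(1,'a')".
    void add(String tuple) {
        if (pending > 0) {
            values.append(',');
        }
        values.append(tuple);
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    // Prepend the INSERT prefix, hand the statement off, and clear the buffer.
    void flush() {
        if (pending == 0) {
            return;
        }
        executor.accept("INSERT INTO " + table + " VALUES " + values);
        values.setLength(0);
        pending = 0;
    }

    public static void main(String[] args) {
        InsertBatcher b = new InsertBatcher("dbo.target", 250, System.out::println);
        for (int i = 1; i <= 3; i++) {
            b.add("(" + i + ",'row " + i + "')");
        }
        b.flush(); // trailing partial batch
    }
}
```

Memory stays bounded by one batch of SQL text regardless of source size, which is the property the streaming notes above rely on.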

@@ -6,13 +6,6 @@ import java.nio.file.Path ;
import java.nio.file.Paths;
import java.time.*;
import java.io.IOException;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;
import com.microsoft.sqlserver.jdbc.ISQLServerBulkData;
import com.microsoft.sqlserver.jdbc.SQLServerException;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.CopyIn;

public class jrunner {
    //static final String QUERY = "SELECT * from rlarp.osm LIMIT 100";
@@ -31,7 +24,6 @@ public class jrunner {
        String dt = "";
        Boolean trim = true;
        Boolean clear = true;
        Boolean bulk = false;
        Integer r = 0;
        Integer t = 0;
        String sql = "";
@@ -65,7 +57,6 @@ public class jrunner {
        msg = msg + nl + "-dt fully qualified name of destination table";
        msg = msg + nl + "-t trim text";
        msg = msg + nl + "-c clear target table";
        msg = msg + nl + "-b bulk copy into destination (SQL Server or Postgres dest)";
        msg = msg + nl + "-f output format (csv, tsv, table, json) - default: csv";
        msg = msg + nl + "--help info";
        msg = msg + nl + "";
@@ -134,9 +125,6 @@ public class jrunner {
                case "-c":
                    clear = true;
                    break;
                case "-b":
                    bulk = true;
                    break;
                case "-f":
                    outputFormat = args[i+1].toLowerCase();
                    break;
@@ -312,72 +300,6 @@ public class jrunner {
                e.printStackTrace();
                System.exit(0);
            }
        } else if (bulk && dcu.toLowerCase().startsWith("jdbc:sqlserver:")) {
            //-------------------------------bulk copy (SQL Server dest)-------------------------------------------------
            // Stream the source ResultSet straight into SQL Server over the TDS
            // bulk-load protocol, with no per-row INSERT round trips. Source type
            // names map to JDBC types via BulkSource (string-ish -> NVARCHAR).
            System.out.println("------------bulk copy-------------------------------------");
            try {
                SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(dcon);
                bulkCopy.setDestinationTableName(dt);
                SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
                options.setBatchSize(10000);
                options.setBulkCopyTimeout(0);
                bulkCopy.setBulkCopyOptions(options);
                BulkSource src = new BulkSource(rs, cols, dtn, trim);
                bulkCopy.writeToServer(src);
                bulkCopy.close();
                // The leading \r returns the cursor to the start of the line, so the
                // final count overwrites the last in-place progress tick instead of
                // being appended to it.
                System.out.print("\r" + src.rowsWritten());
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(0);
            }
        } else if (bulk && dcu.toLowerCase().startsWith("jdbc:postgresql:")) {
            //-------------------------------bulk copy (COPY, Postgres dest)--------------------------------------------
            // Stream the source ResultSet into Postgres via COPY ... FROM STDIN.
            // COPY is text-based: each field is sent as CSV text and the server
            // parses it into the column type, so there's no per-type quoting.
            // Non-null values are always CSV-quoted; NULL is an empty unquoted
            // field; column order must match the dest (positional, as always).
            System.out.println("------------bulk copy (COPY)------------------------------");
            try {
                CopyManager cm = ((PGConnection) dcon).getCopyAPI();
                CopyIn cin = cm.copyIn("COPY " + dt + " FROM STDIN WITH (FORMAT csv)");
                StringBuilder buf = new StringBuilder();
                long rows = 0;
                while (rs.next()) {
                    for (int i = 1; i <= cols; i++) {
                        if (i > 1) { buf.append(','); }
                        String val = rs.getString(i);
                        if (!rs.wasNull() && val != null) {
                            if (trim) { val = val.trim(); }
                            buf.append('"').append(val.replace("\"", "\"\"")).append('"');
                        }
                        // else: empty field -> NULL
                    }
                    buf.append('\n');
                    rows++;
                    if (rows % 1000 == 0) {
                        byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
                        cin.writeToCopy(b, 0, b.length);
                        buf.setLength(0);
                        if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
                    }
                }
                if (buf.length() > 0) {
                    byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
                    cin.writeToCopy(b, 0, b.length);
                }
                cin.endCopy();
                System.out.print("\r" + rows);
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(0);
            }
        } else {
            System.out.println("------------row count-------------------------------------");
            //-------------------------------build & execute sql-------------------------------------------------------------
@@ -556,135 +478,6 @@ public class jrunner {
        System.out.println();
    }

    // Adapts a source ResultSet to SQLServerBulkCopy. Maps source type names to
    // JDBC types we control: string-ish PG types (text/varchar/char/bpchar/json/
    // jsonb/uuid/...) are declared NVARCHAR and read via getString, mirroring the
    // INSERT path's "quote it" choices and sidestepping unsupported types (jsonb
    // reports as OTHER, which writeToServer(ResultSet) can't handle directly).
    static class BulkSource implements ISQLServerBulkData {
        private final ResultSet rs;
        private final ResultSetMetaData md;
        private final int cols;
        private final boolean trim;
        private final int[] jdbcType;
        private final boolean[] asString;
        private long rows = 0;

        BulkSource(ResultSet rs, int cols, String[] dtn, boolean trim) throws SQLException {
            this.rs = rs;
            this.cols = cols;
            this.trim = trim;
            this.md = rs.getMetaData();
            this.jdbcType = new int[cols + 1];
            this.asString = new boolean[cols + 1];
            for (int i = 1; i <= cols; i++) {
                String t = (dtn[i] == null ? "" : dtn[i].toUpperCase());
                switch (t) {
                    case "INT2": case "SMALLINT":
                        jdbcType[i] = Types.SMALLINT; break;
                    case "INT4": case "INT": case "INTEGER": case "SERIAL":
                        jdbcType[i] = Types.INTEGER; break;
                    case "INT8": case "BIGINT": case "BIGSERIAL":
                        jdbcType[i] = Types.BIGINT; break;
                    // numeric/decimal: deliberately no case here, so they fall to the
                    // string default below. PG reports unconstrained numeric as
                    // scale 0, which makes bulk copy round (123.45 -> 123). Sending
                    // the exact text instead lets SQL Server convert it into the
                    // dest column losslessly.
                    case "FLOAT4": case "REAL":
                        jdbcType[i] = Types.REAL; break;
                    case "FLOAT8": case "DOUBLE": case "DOUBLE PRECISION":
                        jdbcType[i] = Types.DOUBLE; break;
                    case "BOOL": case "BOOLEAN": case "BIT":
                        jdbcType[i] = Types.BIT; break;
                    case "DATE":
                        jdbcType[i] = Types.DATE; break;
                    case "TIME":
                        jdbcType[i] = Types.TIME; break;
                    case "TIMESTAMP": case "TIMESTAMPTZ":
                    case "DATETIME": case "DATETIME2": case "SMALLDATETIME":
                        jdbcType[i] = Types.TIMESTAMP; break;
                    case "BYTEA": case "BINARY": case "VARBINARY":
                        jdbcType[i] = Types.VARBINARY; break;
                    default:
                        jdbcType[i] = Types.LONGNVARCHAR;
                        asString[i] = true;
                        break;
                }
            }
        }

        public java.util.Set<Integer> getColumnOrdinals() {
            java.util.Set<Integer> ords = new java.util.TreeSet<Integer>();
            for (int i = 1; i <= cols; i++) { ords.add(i); }
            return ords;
        }

        public String getColumnName(int column) {
            try { return md.getColumnName(column); }
            catch (SQLException e) { return "col" + column; }
        }

        public int getColumnType(int column) { return jdbcType[column]; }

        public int getPrecision(int column) {
            if (asString[column]) { return 0; } // LONGNVARCHAR -> nvarchar(max)
            try {
                int p = md.getPrecision(column);
                return (p > 0 ? p : 38);
            } catch (SQLException e) { return 38; }
        }

        public int getScale(int column) {
            if (jdbcType[column] == Types.DECIMAL) {
                try { return md.getScale(column); }
                catch (SQLException e) { return 0; }
            }
            return 0;
        }

        public boolean next() throws SQLServerException {
            try { return rs.next(); }
            catch (SQLException e) { throw new RuntimeException(e); }
        }

        public long rowsWritten() { return rows; }

        public Object[] getRowData() throws SQLServerException {
            rows++;
            // live progress: emit an in-place counter every 10k rows (the
            // caller pulls one row at a time, so this runs during the load).
            if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
            Object[] row = new Object[cols];
            try {
                for (int i = 1; i <= cols; i++) {
                    Object v;
                    switch (jdbcType[i]) {
                        case Types.SMALLINT:
                        case Types.INTEGER:
                        case Types.BIGINT:
                        case Types.REAL:
                        case Types.DOUBLE:    v = rs.getObject(i); break;
                        case Types.DECIMAL:   v = rs.getBigDecimal(i); break;
                        case Types.BIT:       v = rs.getBoolean(i); break;
                        case Types.DATE:      v = rs.getDate(i); break;
                        case Types.TIME:      v = rs.getTime(i); break;
                        case Types.TIMESTAMP: v = rs.getTimestamp(i); break;
                        case Types.VARBINARY: v = rs.getBytes(i); break;
                        default: {
                            String s = rs.getString(i);
                            if (s != null && trim) { s = s.trim(); }
                            v = s;
                            break;
                        }
                    }
                    if (rs.wasNull()) { v = null; }
                    row[i - 1] = v;
                }
            } catch (SQLException e) { throw new RuntimeException(e); }
            return row;
        }
    }

    private static void outputQueryResults(ResultSet rs, int cols, String[] dtn, String format) throws SQLException {
        switch (format) {
            case "csv":

@@ -187,8 +187,4 @@ jrunner -scu jdbc:postgresql://source:5432/sourcedb \
**Options:**
- `-t` - trim text fields (default: true)
- `-c` - clear target table before insert (default: true)
- `-b` - bulk load into the destination instead of batched INSERTs; far faster
  on large/wide tables. SQL Server: TDS bulk-load via SQLServerBulkCopy.
  Postgres: COPY FROM STDIN. Other dests (e.g. DB2) fall back to INSERT.
  (migration mode only)
- `-f` - output format: csv, tsv (query mode only, default: csv)