Compare commits
No commits in common. "master" and "passfile" have entirely different histories.
43
CLAUDE.md
@ -58,30 +58,9 @@ The tool operates in two modes:
|
|||||||
|
|
||||||
**Migration Mode** (original functionality):
|
**Migration Mode** (original functionality):
|
||||||
- Activates when destination flags are provided
|
- Activates when destination flags are provided
|
||||||
- Reads from source, writes to destination with batched INSERTs (or bulk copy with `-b`)
|
- Reads from source, writes to destination with batched INSERTs
|
||||||
- Shows progress counters and timing information
|
- Shows progress counters and timing information
|
||||||
|
|
||||||
**Bulk Copy** (migration mode, `-b`) — uses the dest's native bulk path; falls
|
|
||||||
back to the INSERT path for any other dest (e.g. DB2):
|
|
||||||
|
|
||||||
*SQL Server dest* — streams the source ResultSet over the TDS bulk-load
|
|
||||||
protocol via `SQLServerBulkCopy` (no per-batch INSERT round trips; a 1.27M-row,
|
|
||||||
~298-col load went ~111 min → ~4 min). A `BulkSource` adapter
|
|
||||||
(`ISQLServerBulkData`) maps source type names to JDBC types we control:
|
|
||||||
string-ish types (text/varchar/char/bpchar/json/jsonb/uuid **and numeric**) are
|
|
||||||
declared NVARCHAR and read via `getString` so SQL Server converts losslessly —
|
|
||||||
numeric goes this route because PG reports unconstrained numeric as scale 0,
|
|
||||||
which a typed DECIMAL path would round (123.45 → 123).
|
|
||||||
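The mapping idea described above can be sketched roughly as follows. This is not the tool's `BulkSource`: the class and method names are hypothetical, the case list is abbreviated, and `HALF_UP` is only an assumed rounding mode used to show what happens when a fractional value is forced to scale 0.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.sql.Types;

// Hypothetical helper for illustration only; not the tool's BulkSource.
final class BulkTypeSketch {

    // Map a source type name to the JDBC type declared to the bulk loader.
    // Anything not listed (text, varchar, json, uuid, and numeric) is sent as text.
    static int jdbcTypeFor(String typeName) {
        String t = (typeName == null) ? "" : typeName.toUpperCase();
        switch (t) {
            case "INT2": case "SMALLINT": return Types.SMALLINT;
            case "INT4": case "INTEGER":  return Types.INTEGER;
            case "INT8": case "BIGINT":   return Types.BIGINT;
            case "FLOAT8":                return Types.DOUBLE;
            case "BOOL": case "BOOLEAN":  return Types.BIT;
            default:                      return Types.LONGNVARCHAR;
        }
    }

    public static void main(String[] args) {
        // Forcing a fractional value to scale 0 drops the fraction
        // (HALF_UP is an assumption made for this demonstration).
        BigDecimal rounded = new BigDecimal("123.45").setScale(0, RoundingMode.HALF_UP);
        System.out.println(rounded); // 123

        // numeric has no typed case, so it is declared as text instead.
        System.out.println(jdbcTypeFor("numeric") == Types.LONGNVARCHAR); // true
    }
}
```

Sending numeric as text leaves the conversion to the destination column, which is the design choice the paragraph above describes.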
|
|
||||||
*Postgres dest* — streams via `COPY <table> FROM STDIN WITH (FORMAT csv)` using
|
|
||||||
the JDBC `CopyManager`. COPY is text-based, so the server parses each field into
|
|
||||||
the column type — no per-type handling. Every non-null value is CSV-quoted
|
|
||||||
(empty string stays distinct from NULL, which is an empty unquoted field); rows
|
|
||||||
flush in 1000-row buffers.
|
|
||||||
|
|
||||||
Both emit a `\r`-counter every 10k rows for live progress and print the final
|
|
||||||
row count.
|
|
||||||
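The CSV encoding rule for the Postgres path (quote every non-null value, leave NULL as an empty unquoted field) can be illustrated with a small standalone sketch. The class and method names here are hypothetical, and the row is passed as a list of strings rather than read from a `ResultSet` so the example runs without a database.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; the tool builds this text inline.
final class CopyCsvSketch {

    // Encode one row as a CSV line suitable for COPY ... WITH (FORMAT csv).
    static String toCsvLine(List<String> values, boolean trim) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) { sb.append(','); }
            String v = values.get(i);
            if (v != null) {
                if (trim) { v = v.trim(); }
                // Always quote, doubling embedded quotes, so "" stays distinct from NULL.
                sb.append('"').append(v.replace("\"", "\"\"")).append('"');
            }
            // null: append nothing, leaving an empty unquoted field (NULL).
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // Prints: "a ""b""","",,"42"
        System.out.print(toCsvLine(Arrays.asList("a \"b\"", "", null, " 42 "), true));
    }
}
```

The second field is an empty string and the third is NULL; they differ only by the presence of quotes.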
|
|
||||||
### Data Flow
|
### Data Flow
|
||||||
|
|
||||||
**Query Mode:**
|
**Query Mode:**
|
||||||
@ -97,15 +76,12 @@ row count.
|
|||||||
2. Read SQL query from file specified by -sq flag
|
2. Read SQL query from file specified by -sq flag
|
||||||
3. Connect to source and destination databases via JDBC
|
3. Connect to source and destination databases via JDBC
|
||||||
4. Execute source query and fetch results (fetch size: 10,000 rows)
|
4. Execute source query and fetch results (fetch size: 10,000 rows)
|
||||||
5. Optionally clear target table before insert if -c flag is set
|
5. Build batched INSERT statements (250 rows per batch)
|
||||||
6. With `-b`: bulk-load via the dest's native path (SQL Server → `SQLServerBulkCopy`,
|
6. Execute batches against destination table specified by -dt flag
|
||||||
Postgres → `COPY FROM STDIN`). Otherwise: build batched INSERT statements
|
7. Optionally clear target table before insert if -c flag is set
|
||||||
(250 rows per batch) and execute them against the destination table (-dt)
|
|
||||||
|
|
||||||
### Type Handling
|
### Type Handling
|
||||||
The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming.
|
The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
|
||||||
|
|
||||||
**Caveat — the `default` case emits values UNQUOTED** (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any *string-typed* column whose JDBC type name isn't in the switch falls here and breaks the generated INSERT with a syntax error (e.g. PostgreSQL `bool` → `'t'`/`'f'` is currently unhandled). When adding a new source type, decide: numeric → leave to default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.
|
|
||||||
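The "quote by default, numeric allowlist" idea mentioned in the caveat might look something like the sketch below. It is not the tool's current switch: the class name, the allowlist contents, and the rendering of a null value as `NULL` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed fix; not the tool's existing code.
final class LiteralSketch {

    // Assumed allowlist of type names that are safe to emit unquoted.
    private static final Set<String> NUMERIC = new HashSet<>(Arrays.asList(
        "INT2", "INT4", "INT8", "SMALLINT", "INTEGER", "BIGINT",
        "NUMERIC", "DECIMAL", "FLOAT4", "FLOAT8", "REAL", "DOUBLE"));

    // Render one value as a SQL literal for a VALUES clause.
    static String toLiteral(String typeName, String value, boolean trim) {
        if (value == null) { return "NULL"; }           // assumption: nulls become NULL
        String t = (typeName == null) ? "" : typeName.toUpperCase();
        if (NUMERIC.contains(t)) { return value; }      // numerics stay unquoted
        String s = trim ? value.trim() : value;
        return "'" + s.replace("'", "''") + "'";        // everything else is quoted
    }

    public static void main(String[] args) {
        System.out.println(toLiteral("int4", "42", true));            // 42
        System.out.println(toLiteral("varchar", " O'Brien ", true));  // 'O''Brien'
        System.out.println(toLiteral("bool", "t", true));             // 't'
    }
}
```

With this shape, an unrecognized string-like type such as `bool` produces a valid quoted literal instead of breaking the INSERT.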
|
|
||||||
### Database Drivers
|
### Database Drivers
|
||||||
JDBC drivers are configured in `jrunner/build.gradle`:
|
JDBC drivers are configured in `jrunner/build.gradle`:
|
||||||
@ -131,7 +107,6 @@ Command-line flags:
|
|||||||
- `-dt` - fully qualified destination table name (migration mode only)
|
- `-dt` - fully qualified destination table name (migration mode only)
|
||||||
- `-t` - trim text fields (default: true)
|
- `-t` - trim text fields (default: true)
|
||||||
- `-c` - clear target table before insert (default: true, migration mode only)
|
- `-c` - clear target table before insert (default: true, migration mode only)
|
||||||
- `-b` - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
|
|
||||||
- `-f` - output format: csv, tsv (query mode only, default: csv)
|
- `-f` - output format: csv, tsv (query mode only, default: csv)
|
||||||
|
|
||||||
## Key Implementation Details
|
## Key Implementation Details
|
||||||
@ -165,10 +140,10 @@ Both modes use a streaming architecture with no array storage of result rows:
|
|||||||
- Only holds up to 250 rows worth of SQL text in memory at once
|
- Only holds up to 250 rows worth of SQL text in memory at once
|
||||||
|
|
||||||
**JDBC Fetch Size:**
|
**JDBC Fetch Size:**
|
||||||
- Both modes set `stmt.setFetchSize(10000)` — a hint to fetch 10,000 rows at a time
|
- Both modes set `stmt.setFetchSize(10000)` (line 190)
|
||||||
- The application processes rows one at a time via `rs.next()`; the only buffer is the driver's fetch window
|
- This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
|
||||||
|
- The driver maintains this internal buffer for network efficiency
|
||||||
**⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect.** The PG JDBC driver IGNORES `setFetchSize` while autoCommit is true and instead loads the ENTIRE result set into memory (OOMs / GC-thrashes on large source tables). So in **migration mode** the source connection is set to `setAutoCommit(false)` right after connecting, which enables a server-side cursor and makes streaming actually stream. This is done **only in migration mode** — query mode leaves autoCommit at its default because callers run committed DDL/DML through query mode (e.g. external tools), and autoCommit=false would roll those statements back on connection close. (jt400/MSSQL drivers stream regardless, so only PG is affected.)
|
- The application code never sees or stores all 10,000 rows - it processes them one at a time via `rs.next()`
|
||||||
|
|
||||||
### Batch Size (Migration Mode)
|
### Batch Size (Migration Mode)
|
||||||
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared.
|
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared.
|
||||||
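The batching loop described above can be sketched as follows. The names are hypothetical, and the sketch collects the generated statements in a list purely so it can run without a database; the tool executes each batch as soon as it is full and then clears the buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration only; not the tool's code.
final class BatchSketch {

    // Group pre-rendered "(v1,v2,...)" row tuples into multi-row INSERT statements.
    static List<String> buildBatches(String table, List<String> rowTuples, int batchSize) {
        List<String> statements = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        int inBatch = 0;
        for (String tuple : rowTuples) {
            if (inBatch > 0) { buf.append(','); }
            buf.append(tuple);
            inBatch++;
            if (inBatch == batchSize) {
                // Batch is full: prepend the INSERT header, emit, and reset the buffer.
                statements.add("INSERT INTO " + table + " VALUES " + buf);
                buf.setLength(0);
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            // Emit the final partial batch.
            statements.add("INSERT INTO " + table + " VALUES " + buf);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 600; i++) { rows.add("(" + i + ")"); }
        // 600 rows at 250 per batch gives batches of 250, 250, and 100.
        System.out.println(buildBatches("dest.tbl", rows, 250).size()); // 3
    }
}
```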
|
|||||||
@ -6,13 +6,6 @@ import java.nio.file.Path ;
|
|||||||
import java.nio.file.Paths;
|
import java.nio.file.Paths;
|
||||||
import java.time.*;
|
import java.time.*;
|
||||||
import java.io.IOException;
|
import java.io.IOException;
|
||||||
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
|
|
||||||
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;
|
|
||||||
import com.microsoft.sqlserver.jdbc.ISQLServerBulkData;
|
|
||||||
import com.microsoft.sqlserver.jdbc.SQLServerException;
|
|
||||||
import org.postgresql.PGConnection;
|
|
||||||
import org.postgresql.copy.CopyManager;
|
|
||||||
import org.postgresql.copy.CopyIn;
|
|
||||||
|
|
||||||
public class jrunner {
|
public class jrunner {
|
||||||
//static final String QUERY = "SELECT * from rlarp.osm LIMIT 100";
|
//static final String QUERY = "SELECT * from rlarp.osm LIMIT 100";
|
||||||
@ -31,7 +24,6 @@ public class jrunner {
|
|||||||
String dt = "";
|
String dt = "";
|
||||||
Boolean trim = true;
|
Boolean trim = true;
|
||||||
Boolean clear = true;
|
Boolean clear = true;
|
||||||
Boolean bulk = false;
|
|
||||||
Integer r = 0;
|
Integer r = 0;
|
||||||
Integer t = 0;
|
Integer t = 0;
|
||||||
String sql = "";
|
String sql = "";
|
||||||
@ -65,7 +57,6 @@ public class jrunner {
|
|||||||
msg = msg + nl + "-dt fully qualified name of destination table";
|
msg = msg + nl + "-dt fully qualified name of destination table";
|
||||||
msg = msg + nl + "-t trim text";
|
msg = msg + nl + "-t trim text";
|
||||||
msg = msg + nl + "-c clear target table";
|
msg = msg + nl + "-c clear target table";
|
||||||
msg = msg + nl + "-b bulk copy into destination (SQL Server or Postgres dest)";
|
|
||||||
msg = msg + nl + "-f output format (csv, tsv, table, json) - default: csv";
|
msg = msg + nl + "-f output format (csv, tsv, table, json) - default: csv";
|
||||||
msg = msg + nl + "--help info";
|
msg = msg + nl + "--help info";
|
||||||
msg = msg + nl + "";
|
msg = msg + nl + "";
|
||||||
@ -134,9 +125,6 @@ public class jrunner {
|
|||||||
case "-c":
|
case "-c":
|
||||||
clear = true;
|
clear = true;
|
||||||
break;
|
break;
|
||||||
case "-b":
|
|
||||||
bulk = true;
|
|
||||||
break;
|
|
||||||
case "-f":
|
case "-f":
|
||||||
outputFormat = args[i+1].toLowerCase();
|
outputFormat = args[i+1].toLowerCase();
|
||||||
break;
|
break;
|
||||||
@ -312,72 +300,6 @@ public class jrunner {
|
|||||||
e.printStackTrace();
|
e.printStackTrace();
|
||||||
System.exit(0);
|
System.exit(0);
|
||||||
}
|
}
|
||||||
} else if (bulk && dcu.toLowerCase().startsWith("jdbc:sqlserver:")) {
|
|
||||||
//-------------------------------bulk copy (SQL Server dest)-------------------------------------------------
|
|
||||||
// Stream the source ResultSet straight into SQL Server over the TDS
|
|
||||||
// bulk-load protocol — no per-row INSERT round trips. Source type
|
|
||||||
// names map to JDBC types via BulkSource (string-ish -> NVARCHAR).
|
|
||||||
System.out.println("------------bulk copy-------------------------------------");
|
|
||||||
try {
|
|
||||||
SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(dcon);
|
|
||||||
bulkCopy.setDestinationTableName(dt);
|
|
||||||
SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
|
|
||||||
options.setBatchSize(10000);
|
|
||||||
options.setBulkCopyTimeout(0);
|
|
||||||
bulkCopy.setBulkCopyOptions(options);
|
|
||||||
BulkSource src = new BulkSource(rs, cols, dtn, trim);
|
|
||||||
bulkCopy.writeToServer(src);
|
|
||||||
bulkCopy.close();
|
|
||||||
// leading \r returns the cursor to the start of the line, so the final
|
|
||||||
// count overwrites the last progress tick instead of concatenating onto it.
|
|
||||||
System.out.print("\r" + src.rowsWritten());
|
|
||||||
} catch (Exception e) {
|
|
||||||
e.printStackTrace();
|
|
||||||
System.exit(0);
|
|
||||||
}
|
|
||||||
} else if (bulk && dcu.toLowerCase().startsWith("jdbc:postgresql:")) {
|
|
||||||
//-------------------------------bulk copy (COPY, Postgres dest)--------------------------------------------
|
|
||||||
// Stream the source ResultSet into Postgres via COPY ... FROM STDIN.
|
|
||||||
// COPY is text-based: each field is sent as CSV text and the server
|
|
||||||
// parses it into the column type, so there's no per-type quoting.
|
|
||||||
// Non-null values are always CSV-quoted; NULL is an empty unquoted
|
|
||||||
// field; column order must match the dest (positional, as always).
|
|
||||||
System.out.println("------------bulk copy (COPY)------------------------------");
|
|
||||||
try {
|
|
||||||
CopyManager cm = ((PGConnection) dcon).getCopyAPI();
|
|
||||||
CopyIn cin = cm.copyIn("COPY " + dt + " FROM STDIN WITH (FORMAT csv)");
|
|
||||||
StringBuilder buf = new StringBuilder();
|
|
||||||
long rows = 0;
|
|
||||||
while (rs.next()) {
|
|
||||||
for (int i = 1; i <= cols; i++) {
|
|
||||||
if (i > 1) { buf.append(','); }
|
|
||||||
String val = rs.getString(i);
|
|
||||||
if (!rs.wasNull() && val != null) {
|
|
||||||
if (trim) { val = val.trim(); }
|
|
||||||
buf.append('"').append(val.replace("\"", "\"\"")).append('"');
|
|
||||||
}
|
|
||||||
// else: empty field -> NULL
|
|
||||||
}
|
|
||||||
buf.append('\n');
|
|
||||||
rows++;
|
|
||||||
if (rows % 1000 == 0) {
|
|
||||||
byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
|
|
||||||
cin.writeToCopy(b, 0, b.length);
|
|
||||||
buf.setLength(0);
|
|
||||||
if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (buf.length() > 0) {
|
|
||||||
byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
|
|
||||||
cin.writeToCopy(b, 0, b.length);
|
|
||||||
}
|
|
||||||
cin.endCopy();
|
|
||||||
System.out.print("\r" + rows);
|
|
||||||
} catch (Exception e) {
|
|
||||||
e.printStackTrace();
|
|
||||||
System.exit(0);
|
|
||||||
}
|
|
||||||
} else {
|
} else {
|
||||||
System.out.println("------------row count-------------------------------------");
|
System.out.println("------------row count-------------------------------------");
|
||||||
//-------------------------------build & execute sql-------------------------------------------------------------
|
//-------------------------------build & execute sql-------------------------------------------------------------
|
||||||
@ -556,135 +478,6 @@ public class jrunner {
|
|||||||
System.out.println();
|
System.out.println();
|
||||||
}
|
}
|
||||||
|
|
||||||
// Adapts a source ResultSet to SQLServerBulkCopy. Maps source type names to
|
|
||||||
// JDBC types we control: string-ish PG types (text/varchar/char/bpchar/json/
|
|
||||||
// jsonb/uuid/...) are declared NVARCHAR and read via getString — mirroring the
|
|
||||||
// INSERT path's "quote it" choices and sidestepping unsupported types (jsonb
|
|
||||||
// reports as OTHER, which writeToServer(ResultSet) can't handle directly).
|
|
||||||
static class BulkSource implements ISQLServerBulkData {
|
|
||||||
private final ResultSet rs;
|
|
||||||
private final ResultSetMetaData md;
|
|
||||||
private final int cols;
|
|
||||||
private final boolean trim;
|
|
||||||
private final int[] jdbcType;
|
|
||||||
private final boolean[] asString;
|
|
||||||
private long rows = 0;
|
|
||||||
|
|
||||||
BulkSource(ResultSet rs, int cols, String[] dtn, boolean trim) throws SQLException {
|
|
||||||
this.rs = rs;
|
|
||||||
this.cols = cols;
|
|
||||||
this.trim = trim;
|
|
||||||
this.md = rs.getMetaData();
|
|
||||||
this.jdbcType = new int[cols + 1];
|
|
||||||
this.asString = new boolean[cols + 1];
|
|
||||||
for (int i = 1; i <= cols; i++) {
|
|
||||||
String t = (dtn[i] == null ? "" : dtn[i].toUpperCase());
|
|
||||||
switch (t) {
|
|
||||||
case "INT2": case "SMALLINT":
|
|
||||||
jdbcType[i] = Types.SMALLINT; break;
|
|
||||||
case "INT4": case "INT": case "INTEGER": case "SERIAL":
|
|
||||||
jdbcType[i] = Types.INTEGER; break;
|
|
||||||
case "INT8": case "BIGINT": case "BIGSERIAL":
|
|
||||||
jdbcType[i] = Types.BIGINT; break;
|
|
||||||
// numeric/decimal deliberately have no case and are left to the default (string) case: PG reports unconstrained numeric as
|
|
||||||
// scale 0, which makes bulk copy round (123.45 -> 123). Send
|
|
||||||
// the exact text instead and let SQL Server convert it into
|
|
||||||
// the dest column losslessly.
|
|
||||||
case "FLOAT4": case "REAL":
|
|
||||||
jdbcType[i] = Types.REAL; break;
|
|
||||||
case "FLOAT8": case "DOUBLE": case "DOUBLE PRECISION":
|
|
||||||
jdbcType[i] = Types.DOUBLE; break;
|
|
||||||
case "BOOL": case "BOOLEAN": case "BIT":
|
|
||||||
jdbcType[i] = Types.BIT; break;
|
|
||||||
case "DATE":
|
|
||||||
jdbcType[i] = Types.DATE; break;
|
|
||||||
case "TIME":
|
|
||||||
jdbcType[i] = Types.TIME; break;
|
|
||||||
case "TIMESTAMP": case "TIMESTAMPTZ":
|
|
||||||
case "DATETIME": case "DATETIME2": case "SMALLDATETIME":
|
|
||||||
jdbcType[i] = Types.TIMESTAMP; break;
|
|
||||||
case "BYTEA": case "BINARY": case "VARBINARY":
|
|
||||||
jdbcType[i] = Types.VARBINARY; break;
|
|
||||||
default:
|
|
||||||
jdbcType[i] = Types.LONGNVARCHAR;
|
|
||||||
asString[i] = true;
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
public java.util.Set<Integer> getColumnOrdinals() {
|
|
||||||
java.util.Set<Integer> ords = new java.util.TreeSet<Integer>();
|
|
||||||
for (int i = 1; i <= cols; i++) { ords.add(i); }
|
|
||||||
return ords;
|
|
||||||
}
|
|
||||||
|
|
||||||
public String getColumnName(int column) {
|
|
||||||
try { return md.getColumnName(column); }
|
|
||||||
catch (SQLException e) { return "col" + column; }
|
|
||||||
}
|
|
||||||
|
|
||||||
public int getColumnType(int column) { return jdbcType[column]; }
|
|
||||||
|
|
||||||
public int getPrecision(int column) {
|
|
||||||
if (asString[column]) { return 0; } // LONGNVARCHAR -> nvarchar(max)
|
|
||||||
try {
|
|
||||||
int p = md.getPrecision(column);
|
|
||||||
return (p > 0 ? p : 38);
|
|
||||||
} catch (SQLException e) { return 38; }
|
|
||||||
}
|
|
||||||
|
|
||||||
public int getScale(int column) {
|
|
||||||
if (jdbcType[column] == Types.DECIMAL) {
|
|
||||||
try { return md.getScale(column); }
|
|
||||||
catch (SQLException e) { return 0; }
|
|
||||||
}
|
|
||||||
return 0;
|
|
||||||
}
|
|
||||||
|
|
||||||
public boolean next() throws SQLServerException {
|
|
||||||
try { return rs.next(); }
|
|
||||||
catch (SQLException e) { throw new RuntimeException(e); }
|
|
||||||
}
|
|
||||||
|
|
||||||
public long rowsWritten() { return rows; }
|
|
||||||
|
|
||||||
public Object[] getRowData() throws SQLServerException {
|
|
||||||
rows++;
|
|
||||||
// live progress: emit an in-place counter every 10k rows (the
|
|
||||||
// caller pulls one row at a time, so this runs during the load).
|
|
||||||
if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
|
|
||||||
Object[] row = new Object[cols];
|
|
||||||
try {
|
|
||||||
for (int i = 1; i <= cols; i++) {
|
|
||||||
Object v;
|
|
||||||
switch (jdbcType[i]) {
|
|
||||||
case Types.SMALLINT:
|
|
||||||
case Types.INTEGER:
|
|
||||||
case Types.BIGINT:
|
|
||||||
case Types.REAL:
|
|
||||||
case Types.DOUBLE: v = rs.getObject(i); break;
|
|
||||||
case Types.DECIMAL: v = rs.getBigDecimal(i); break;
|
|
||||||
case Types.BIT: v = rs.getBoolean(i); break;
|
|
||||||
case Types.DATE: v = rs.getDate(i); break;
|
|
||||||
case Types.TIME: v = rs.getTime(i); break;
|
|
||||||
case Types.TIMESTAMP:v = rs.getTimestamp(i); break;
|
|
||||||
case Types.VARBINARY:v = rs.getBytes(i); break;
|
|
||||||
default: {
|
|
||||||
String s = rs.getString(i);
|
|
||||||
if (s != null && trim) { s = s.trim(); }
|
|
||||||
v = s;
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (rs.wasNull()) { v = null; }
|
|
||||||
row[i - 1] = v;
|
|
||||||
}
|
|
||||||
} catch (SQLException e) { throw new RuntimeException(e); }
|
|
||||||
return row;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
private static void outputQueryResults(ResultSet rs, int cols, String[] dtn, String format) throws SQLException {
|
private static void outputQueryResults(ResultSet rs, int cols, String[] dtn, String format) throws SQLException {
|
||||||
switch (format) {
|
switch (format) {
|
||||||
case "csv":
|
case "csv":
|
||||||
|
|||||||
@ -187,8 +187,4 @@ jrunner -scu jdbc:postgresql://source:5432/sourcedb \
|
|||||||
**Options:**
|
**Options:**
|
||||||
- `-t` - trim text fields (default: true)
|
- `-t` - trim text fields (default: true)
|
||||||
- `-c` - clear target table before insert (default: true)
|
- `-c` - clear target table before insert (default: true)
|
||||||
- `-b` - bulk load into the destination instead of batched INSERTs — far faster
|
|
||||||
on large/wide tables. SQL Server: TDS bulk-load via SQLServerBulkCopy.
|
|
||||||
Postgres: COPY FROM STDIN. Other dests (e.g. DB2) fall back to INSERT.
|
|
||||||
(migration mode only)
|
|
||||||
- `-f` - output format: csv, tsv (query mode only, default: csv)
|
- `-f` - output format: csv, tsv (query mode only, default: csv)
|
||||||
|
|||||||