Compare commits

No commits in common. "master" and "passfile" have entirely different histories.

3 changed files with 9 additions and 245 deletions

@ -58,30 +58,9 @@ The tool operates in two modes:
**Migration Mode** (original functionality): **Migration Mode** (original functionality):
- Activates when destination flags are provided - Activates when destination flags are provided
- Reads from source, writes to destination with batched INSERTs (or bulk copy with `-b`) - Reads from source, writes to destination with batched INSERTs
- Shows progress counters and timing information - Shows progress counters and timing information
**Bulk Copy** (migration mode, `-b`) — uses the dest's native bulk path; falls
back to the INSERT path for any other dest (e.g. DB2):
*SQL Server dest* — streams the source ResultSet over the TDS bulk-load
protocol via `SQLServerBulkCopy` (no per-batch INSERT round trips; a 1.27M-row,
~298-col load went ~111 min → ~4 min). A `BulkSource` adapter
(`ISQLServerBulkData`) maps source type names to JDBC types we control:
string-ish types (text/varchar/char/bpchar/json/jsonb/uuid **and numeric**) are
declared NVARCHAR and read via `getString` so SQL Server converts losslessly —
numeric goes this route because PG reports unconstrained numeric as scale 0,
which a typed DECIMAL path would round (123.45 → 123).
*Postgres dest* — streams via `COPY <table> FROM STDIN WITH (FORMAT csv)` using
the JDBC `CopyManager`. COPY is text-based, so the server parses each field into
the column type — no per-type handling. Every non-null value is CSV-quoted
(empty string stays distinct from NULL, which is an empty unquoted field); rows
flush in 1000-row buffers.
Both emit a `\r`-counter every 10k rows for live progress and print the final
row count.
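The CSV convention described for the Postgres path (every non-null value quoted, NULL sent as an empty unquoted field) can be sketched in isolation. This is a minimal illustration, not the tool's code: the class name `CopyCsvSketch` and the method `encodeRow` are hypothetical, and the row is assumed to have already been read into strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the row encoding used with
// COPY <table> FROM STDIN WITH (FORMAT csv).
final class CopyCsvSketch {

    // Encodes one row as a single CSV line.
    // null      -> empty unquoted field, which COPY reads as NULL
    // non-null  -> always quoted, so "" stays distinct from NULL;
    //              embedded double quotes are doubled
    static String encodeRow(List<String> values, boolean trim) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String v = values.get(i);
            if (v != null) {
                if (trim) {
                    v = v.trim();
                }
                sb.append('"').append(v.replace("\"", "\"\"")).append('"');
            }
        }
        return sb.append('\n').toString();
    }

    public static void main(String[] args) {
        // A quoted value, an empty string, a NULL, and a value that gets trimmed.
        System.out.print(encodeRow(Arrays.asList("a\"b", "", null, " x "), true));
    }
}
```

Quoting every non-null value costs a few bytes per field but avoids having to inspect values for commas or newlines before deciding whether to quote.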
### Data Flow ### Data Flow
**Query Mode:** **Query Mode:**
@ -97,15 +76,12 @@ row count.
2. Read SQL query from file specified by -sq flag 2. Read SQL query from file specified by -sq flag
3. Connect to source and destination databases via JDBC 3. Connect to source and destination databases via JDBC
4. Execute source query and fetch results (fetch size: 10,000 rows) 4. Execute source query and fetch results (fetch size: 10,000 rows)
5. Optionally clear target table before insert if -c flag is set 5. Build batched INSERT statements (250 rows per batch)
6. With `-b`: bulk-load via the dest's native path (SQL Server → `SQLServerBulkCopy`, 6. Execute batches against destination table specified by -dt flag
Postgres → `COPY FROM STDIN`). Otherwise: build batched INSERT statements 7. Optionally clear target table before insert if -c flag is set
(250 rows per batch) and execute them against the destination table (-dt)
### Type Handling ### Type Handling
The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG `char(n)`), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (`'` → `''`) and optional trimming. The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
**Caveat: the `default` case emits values UNQUOTED** (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any *string-typed* column whose JDBC type name isn't in the switch falls through to it and breaks the generated INSERT with a syntax error (e.g. PostgreSQL `bool`, with values `'t'`/`'f'`, is currently unhandled). When adding a new source type, decide: numeric → leave to default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.
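The "future fix" mentioned in the caveat (quote by default, with an explicit numeric allowlist) might look roughly like the sketch below. Everything here is an assumption for illustration: the class `LiteralSketch`, the method `toLiteral`, the contents of the allowlist, and the choice to render a null value as `NULL` are not taken from the tool.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a quote-by-default literal builder.
final class LiteralSketch {

    // Assumed allowlist of type names whose values are emitted unquoted.
    static final Set<String> NUMERIC = new HashSet<>(Arrays.asList(
            "INT2", "INT4", "INT8", "SMALLINT", "INTEGER", "BIGINT",
            "NUMERIC", "DECIMAL", "FLOAT4", "FLOAT8", "REAL", "DOUBLE"));

    // Renders one column value as SQL literal text for a VALUES clause.
    static String toLiteral(String typeName, String value, boolean trim) {
        if (value == null) {
            return "NULL"; // assumption: nulls become the NULL keyword
        }
        String t = (typeName == null) ? "" : typeName.toUpperCase(Locale.ROOT);
        if (NUMERIC.contains(t)) {
            return value; // known numeric type: emit as-is
        }
        // Unknown or string-like type: quote, and escape ' as ''.
        String s = trim ? value.trim() : value;
        return "'" + s.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(toLiteral("bool", "t", true));
        System.out.println(toLiteral("int4", "42", true));
        System.out.println(toLiteral("text", "O'Brien ", true));
    }
}
```

With this shape, an unlisted type such as `bool` produces a quoted literal instead of a bare token, so a forgotten case yields valid SQL rather than a syntax error.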
### Database Drivers ### Database Drivers
JDBC drivers are configured in `jrunner/build.gradle`: JDBC drivers are configured in `jrunner/build.gradle`:
@ -131,7 +107,6 @@ Command-line flags:
- `-dt` - fully qualified destination table name (migration mode only) - `-dt` - fully qualified destination table name (migration mode only)
- `-t` - trim text fields (default: true) - `-t` - trim text fields (default: true)
- `-c` - clear target table before insert (default: true, migration mode only) - `-c` - clear target table before insert (default: true, migration mode only)
- `-b` - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
- `-f` - output format: csv, tsv (query mode only, default: csv) - `-f` - output format: csv, tsv (query mode only, default: csv)
## Key Implementation Details ## Key Implementation Details
@ -165,10 +140,10 @@ Both modes use a streaming architecture with no array storage of result rows:
- Only holds up to 250 rows worth of SQL text in memory at once - Only holds up to 250 rows worth of SQL text in memory at once
**JDBC Fetch Size:** **JDBC Fetch Size:**
- Both modes set `stmt.setFetchSize(10000)` — a hint to fetch 10,000 rows at a time - Both modes set `stmt.setFetchSize(10000)` (line 190)
- The application processes rows one at a time via `rs.next()`; the only buffer is the driver's fetch window - This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
- The driver maintains this internal buffer for network efficiency
**⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect.** The PG JDBC driver IGNORES `setFetchSize` while autoCommit is true and instead loads the ENTIRE result set into memory (OOMs / GC-thrashes on large source tables). So in **migration mode** the source connection is set to `setAutoCommit(false)` right after connecting, which enables a server-side cursor and makes streaming actually stream. This is done **only in migration mode** — query mode leaves autoCommit at its default because callers run committed DDL/DML through query mode (e.g. external tools), and autoCommit=false would roll those statements back on connection close. (jt400/MSSQL drivers stream regardless, so only PG is affected.) - The application code never sees or stores all 10,000 rows - it processes them one at a time via `rs.next()`
### Batch Size (Migration Mode) ### Batch Size (Migration Mode)
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared. INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string buffer as VALUES clauses. When 250 rows accumulate in the string, it is prepended with "INSERT INTO {table} VALUES" and executed, then the string is cleared.
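The batching described above can be sketched as a pure function over already-rendered VALUES tuples. This is an illustrative reconstruction under assumptions, not the code at line 356: the names `BatchSketch` and `buildBatches` are hypothetical, and the real tool executes each statement as it is built rather than collecting them in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of grouping row tuples into multi-row INSERT statements.
final class BatchSketch {

    // Accumulates tuples such as "(1,'a')" into a buffer; every batchSize
    // rows the buffer is prefixed with "INSERT INTO <table> VALUES " and
    // emitted, then cleared. A final partial batch is emitted at the end.
    static List<String> buildBatches(String table, List<String> rowTuples, int batchSize) {
        List<String> statements = new ArrayList<>();
        StringBuilder values = new StringBuilder();
        int pending = 0;
        for (String tuple : rowTuples) {
            if (pending > 0) {
                values.append(',');
            }
            values.append(tuple);
            pending++;
            if (pending == batchSize) {
                statements.add("INSERT INTO " + table + " VALUES " + values);
                values.setLength(0);
                pending = 0;
            }
        }
        if (pending > 0) {
            statements.add("INSERT INTO " + table + " VALUES " + values);
        }
        return statements;
    }

    public static void main(String[] args) {
        // Batch size 2 is used here only to keep the output short.
        for (String s : buildBatches("t", Arrays.asList("(1)", "(2)", "(3)"), 2)) {
            System.out.println(s);
        }
    }
}
```

Only one batch of SQL text is held in the buffer at a time, which is what keeps memory bounded at roughly 250 rows of text in the tool's configuration.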

@ -6,13 +6,6 @@ import java.nio.file.Path ;
import java.nio.file.Paths; import java.nio.file.Paths;
import java.time.*; import java.time.*;
import java.io.IOException; import java.io.IOException;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;
import com.microsoft.sqlserver.jdbc.ISQLServerBulkData;
import com.microsoft.sqlserver.jdbc.SQLServerException;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;
import org.postgresql.copy.CopyIn;
public class jrunner { public class jrunner {
//static final String QUERY = "SELECT * from rlarp.osm LIMIT 100"; //static final String QUERY = "SELECT * from rlarp.osm LIMIT 100";
@ -31,7 +24,6 @@ public class jrunner {
String dt = ""; String dt = "";
Boolean trim = true; Boolean trim = true;
Boolean clear = true; Boolean clear = true;
Boolean bulk = false;
Integer r = 0; Integer r = 0;
Integer t = 0; Integer t = 0;
String sql = ""; String sql = "";
@ -65,7 +57,6 @@ public class jrunner {
msg = msg + nl + "-dt fully qualified name of destination table"; msg = msg + nl + "-dt fully qualified name of destination table";
msg = msg + nl + "-t trim text"; msg = msg + nl + "-t trim text";
msg = msg + nl + "-c clear target table"; msg = msg + nl + "-c clear target table";
msg = msg + nl + "-b bulk copy into destination (SQL Server dest only)";
msg = msg + nl + "-f output format (csv, tsv, table, json) - default: csv"; msg = msg + nl + "-f output format (csv, tsv, table, json) - default: csv";
msg = msg + nl + "--help info"; msg = msg + nl + "--help info";
msg = msg + nl + ""; msg = msg + nl + "";
@ -134,9 +125,6 @@ public class jrunner {
case "-c": case "-c":
clear = true; clear = true;
break; break;
case "-b":
bulk = true;
break;
case "-f": case "-f":
outputFormat = args[i+1].toLowerCase(); outputFormat = args[i+1].toLowerCase();
break; break;
@ -312,72 +300,6 @@ public class jrunner {
e.printStackTrace(); e.printStackTrace();
System.exit(0); System.exit(0);
} }
} else if (bulk && dcu.toLowerCase().startsWith("jdbc:sqlserver:")) {
//-------------------------------bulk copy (SQL Server dest)-------------------------------------------------
// Stream the source ResultSet straight into SQL Server over the TDS
// bulk-load protocol, with no per-row INSERT round trips. Source type
// names map to JDBC types via BulkSource (string-ish -> NVARCHAR).
System.out.println("------------bulk copy-------------------------------------");
try {
SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(dcon);
bulkCopy.setDestinationTableName(dt);
SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
options.setBatchSize(10000);
options.setBulkCopyTimeout(0);
bulkCopy.setBulkCopyOptions(options);
BulkSource src = new BulkSource(rs, cols, dtn, trim);
bulkCopy.writeToServer(src);
bulkCopy.close();
// leading \r returns the cursor to the start of the line so the final
// count overwrites the last progress tick instead of being appended
// to it.
System.out.print("\r" + src.rowsWritten());
} catch (Exception e) {
e.printStackTrace();
System.exit(0);
}
} else if (bulk && dcu.toLowerCase().startsWith("jdbc:postgresql:")) {
//-------------------------------bulk copy (COPY, Postgres dest)--------------------------------------------
// Stream the source ResultSet into Postgres via COPY ... FROM STDIN.
// COPY is text-based: each field is sent as CSV text and the server
// parses it into the column type, so there's no per-type quoting.
// Non-null values are always CSV-quoted; NULL is an empty unquoted
// field; column order must match the dest (positional, as always).
System.out.println("------------bulk copy (COPY)------------------------------");
try {
CopyManager cm = ((PGConnection) dcon).getCopyAPI();
CopyIn cin = cm.copyIn("COPY " + dt + " FROM STDIN WITH (FORMAT csv)");
StringBuilder buf = new StringBuilder();
long rows = 0;
while (rs.next()) {
for (int i = 1; i <= cols; i++) {
if (i > 1) { buf.append(','); }
String val = rs.getString(i);
if (!rs.wasNull() && val != null) {
if (trim) { val = val.trim(); }
buf.append('"').append(val.replace("\"", "\"\"")).append('"');
}
// else: empty field -> NULL
}
buf.append('\n');
rows++;
if (rows % 1000 == 0) {
byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
cin.writeToCopy(b, 0, b.length);
buf.setLength(0);
if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
}
}
if (buf.length() > 0) {
byte[] b = buf.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
cin.writeToCopy(b, 0, b.length);
}
cin.endCopy();
System.out.print("\r" + rows);
} catch (Exception e) {
e.printStackTrace();
System.exit(0);
}
} else { } else {
System.out.println("------------row count-------------------------------------"); System.out.println("------------row count-------------------------------------");
//-------------------------------build & execute sql------------------------------------------------------------- //-------------------------------build & execute sql-------------------------------------------------------------
@ -556,135 +478,6 @@ public class jrunner {
System.out.println(); System.out.println();
} }
// Adapts a source ResultSet to SQLServerBulkCopy. Maps source type names to
// JDBC types we control: string-ish PG types (text/varchar/char/bpchar/json/
// jsonb/uuid/...) are declared NVARCHAR and read via getString, mirroring the
// INSERT path's "quote it" choices and sidestepping unsupported types (jsonb
// reports as OTHER, which writeToServer(ResultSet) can't handle directly).
static class BulkSource implements ISQLServerBulkData {
private final ResultSet rs;
private final ResultSetMetaData md;
private final int cols;
private final boolean trim;
private final int[] jdbcType;
private final boolean[] asString;
private long rows = 0;
BulkSource(ResultSet rs, int cols, String[] dtn, boolean trim) throws SQLException {
this.rs = rs;
this.cols = cols;
this.trim = trim;
this.md = rs.getMetaData();
this.jdbcType = new int[cols + 1];
this.asString = new boolean[cols + 1];
for (int i = 1; i <= cols; i++) {
String t = (dtn[i] == null ? "" : dtn[i].toUpperCase());
switch (t) {
case "INT2": case "SMALLINT":
jdbcType[i] = Types.SMALLINT; break;
case "INT4": case "INT": case "INTEGER": case "SERIAL":
jdbcType[i] = Types.INTEGER; break;
case "INT8": case "BIGINT": case "BIGSERIAL":
jdbcType[i] = Types.BIGINT; break;
// numeric/decimal deliberately have no case here and fall through to
// default: PG reports unconstrained numeric as scale 0, which makes
// bulk copy round (123.45 -> 123). Send the exact text instead and let
// SQL Server convert it into the dest column losslessly.
case "FLOAT4": case "REAL":
jdbcType[i] = Types.REAL; break;
case "FLOAT8": case "DOUBLE": case "DOUBLE PRECISION":
jdbcType[i] = Types.DOUBLE; break;
case "BOOL": case "BOOLEAN": case "BIT":
jdbcType[i] = Types.BIT; break;
case "DATE":
jdbcType[i] = Types.DATE; break;
case "TIME":
jdbcType[i] = Types.TIME; break;
case "TIMESTAMP": case "TIMESTAMPTZ":
case "DATETIME": case "DATETIME2": case "SMALLDATETIME":
jdbcType[i] = Types.TIMESTAMP; break;
case "BYTEA": case "BINARY": case "VARBINARY":
jdbcType[i] = Types.VARBINARY; break;
default:
jdbcType[i] = Types.LONGNVARCHAR;
asString[i] = true;
break;
}
}
}
public java.util.Set<Integer> getColumnOrdinals() {
java.util.Set<Integer> ords = new java.util.TreeSet<Integer>();
for (int i = 1; i <= cols; i++) { ords.add(i); }
return ords;
}
public String getColumnName(int column) {
try { return md.getColumnName(column); }
catch (SQLException e) { return "col" + column; }
}
public int getColumnType(int column) { return jdbcType[column]; }
public int getPrecision(int column) {
if (asString[column]) { return 0; } // LONGNVARCHAR -> nvarchar(max)
try {
int p = md.getPrecision(column);
return (p > 0 ? p : 38);
} catch (SQLException e) { return 38; }
}
public int getScale(int column) {
if (jdbcType[column] == Types.DECIMAL) {
try { return md.getScale(column); }
catch (SQLException e) { return 0; }
}
return 0;
}
public boolean next() throws SQLServerException {
try { return rs.next(); }
catch (SQLException e) { throw new RuntimeException(e); }
}
public long rowsWritten() { return rows; }
public Object[] getRowData() throws SQLServerException {
rows++;
// live progress: emit an in-place counter every 10k rows (the
// caller pulls one row at a time, so this runs during the load).
if (rows % 10000 == 0) { System.out.print("\r" + rows); System.out.flush(); }
Object[] row = new Object[cols];
try {
for (int i = 1; i <= cols; i++) {
Object v;
switch (jdbcType[i]) {
case Types.SMALLINT:
case Types.INTEGER:
case Types.BIGINT:
case Types.REAL:
case Types.DOUBLE: v = rs.getObject(i); break;
case Types.DECIMAL: v = rs.getBigDecimal(i); break;
case Types.BIT: v = rs.getBoolean(i); break;
case Types.DATE: v = rs.getDate(i); break;
case Types.TIME: v = rs.getTime(i); break;
case Types.TIMESTAMP:v = rs.getTimestamp(i); break;
case Types.VARBINARY:v = rs.getBytes(i); break;
default: {
String s = rs.getString(i);
if (s != null && trim) { s = s.trim(); }
v = s;
break;
}
}
if (rs.wasNull()) { v = null; }
row[i - 1] = v;
}
} catch (SQLException e) { throw new RuntimeException(e); }
return row;
}
}
private static void outputQueryResults(ResultSet rs, int cols, String[] dtn, String format) throws SQLException { private static void outputQueryResults(ResultSet rs, int cols, String[] dtn, String format) throws SQLException {
switch (format) { switch (format) {
case "csv": case "csv":

@ -187,8 +187,4 @@ jrunner -scu jdbc:postgresql://source:5432/sourcedb \
**Options:** **Options:**
- `-t` - trim text fields (default: true) - `-t` - trim text fields (default: true)
- `-c` - clear target table before insert (default: true) - `-c` - clear target table before insert (default: true)
- `-b` - bulk load into the destination instead of batched INSERTs — far faster
on large/wide tables. SQL Server: TDS bulk-load via SQLServerBulkCopy.
Postgres: COPY FROM STDIN. Other dests (e.g. DB2) fall back to INSERT.
(migration mode only)
- `-f` - output format: csv, tsv (query mode only, default: csv) - `-f` - output format: csv, tsv (query mode only, default: csv)