pt/jrunner

Paul Trowbridge 2ced7810d9 docs: -b now covers Postgres (COPY) as well as SQL Server

Update readme + CLAUDE: -b is no longer SQL-Server-only. Describe the Postgres
COPY FROM STDIN path (CopyManager, text-based, CSV-quoted, empty vs NULL) next
to the existing SQL Server SQLServerBulkCopy path; DB2 still falls back to INSERT.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-18 23:16:00 -04:00

9.3 KiB


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

jrunner is a Java CLI tool for migrating data between databases. It reads data from a source database using SQL queries and writes it to a destination table, batching inserts for performance. The tool supports multiple database types via JDBC drivers including PostgreSQL, IBM AS/400, and Microsoft SQL Server.

Build and Test Commands

Build the project:

gradle build
# or use wrapper
./gradlew build

Run tests:

gradle test
# or use wrapper
./gradlew test

Build distribution package:

gradle build
# Creates jrunner/build/distributions/jrunner.zip

Local install for testing (recommended):

./gradlew installDist
# Creates executable at jrunner/build/install/jrunner/bin/jrunner

Deploy using interactive script:

./deploy.sh
# Choose: 1) Local install, 2) Global install to /opt, 3) Custom directory

Architecture

Single-File Design

The entire application logic resides in jrunner/src/main/java/jrunner/jrunner.java. This is a monolithic command-line tool with no abstraction layers or separate modules.

Dual Mode Operation (v1.1+)

The tool operates in two modes:

Query Mode (new in v1.1):

Activates automatically when destination flags are not provided
Outputs query results to stdout in CSV or TSV format
Silent operation - no diagnostic output, just clean data
Designed for piping to visidata, pspg, less, or other data tools
Format controlled by -f flag (csv or tsv)

Migration Mode (original functionality):

Activates when destination flags are provided
Reads from source, writes to destination with batched INSERTs (or bulk copy with -b)
Shows progress counters and timing information

Bulk Copy (migration mode, -b): uses the destination's native bulk-load path and falls back to the batched INSERT path for any other destination (e.g. DB2):

SQL Server dest: streams the source ResultSet over the TDS bulk-load protocol via SQLServerBulkCopy, with no per-batch INSERT round trips (a 1.27M-row, ~298-column load went from ~111 min to ~4 min). A BulkSource adapter (an ISQLServerBulkData implementation) maps source type names to JDBC types that jrunner controls. String-like types (text/varchar/char/bpchar/json/jsonb/uuid) and numeric are declared NVARCHAR and read via getString so that SQL Server converts them losslessly. numeric takes this route because PG reports unconstrained numeric as scale 0, which a typed DECIMAL path would round (123.45 → 123).

Postgres dest: streams via COPY <table> FROM STDIN WITH (FORMAT csv) using the JDBC CopyManager. COPY is text-based, so the server parses each field into the column type and no per-type handling is needed. Every non-null value is CSV-quoted, so an empty string (a quoted empty field) stays distinct from NULL (an empty unquoted field). Rows are flushed in 1000-row buffers.

Both emit a \r-counter every 10k rows for live progress and print the final row count.
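The Postgres quoting rule above (quote every non-null value, leave NULL as an empty unquoted field) can be illustrated with a small encoder. This is a minimal sketch, not jrunner's actual code: the class and method names (CopyCsvSketch, encodeField, encodeRow) are hypothetical, and it assumes values have already been read from the source as strings.

```java
import java.util.List;

/** Hypothetical helper: renders one row as a line for COPY ... FROM STDIN WITH (FORMAT csv). */
final class CopyCsvSketch {

    /** Non-null values are always quoted; NULL becomes an empty, unquoted field. */
    static String encodeField(String value) {
        if (value == null) {
            return ""; // empty unquoted field: COPY's default NULL marker in csv format
        }
        // Double embedded quotes, then wrap the value in quotes.
        // An empty string therefore becomes "" and stays distinct from NULL.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Joins the encoded fields with commas and terminates the line. */
    static String encodeRow(List<String> values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(encodeField(values.get(i)));
        }
        return line.append('\n').toString();
    }
}
```

For a row holding "x", NULL, and an empty string, encodeRow produces the line "x",,"" followed by a newline, so the server can tell the NULL and the empty string apart.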

Data Flow

Query Mode:

Parse command-line arguments (-scu, -scn, -scp for source)
Read SQL query from file specified by -sq flag
Connect to source database via JDBC
Execute source query and fetch results (fetch size: 10,000 rows)
Output results to stdout in CSV or TSV format
Close connection and exit

Migration Mode:

Parse command-line arguments (-scu, -scn, -scp for source; -dcu, -dcn, -dcp for destination)
Read SQL query from file specified by -sq flag
Connect to source and destination databases via JDBC
Execute source query and fetch results (fetch size: 10,000 rows)
Optionally clear target table before insert if -c flag is set
With -b: bulk-load via the dest's native path (SQL Server → SQLServerBulkCopy, Postgres → COPY FROM STDIN). Otherwise: build batched INSERT statements (250 rows per batch) and execute them against the destination table (-dt)
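The choice of load path in the last step can be sketched as a small decision function. This is an illustration only: the source does not say how jrunner identifies the destination type, so detection by JDBC URL prefix is an assumption here, and BulkPathSketch, LoadPath, and choose are hypothetical names.

```java
/** Hypothetical sketch of choosing the load path for migration mode. */
final class BulkPathSketch {

    enum LoadPath { SQLSERVER_BULK_COPY, POSTGRES_COPY, BATCHED_INSERT }

    /**
     * @param bulkFlag    true when -b was given
     * @param destJdbcUrl destination JDBC URL (-dcu)
     */
    static LoadPath choose(boolean bulkFlag, String destJdbcUrl) {
        if (!bulkFlag) {
            return LoadPath.BATCHED_INSERT;
        }
        // Assumption: the destination type is inferred from the URL prefix.
        if (destJdbcUrl.startsWith("jdbc:sqlserver:")) {
            return LoadPath.SQLSERVER_BULK_COPY;
        }
        if (destJdbcUrl.startsWith("jdbc:postgresql:")) {
            return LoadPath.POSTGRES_COPY;
        }
        // Any other destination (e.g. DB2 via jdbc:as400:) falls back to INSERT.
        return LoadPath.BATCHED_INSERT;
    }
}
```

Keeping the fallback as the final branch means an unrecognised destination still loads correctly, just without the bulk speed-up.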

Type Handling

The tool includes explicit handling for different SQL data types in a switch statement (migration mode). Quoted string types: VARCHAR/NVARCHAR, TEXT/NTEXT, CHAR/NCHAR, CLOB/NCLOB, and the PostgreSQL string-ish types JSON, JSONB, BPCHAR (PG char(n)), and UUID. Date/time types (DATE, TIME, TIMESTAMP/DATETIME variants) are also quoted. String types get quote escaping (' → '') and optional trimming.

Caveat — the default case emits values UNQUOTED (correct for numerics like INT*/NUMERIC, which is why they're not listed). Any string-typed column whose JDBC type name isn't in the switch falls here and breaks the generated INSERT with a syntax error (e.g. PostgreSQL bool → 't'/'f' is currently unhandled). When adding a new source type, decide: numeric → leave to default; anything string-like → add a quoted case. A more robust future fix is to flip the default to quote-as-string with an explicit numeric allowlist.
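The quoting rules and the unquoted default can be sketched as follows. This is a simplified illustration rather than the switch in jrunner.java: LiteralSketch and render are hypothetical names, the date/time list is abbreviated to four type names, rendering null as NULL is an assumption, and trimming is modelled with String.trim().

```java
import java.util.Locale;

/** Hypothetical sketch of rendering one column value as a SQL literal. */
final class LiteralSketch {

    static String render(String typeName, String value, boolean trim) {
        if (value == null) {
            return "NULL"; // assumption: the source does not describe null handling
        }
        // PG reports lowercase type names (varchar, bpchar, ...), so normalise first.
        switch (typeName.toUpperCase(Locale.ROOT)) {
            case "VARCHAR": case "NVARCHAR":
            case "TEXT": case "NTEXT":
            case "CHAR": case "NCHAR":
            case "CLOB": case "NCLOB":
            case "JSON": case "JSONB": case "BPCHAR": case "UUID": {
                String text = trim ? value.trim() : value;
                return "'" + text.replace("'", "''") + "'"; // escape ' as ''
            }
            case "DATE": case "TIME": case "TIMESTAMP": case "DATETIME":
                return "'" + value + "'"; // quoted, no escaping or trimming
            default:
                return value; // unquoted: correct for numerics, wrong for unlisted string types
        }
    }
}
```

With this sketch, render("varchar", " O'Brien ", true) yields 'O''Brien', while render("bool", "t", true) yields a bare t, which is the failure mode the caveat describes.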

Database Drivers

JDBC drivers are configured in jrunner/build.gradle:

PostgreSQL: org.postgresql:postgresql:42.5.0
IBM AS/400 (JT400): net.sf.jt400:jt400:11.0
Microsoft SQL Server: com.microsoft.sqlserver:mssql-jdbc:9.2.0.jre8
SQL Server Integrated Auth: com.microsoft.sqlserver:mssql-jdbc_auth:9.2.0.x64

The AS/400 driver requires explicit Class.forName() registration (line 144).

Configuration

The repository defines a YAML configuration format (run.yml) for database connections, SQL script paths, and runtime options, but the main application does not currently parse it. All settings are passed as command-line arguments.

Command-line flags:

-scu - source JDBC URL
-scn - source username
-scp - source password
-dcu - destination JDBC URL (migration mode only)
-dcn - destination username (migration mode only)
-dcp - destination password (migration mode only)
-sq - path to source SQL query file
-dt - fully qualified destination table name (migration mode only)
-t - trim text fields (default: true)
-c - clear target table before insert (default: true, migration mode only)
-b - bulk load into dest (migration mode): SQL Server via SQLServerBulkCopy, Postgres via COPY
-f - output format: csv, tsv (query mode only, default: csv)

Key Implementation Details

Mode Detection

Query mode is automatically detected at runtime (line 131) by checking if all destination flags (dcu, dcn, dcp, dt) are empty. This allows seamless switching between query and migration modes without explicit mode flags.
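The check described above amounts to a single boolean test. A minimal sketch, assuming the four flag values are held as strings and that an unset flag is either null or empty (ModeSketch and isQueryMode are hypothetical names):

```java
/** Hypothetical sketch of mode detection from the destination flags. */
final class ModeSketch {

    /** Query mode applies only when every destination flag is unset. */
    static boolean isQueryMode(String dcu, String dcn, String dcp, String dt) {
        return isUnset(dcu) && isUnset(dcn) && isUnset(dcp) && isUnset(dt);
    }

    // Assumption: a flag that was not supplied is null or the empty string.
    private static boolean isUnset(String value) {
        return value == null || value.isEmpty();
    }
}
```

Under this rule, supplying any one destination flag is enough to switch the tool into migration mode.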

Query Mode Output (v1.1+)

Query mode uses dedicated output methods:

outputQueryResults() - Dispatches to format-specific methods
outputCSV() - RFC 4180 compliant CSV with proper quote escaping
outputTSV() - Tab-separated with tabs/newlines replaced by spaces
All output goes to stdout; no diagnostic messages in query mode
Helper methods: escapeCSV() and escapeTSV() for proper formatting
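The two helpers might look roughly like the sketch below. The method names come from the list above, but the bodies are an assumption: they follow RFC 4180 quoting for CSV and the tab/newline replacement described for TSV, and they render null as an empty field.

```java
/** Hypothetical bodies for the escapeCSV() and escapeTSV() helpers. */
final class OutputEscapeSketch {

    /** Quotes a field only when it contains a comma, quote, or line break (RFC 4180). */
    static String escapeCSV(String value) {
        if (value == null) {
            return ""; // assumption: null is written as an empty field
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\""; // double embedded quotes
    }

    /** Replaces tabs and line breaks with spaces so each row stays on one line. */
    static String escapeTSV(String value) {
        if (value == null) {
            return "";
        }
        return value.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }
}
```

Unlike the COPY path, which quotes every non-null value, this CSV output quotes only when needed, which keeps the stream readable in tools such as less.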

Memory and Streaming Architecture

Both modes use a streaming architecture with no array storage of result rows:

Query Mode Streaming:

Rows are pulled from the ResultSet via rs.next() one at a time
Each row is immediately formatted and written to stdout
No accumulation in memory - pure streaming from database to stdout
The only buffer is the JDBC driver's internal fetch buffer (10,000 rows)

Migration Mode Streaming:

Rows are pulled from the ResultSet via rs.next() one at a time
Each row is converted to a SQL VALUES clause string: (val1,val2,val3)
VALUES clauses are accumulated into a single sql string variable
When 250 rows accumulate, the string is prepended with INSERT INTO {table} VALUES and executed
The sql string is cleared and accumulation starts again
Only holds up to 250 rows worth of SQL text in memory at once
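The accumulate-and-flush loop above can be sketched as follows. This is an illustration, not the code in jrunner.java: InsertBatcherSketch and run are hypothetical names, the rows arrive as pre-rendered VALUES tuples, the execute callback stands in for running the statement against the destination, and flushing a final partial batch is an assumption the steps above do not spell out.

```java
import java.util.function.Consumer;

/** Hypothetical sketch of 250-row INSERT batching over a stream of VALUES tuples. */
final class InsertBatcherSketch {

    static final int BATCH_SIZE = 250;

    /**
     * @param table   destination table name (-dt)
     * @param tuples  rendered rows such as "(1,'a')", consumed one at a time
     * @param execute receives each complete INSERT statement
     */
    static void run(String table, Iterable<String> tuples, Consumer<String> execute) {
        StringBuilder sql = new StringBuilder();
        int rows = 0;
        for (String tuple : tuples) {
            if (rows > 0) {
                sql.append(',');
            }
            sql.append(tuple);
            rows++;
            if (rows == BATCH_SIZE) {
                execute.accept("INSERT INTO " + table + " VALUES " + sql);
                sql.setLength(0); // clear the buffer and start the next batch
                rows = 0;
            }
        }
        if (rows > 0) {
            // Assumption: leftover rows are flushed as a final, smaller batch.
            execute.accept("INSERT INTO " + table + " VALUES " + sql);
        }
    }
}
```

At most 250 rows of SQL text are held in the buffer at any time, regardless of how many rows the source query returns.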

JDBC Fetch Size:

Both modes set stmt.setFetchSize(10000) — a hint to fetch 10,000 rows at a time
The application processes rows one at a time via rs.next(); the only buffer is the driver's fetch window

⚠️ PostgreSQL requires autoCommit=false for fetchSize to take effect. While autoCommit is true, the PG JDBC driver ignores setFetchSize and loads the entire result set into memory, which causes OOM errors or GC thrashing on large source tables. In migration mode the source connection is therefore switched to setAutoCommit(false) right after connecting; this enables a server-side cursor, so rows actually stream. Query mode leaves autoCommit at its default, because callers (e.g. external tools) run DDL/DML through query mode and expect it to be committed, and with autoCommit=false those statements would be rolled back when the connection closes. The jt400 and MSSQL drivers stream regardless, so only PG is affected.
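The connection setup described above can be sketched with plain JDBC calls. This is a minimal sketch under stated assumptions: StreamingSourceSketch and openSourceStatement are hypothetical names, and the real code may structure the connection handling differently.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch of preparing the source statement for streaming reads. */
final class StreamingSourceSketch {

    static final int FETCH_SIZE = 10_000;

    static Statement openSourceStatement(Connection source, boolean migrationMode)
            throws SQLException {
        if (migrationMode) {
            // The PG driver only honours the fetch size when autocommit is off.
            source.setAutoCommit(false);
        }
        // Query mode keeps the default autocommit so DDL/DML run through it is committed.
        Statement stmt = source.createStatement();
        stmt.setFetchSize(FETCH_SIZE); // hint: fetch 10,000 rows per round trip
        return stmt;
    }
}
```

The order matters only in that autocommit must be off before the query executes; setting the fetch size alone is not enough on PostgreSQL.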

Batch Size (Migration Mode)

The INSERT batch size is hardcoded at 250 rows (around line 356). Rows are appended to a SQL string buffer as VALUES clauses; once 250 rows have accumulated, the buffer is prefixed with "INSERT INTO {table} VALUES", executed, and then cleared.

Error Handling

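The SQLException handler prints the stack trace and exits immediately with System.exit(0). There is no transaction rollback or partial failure recovery. Because the exit status is 0, calling scripts cannot detect a failure from the exit code alone.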

Performance Considerations

Result set fetch size is set to 10,000 rows (line 190)
Progress counter prints with carriage return for real-time updates (migration mode only)
Timestamps captured at start and end for duration tracking (migration mode only)
Query mode has no progress output to keep stdout clean for piping
