jrunner/CLAUDE.md
Paul Trowbridge c0f6e3a6e6 Document streaming architecture and memory usage
Clarify that both query and migration modes use streaming with no array
storage. Query mode streams directly to stdout, while migration mode
streams into a SQL string buffer (250 rows). The 10k fetch size is a
JDBC driver hint for network efficiency, not application-level storage.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-25 14:28:26 -05:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
jrunner is a Java CLI tool for migrating data between databases. It reads data from a source database using SQL queries and writes it to a destination table, batching inserts for performance. The tool supports multiple database types via JDBC drivers including PostgreSQL, IBM AS/400, and Microsoft SQL Server.
## Build and Test Commands
Build the project:
```bash
gradle build
# or use wrapper
./gradlew build
```
Run tests:
```bash
gradle test
# or use wrapper
./gradlew test
```
Build distribution package:
```bash
gradle build
# Creates jrunner/build/distributions/jrunner.zip
```
Local install for testing (recommended):
```bash
./gradlew installDist
# Creates executable at jrunner/build/install/jrunner/bin/jrunner
```
Deploy using interactive script:
```bash
./deploy.sh
# Choose: 1) Local install, 2) Global install to /opt, 3) Custom directory
```
## Architecture
### Single-File Design
The entire application logic resides in `jrunner/src/main/java/jrunner/jrunner.java`. This is a monolithic command-line tool with no abstraction layers or separate modules.
### Dual Mode Operation (v1.1+)
The tool operates in two modes:
**Query Mode** (new in v1.1):
- Activates automatically when destination flags are not provided
- Outputs query results to stdout in CSV or TSV format
- Silent operation - no diagnostic output, just clean data
- Designed for piping to visidata, pspg, less, or other data tools
- Format controlled by -f flag (csv or tsv)
**Migration Mode** (original functionality):
- Activates when destination flags are provided
- Reads from source, writes to destination with batched INSERTs
- Shows progress counters and timing information
### Data Flow
**Query Mode:**
1. Parse command-line arguments (-scu, -scn, -scp for source)
2. Read SQL query from file specified by -sq flag
3. Connect to source database via JDBC
4. Execute source query and fetch results (fetch size: 10,000 rows)
5. Output results to stdout in CSV or TSV format
6. Close connection and exit
**Migration Mode:**
1. Parse command-line arguments (-scu, -scn, -scp for source; -dcu, -dcn, -dcp for destination)
2. Read SQL query from file specified by -sq flag
3. Connect to source and destination databases via JDBC
4. Execute source query and fetch results (fetch size: 10,000 rows)
5. Build batched INSERT statements (250 rows per batch)
6. Execute batches against destination table specified by -dt flag
7. Optionally clear target table before insert if -c flag is set
### Type Handling
The tool includes explicit handling for different SQL data types in a switch statement (lines 229-312). Supported types include VARCHAR, TEXT, CHAR, CLOB, DATE, TIME, TIMESTAMP, and BIGINT. String types get quote escaping and optional trimming.
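As a rough illustration of the quote escaping and optional trimming described above, the sketch below turns a column value into a SQL literal based on its JDBC type. The class name, method name, NULL handling, quoting of temporal values, and the rejection of unlisted types are all assumptions made for the example; they are not copied from `jrunner.java`, and how a given driver reports TEXT columns is not covered here.

```java
import java.sql.Types;

// Illustrative sketch only: names and edge-case behavior are assumptions,
// not code taken from jrunner.java.
final class SqlLiteralSketch {

    /** Formats one column value as a SQL literal according to its JDBC type. */
    static String toSqlLiteral(int sqlType, String raw, boolean trim) {
        if (raw == null) {
            return "NULL"; // assumption: SQL NULL is emitted as the bare keyword
        }
        switch (sqlType) {
            case Types.VARCHAR:
            case Types.CHAR:
            case Types.CLOB: {
                // String types: optionally trim, then double embedded single quotes.
                String text = trim ? raw.trim() : raw;
                return "'" + text.replace("'", "''") + "'";
            }
            case Types.DATE:
            case Types.TIME:
            case Types.TIMESTAMP:
                return "'" + raw + "'"; // assumption: temporal values are quoted as-is
            case Types.BIGINT:
                return raw; // numeric values are emitted unquoted
            default:
                // assumption: this sketch simply rejects types it does not list
                throw new IllegalArgumentException("unhandled JDBC type: " + sqlType);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSqlLiteral(Types.VARCHAR, "O'Brien  ", true)); // prints 'O''Brien'
    }
}
```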
### Database Drivers
JDBC drivers are configured in `jrunner/build.gradle`:
- PostgreSQL: org.postgresql:postgresql:42.5.0
- IBM AS/400 (JT400): net.sf.jt400:jt400:11.0
- Microsoft SQL Server: com.microsoft.sqlserver:mssql-jdbc:9.2.0.jre8
- SQL Server Integrated Auth: com.microsoft.sqlserver:mssql-jdbc_auth:9.2.0.x64
The AS/400 driver requires explicit Class.forName() registration (line 144).
## Configuration
The repository includes a YAML configuration format (`run.yml`) for specifying database connections, SQL script paths, and runtime options. The main application does not currently parse this file; all settings are passed as command-line arguments.
Command-line flags:
- `-scu` - source JDBC URL
- `-scn` - source username
- `-scp` - source password
- `-dcu` - destination JDBC URL (migration mode only)
- `-dcn` - destination username (migration mode only)
- `-dcp` - destination password (migration mode only)
- `-sq` - path to source SQL query file
- `-dt` - fully qualified destination table name (migration mode only)
- `-t` - trim text fields (default: true)
- `-c` - clear target table before insert (default: true, migration mode only)
- `-f` - output format: csv, tsv (query mode only, default: csv)
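To make the defaults above concrete, here is a minimal sketch of collecting these flags into a map. It assumes every flag is followed by a value (for example `-t false`), which this document does not confirm, and the class and method names are hypothetical rather than taken from `jrunner.java`.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: assumes each flag is followed by exactly one value.
final class FlagParsingSketch {

    /** Collects flag/value pairs, starting from the documented defaults. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("-t", "true"); // trim text fields by default
        opts.put("-c", "true"); // clear target table by default (migration mode)
        opts.put("-f", "csv");  // default output format (query mode)
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.put(args[i], args[i + 1]);
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {"-sq", "query.sql", "-f", "tsv"});
        System.out.println(opts.get("-f")); // prints tsv
    }
}
```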
## Key Implementation Details
### Mode Detection
Query mode is automatically detected at runtime (line 131) by checking if all destination flags (dcu, dcn, dcp, dt) are empty. This allows seamless switching between query and migration modes without explicit mode flags.
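The check described above might look roughly like the following. This is a sketch with hypothetical names, not the code at line 131; in particular, treating `null` the same as an empty string is an assumption.

```java
// Illustrative sketch only: method and parameter names are hypothetical.
final class ModeDetectionSketch {

    /** Returns true when every destination setting is missing or empty. */
    static boolean isQueryMode(String dcu, String dcn, String dcp, String dt) {
        return isEmpty(dcu) && isEmpty(dcn) && isEmpty(dcp) && isEmpty(dt);
    }

    private static boolean isEmpty(String s) {
        return s == null || s.isEmpty(); // assumption: null counts as "not provided"
    }

    public static void main(String[] args) {
        System.out.println(isQueryMode("", "", "", "")); // prints true
        System.out.println(isQueryMode("jdbc:postgresql://localhost/db", "user", "pw", "public.target")); // prints false
    }
}
```

With this shape, supplying any one destination flag switches the result to migration mode. Whether the real code validates that the remaining destination flags are also present is not stated here.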
### Query Mode Output (v1.1+)
Query mode uses dedicated output methods:
- `outputQueryResults()` - Dispatches to format-specific methods
- `outputCSV()` - RFC 4180 compliant CSV with proper quote escaping
- `outputTSV()` - Tab-separated with tabs/newlines replaced by spaces
- All output goes to stdout; no diagnostic messages in query mode
- Helper methods: `escapeCSV()` and `escapeTSV()` for proper formatting
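For reference, the two escaping rules can be sketched as below. The method names match the helpers listed above, but the bodies are an independent illustration rather than the project's code. Mapping `null` to an empty field and treating carriage returns as newlines in TSV output are assumptions.

```java
// Illustrative sketch only: an independent take on the escaping rules,
// not the bodies of the helpers in jrunner.java.
final class OutputEscapeSketch {

    /** RFC 4180 style: quote fields containing comma, quote, CR, or LF; double embedded quotes. */
    static String escapeCSV(String value) {
        if (value == null) {
            return ""; // assumption: NULL becomes an empty field
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** TSV: replace tabs and line breaks with spaces so each row stays on one line. */
    static String escapeTSV(String value) {
        if (value == null) {
            return ""; // assumption: NULL becomes an empty field
        }
        return value.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    public static void main(String[] args) {
        System.out.println(escapeCSV("Smith, John")); // prints "Smith, John" (with the quotes)
        System.out.println(escapeTSV("a\tb"));        // prints a b
    }
}
```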
### Memory and Streaming Architecture
Both modes use a streaming architecture with no array storage of result rows:
**Query Mode Streaming:**
- Rows are pulled from the ResultSet via `rs.next()` one at a time
- Each row is immediately formatted and written to stdout
- No accumulation in memory - pure streaming from database to stdout
- The only buffer is the JDBC driver's internal fetch buffer (10,000 rows)
**Migration Mode Streaming:**
- Rows are pulled from the ResultSet via `rs.next()` one at a time
- Each row is converted to a SQL VALUES clause string: `(val1,val2,val3)`
- VALUES clauses are accumulated into a single `sql` string variable
- When 250 rows have accumulated, `INSERT INTO {table} VALUES` is prepended to the string and the statement is executed
- The `sql` string is then cleared and accumulation starts again
- At most 250 rows' worth of SQL text is held in memory at once
**JDBC Fetch Size:**
- Both modes set `stmt.setFetchSize(10000)` (line 190)
- This is a hint to the JDBC driver to fetch 10,000 rows at a time from the database
- The driver maintains this internal buffer for network efficiency
- The application code never sees or stores all 10,000 rows - it processes them one at a time via `rs.next()`
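The row-at-a-time pattern described above can be sketched as follows. The class and method names are hypothetical, and for brevity the sketch prints only the first column. Whether a driver honors the fetch-size hint depends on the driver and its connection settings, which this sketch does not address.

```java
import java.io.PrintStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Illustrative sketch only: not the code from jrunner.java.
final class FetchSizeSketch {

    /** Streams the first column of each row to 'out' and returns the row count. */
    static long streamFirstColumn(Connection conn, String query, PrintStream out)
            throws SQLException {
        long rows = 0;
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(10000); // hint to the driver; not an application-level buffer
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {               // one row at a time
                    out.println(rs.getString(1)); // written immediately, never collected
                    rows++;
                }
            }
        }
        return rows;
    }
}
```

The only state the application keeps across iterations here is the row counter, so its own memory use does not grow with the size of the result set.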
### Batch Size (Migration Mode)
INSERT statements are batched at 250 rows (hardcoded around line 356). Rows are streamed into a SQL string as VALUES clauses. Once 250 rows have accumulated, `INSERT INTO {table} VALUES` is prepended to the string, the statement is executed, and the string is cleared.
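A minimal sketch of this accumulate-and-flush pattern is shown below. It swaps the plain `sql` string for a `StringBuilder` and takes the statement executor as a `Consumer<String>` so it can run without a database. Both choices, along with the class name and the explicit `flush()` for a final partial batch, are assumptions made for illustration.

```java
import java.util.function.Consumer;

// Illustrative sketch only: uses a StringBuilder and an injected executor
// instead of the sql string and JDBC calls in jrunner.java.
final class InsertBatcherSketch {
    private final String table;
    private final int batchSize;
    private final Consumer<String> executor;
    private final StringBuilder values = new StringBuilder();
    private int pending = 0;

    InsertBatcherSketch(String table, int batchSize, Consumer<String> executor) {
        this.table = table;
        this.batchSize = batchSize;
        this.executor = executor;
    }

    /** Appends one VALUES clause such as "(1,'a')" and flushes when the batch is full. */
    void addRow(String valuesClause) {
        if (pending > 0) {
            values.append(',');
        }
        values.append(valuesClause);
        pending++;
        if (pending == batchSize) {
            flush();
        }
    }

    /** Executes the accumulated rows as one INSERT, then clears the buffer. */
    void flush() {
        if (pending == 0) {
            return;
        }
        executor.accept("INSERT INTO " + table + " VALUES " + values);
        values.setLength(0);
        pending = 0;
    }

    public static void main(String[] args) {
        InsertBatcherSketch batcher = new InsertBatcherSketch("public.target", 250, System.out::println);
        batcher.addRow("(1,'a')");
        batcher.addRow("(2,'b')");
        batcher.flush(); // prints INSERT INTO public.target VALUES (1,'a'),(2,'b')
    }
}
```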
### Error Handling
SQLException handling prints the stack trace and exits immediately with `System.exit(0)`, so a failed run still reports exit status 0 to the calling shell. There is no transaction rollback or partial-failure recovery.
### Performance Considerations
- Result set fetch size is set to 10,000 rows (line 190)
- Progress counter prints with carriage return for real-time updates (migration mode only)
- Timestamps captured at start and end for duration tracking (migration mode only)
- Query mode has no progress output to keep stdout clean for piping