# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

Dataflow is a simple data transformation tool for importing, cleaning, and standardizing data from various sources. Built with PostgreSQL and Node.js/Express, it emphasizes clarity and simplicity over complexity.

## Core Concepts

1. **Sources** - Define data sources and deduplication rules (which fields make a record unique)
2. **Import** - Load CSV data, automatically deduplicating based on source rules
3. **Rules** - Extract information using regex patterns (e.g., extract merchant from transaction description)
4. **Mappings** - Map extracted values to standardized output (e.g., "WALMART" → {"vendor": "Walmart", "category": "Groceries"})
5. **Transform** - Apply rules and mappings to create clean, enriched data

## Architecture

### Database Schema (`database/schema.sql`)

**5 simple tables:**
- `sources` - Source definitions with `constraint_fields` array
- `records` - Imported data with `data` (raw) and `transformed` (enriched) JSONB columns
- `rules` - Regex extraction rules with `field`, `pattern`, `output_field`
- `mappings` - Input/output value mappings
- `import_log` - Audit trail

**Key design:**
- JSONB for flexible data storage
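- Deduplication via a readable JSONB `constraint_key` built from the source's `constraint_fields`, with no hashing (see Deduplication under Common Patterns)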
- Simple, flat structure (no complex relationships)

### Database Functions (`database/functions.sql`)

**4 focused functions:**
- `import_records(source_name, data)` - Import with deduplication
- `apply_transformations(source_name, record_ids)` - Apply rules and mappings
- `get_unmapped_values(source_name, rule_name)` - Find values needing mappings
- `reprocess_records(source_name)` - Re-transform all records

**Design principle:** Each function does ONE thing. No nested CTEs, no duplication.

### API Server (`api/server.js` + `api/routes/`)

**RESTful endpoints:**
- `/api/sources` - CRUD sources, import CSV, trigger transformations
- `/api/rules` - CRUD transformation rules
- `/api/mappings` - CRUD value mappings, view unmapped values
- `/api/records` - Query and search transformed data

**Route files:**
- `routes/sources.js` - Source management and CSV import
- `routes/rules.js` - Rule management
- `routes/mappings.js` - Mapping management + unmapped values
- `routes/records.js` - Record queries and search

## Common Development Tasks

### Running the Application

```bash
# Setup (first time only)
./setup.sh

# Start development server with auto-reload
npm run dev

# Start production server
npm start

# Test API
curl http://localhost:3000/health
```

### Database Changes

When modifying schema:
1. Edit `database/schema.sql`
2. Drop and recreate schema: `psql -d dataflow -f database/schema.sql`
3. Redeploy functions: `psql -d dataflow -f database/functions.sql`

For production, write migration scripts instead of dropping schema.

### Adding a New API Endpoint

1. Add route to appropriate file in `api/routes/`
2. Follow existing patterns (async/await, error handling via `next()`)
3. Use parameterized queries to prevent SQL injection
4. Return consistent JSON format

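The steps above can be sketched as a single handler. This is a minimal illustration, not code from `api/routes/`: it assumes a `db` object exposing `query(text, params)` (the shape of node-postgres's `pool.query`), and the handler name `makeListRulesHandler` and the `{ success, data }` envelope are hypothetical.

```javascript
// Hypothetical handler factory: `db` is anything with query(text, params) -> Promise<{ rows }>.
function makeListRulesHandler(db) {
  return async function listRules(req, res, next) {
    try {
      // The $1 placeholder keeps user input out of the SQL text (parameterized query).
      const result = await db.query(
        'SELECT * FROM dataflow.rules WHERE source_name = $1',
        [req.params.name]
      );
      // Assumed response envelope; match whatever the existing routes return.
      res.json({ success: true, data: result.rows });
    } catch (err) {
      // Hand the failure to the global error handler instead of responding here.
      next(err);
    }
  };
}
```

Taking `db` as an argument keeps the handler testable without a live database; in a router it would be registered with something like `router.get('/some/:name', makeListRulesHandler(pool))`, where the path is a placeholder.
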
### Testing

Manual testing workflow:
1. Create a source: `POST /api/sources`
2. Create rules: `POST /api/rules`
3. Import data: `POST /api/sources/:name/import`
4. Apply transformations: `POST /api/sources/:name/transform`
5. View results: `GET /api/records/source/:name`

See `examples/GETTING_STARTED.md` for complete curl examples.

## Design Principles

1. **Simple over clever** - Straightforward code beats optimization
2. **Explicit over implicit** - No magic, no hidden triggers
3. **Clear naming** - `data` not `rec`, `transformed` not `allj`
4. **One function, one job** - No 250-line functions
5. **JSONB for flexibility** - Handle varying schemas without migrations

## Common Patterns

### Import Flow
```
CSV file → parse → import_records() → records table (data column)
```

### Transformation Flow
```
records.data → apply_transformations() →
  - Apply each rule (regex extraction)
  - Look up mappings
  - Merge into records.transformed
```

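For intuition, the flow above can be modelled in memory for a single record. This is only an illustration: the real work happens in the SQL function `apply_transformations()`, the rule and mapping object shapes (`name`, `rule_name`, `input`, `output`) are simplified assumptions, and JavaScript's regex flavor is not identical to PostgreSQL's.

```javascript
// Illustrative in-memory model of the flow; not the SQL implementation.
function applyTransformations(data, rules, mappings) {
  const transformed = { ...data };
  for (const rule of rules) {
    const match = new RegExp(rule.pattern).exec(String(data[rule.field] ?? ''));
    if (!match) continue; // rule did not match this record
    const extracted = match[1] ?? match[0]; // first capture group, else the whole match
    transformed[rule.output_field] = extracted;
    // Look up a mapping for the extracted value and merge its output.
    const mapping = mappings.find(
      (m) => m.rule_name === rule.name && m.input === extracted
    );
    if (mapping) Object.assign(transformed, mapping.output);
  }
  return transformed;
}

// Made-up sample data for the sketch.
const record = { description: 'WALMART #1234 SPRINGFIELD', amount: '45.10' };
const rules = [
  { name: 'merchant', field: 'description', pattern: '^([A-Z]+)', output_field: 'merchant' },
];
const mappings = [
  { rule_name: 'merchant', input: 'WALMART', output: { vendor: 'Walmart', category: 'Groceries' } },
];
console.log(applyTransformations(record, rules, mappings));
```

A record whose extracted value has no mapping still gets the raw extracted field, which is the kind of value `get_unmapped_values()` is meant to surface.
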
### Deduplication
- `constraint_key` is a JSONB object of the constraint field values (readable, no hashing)
- Dedup is enforced at import time via a CTE; there is NO unique DB constraint on `constraint_key`
- **The constraint key is for cross-batch re-import protection, NOT record uniqueness**
- Within a single import batch, ALL rows insert regardless of duplicate constraint keys
- Banks legitimately send multiple identical-looking transactions (same date, description, amount)
- Example: 11 Cedar Point merchandise charges on one day; all should insert in one batch
- On re-import of an overlapping date range, rows whose `constraint_key` already exists in the DB are skipped
- This prevents double-counting when you re-run a month-to-date export the next day
- NEVER use `ON CONFLICT (constraint_key)`: there is no unique constraint, and it would wrongly drop legitimate duplicate transactions from the same batch
- Deleting an import log entry cascades to all records from that batch (`import_id` FK)

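The batch semantics above are easy to get wrong, so here is a small in-memory model of them. It is a sketch of the intended behavior only, not the SQL in `import_records()`: `existingKeys` stands in for the constraint keys already in the database, and the function names and sample rows are made up.

```javascript
// Illustrative model only: `existingKeys` stands in for constraint keys already in the DB.
function constraintKey(row, constraintFields) {
  // Build the key in constraint_fields order so equal rows serialize identically.
  return JSON.stringify(Object.fromEntries(constraintFields.map((f) => [f, row[f]])));
}

function importBatch(existingKeys, batch, constraintFields) {
  const inserted = [];
  let skipped = 0;
  for (const row of batch) {
    // Compare only against keys that existed BEFORE this batch started.
    if (existingKeys.has(constraintKey(row, constraintFields))) skipped += 1;
    else inserted.push(row);
  }
  // Register the new keys only after the whole batch has been checked,
  // so duplicates inside one batch never block each other.
  for (const row of inserted) existingKeys.add(constraintKey(row, constraintFields));
  return { inserted: inserted.length, skipped };
}

const fields = ['date', 'description', 'amount'];
const charge = { date: '2024-07-04', description: 'CEDAR POINT MERCH', amount: '12.00' };
const seen = new Set();

// First import: three identical charges all insert.
console.log(importBatch(seen, [charge, charge, charge], fields)); // { inserted: 3, skipped: 0 }

// Overlapping re-import: the known charges are skipped, the new row inserts.
console.log(importBatch(seen, [charge, charge, charge, { ...charge, amount: '8.50' }], fields)); // { inserted: 1, skipped: 3 }
```

An `ON CONFLICT`-style check would instead compare each row against rows inserted earlier in the same batch, which is exactly the behavior the list above rules out.
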
### Error Handling
- API routes use `try/catch` and pass errors to `next(err)`
- `api/server.js` has a global error handler
- Database functions return JSON with a `success` boolean

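As a rough sketch of what a global handler of this kind looks like in Express: error-handling middleware is identified by its four-argument signature and is registered with `app.use(...)` after all routes. The response shape below mirrors the `success` boolean used by the database functions, but that is an assumption; the actual handler in `api/server.js` may differ.

```javascript
// Express recognizes error-handling middleware by its four-argument signature.
function errorHandler(err, req, res, next) {
  // If a response has already started, let Express's default handler finish it.
  if (res.headersSent) return next(err);
  const status = err.status || 500; // assumed convention: errors may carry a `status`
  res.status(status).json({ success: false, error: err.message });
}
```

Any route that calls `next(err)` ends up here, so individual routes do not need to format error responses themselves.
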
## Light / dark mode

Theme state lives in `ui/src/theme.jsx`, a React context (`ThemeContext`) with a `ThemeProvider` that wraps the app in `main.jsx`.

- **Storage key:** `df_dark` in `localStorage`; falls back to `window.matchMedia('(prefers-color-scheme: dark)')` on first visit
- **Toggle:** button in the sidebar header in `App.jsx`; an effect writes `localStorage` and toggles the `.dark` class on `<html>`
- **CSS:** `ui/src/index.css` defines CSS custom properties under `:root` (light) and `.dark`. All Tailwind color overrides are written as `.dark .bg-white { ... }` etc.
- **Palette:** dark mode uses Perspective's "Pro Dark" colours (`--bg-primary: #242526`, panels `#2a2c2f`, gridlines `#3b3f46`, text `#c5c9d0`)
- **Perspective viewer:** `Pivot.jsx` calls `viewer.setAttribute('theme', dark ? 'Pro Dark' : 'Pro Light')` on initial load and in a `useEffect` that depends on `dark`, so the viewer stays in sync when the toggle fires
- **Consuming the theme:** `import useTheme from '../theme.jsx'` then `const { dark, setDark } = useTheme()`

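The storage and fallback behavior described above can be sketched with two small helpers. Both function names are hypothetical, and the sketch assumes `df_dark` holds the strings `'true'` / `'false'`; check `ui/src/theme.jsx` for the format actually stored.

```javascript
// Hypothetical helper: decide the initial theme.
// ASSUMPTION: `df_dark` stores the strings 'true' / 'false'.
function resolveInitialDark(storage, matchMedia) {
  const stored = storage.getItem('df_dark');
  if (stored !== null) return stored === 'true'; // an explicit earlier choice wins
  return matchMedia('(prefers-color-scheme: dark)').matches; // first visit: follow the OS
}

// Hypothetical helper: the side effects the toggle is described as performing.
function applyDark(dark, storage, rootElement) {
  storage.setItem('df_dark', String(dark));
  rootElement.classList.toggle('dark', dark); // second argument forces add or remove
}
```

In the browser these would be called with `window.localStorage`, `(q) => window.matchMedia(q)` and `document.documentElement`; passing them in as arguments keeps the logic testable outside a browser.
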
## UI (React + Vite)

The frontend lives in `ui/src/` and is built to `public/` via `npm run build` from the `ui/` directory. **Always run `npm run build` from `ui/` after any changes to `ui/src/` files.**

### Pages

- **Sources / Rules / Mappings / Records** - standard CRUD pages
- **Pivot** (`ui/src/pages/Pivot.jsx`) - interactive pivot/crosstab powered by Perspective (`@perspective-dev` v4.4.0, loaded from CDN). See `docs/perspective-pivot.md` for the full Perspective API reference.
- **Stacks** - multi-source union views with running balance
- **Log** - import audit trail

### Pivot inspector panel

Clicking a data cell opens a right-hand inspector panel showing the underlying transactions for that cell. Key behaviors:

- **Toggle**: clicking the same cell again closes the panel. The toggle key is `JSON.stringify({ p: row.__ROW_PATH__, c: column_names })`, which is stable across source and stack views.
- **Listener cleanup**: the `perspective-click` handler is stored in `perspClickHandlerRef` and removed via `removeEventListener` on effect cleanup. Without this, switching views accumulates duplicate listeners that fire multiple times per click.
- **split_by filter derivation**: `detail.config.filter` from the click event may omit split_by column constraints. They are derived from `column_names` positionally (`column_names[i]` matches `config.split_by[i]`) and appended to the filter before querying.
- **Row filtering**: a temporary `table.view({ filter, expressions })` is used so Perspective evaluates expression/computed columns correctly. Falls back to JS-side `filterRowsByConfig` on error (which skips filters for fields not in raw data).
- The panel is resizable via a drag handle on its left edge (`paneWidth` state, min 240px).
- The transaction table is sortable (click header) and shows column totals for all-numeric columns.

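The toggle key and the positional split_by derivation described above can be sketched as two pure functions. The function names are hypothetical, and the sketch assumes filters are `[column, operator, value]` triples as in Perspective view configs; `Pivot.jsx` remains the reference for the actual behavior.

```javascript
// Key used to decide whether a click hits the cell that is already open.
function cellKey(row, columnNames) {
  return JSON.stringify({ p: row.__ROW_PATH__, c: columnNames });
}

// Append one equality filter per split_by level:
// columnNames[i] is taken as the value for config.split_by[i].
function withSplitByFilters(config, columnNames) {
  const splitBy = config.split_by ?? [];
  const derived = splitBy
    .map((column, i) => [column, '==', columnNames[i]])
    .filter(([, , value]) => value !== undefined); // ignore levels with no value
  return [...(config.filter ?? []), ...derived];
}

// Made-up click: one split_by level, clicked column path ['Groceries', 'amount'].
const config = { split_by: ['category'], filter: [['year', '==', 2024]] };
console.log(withSplitByFilters(config, ['Groceries', 'amount']));
```

If the click event's filter already contains a split_by constraint, this sketch appends a duplicate of it, which is redundant but does not change which rows match.
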
### Pivot layout persistence

Named layouts are stored in `dataflow.pivot_layouts` for both sources and stacks. The `source_name` column holds either a source name or a stack name; the FK to `sources(name)` was dropped to allow this. Source layouts use `/api/sources/:name/layouts`; stack layouts use `/api/stacks/:name/layouts`. Both call the same DB functions (`list_pivot_layouts`, `save_pivot_layout`, `delete_pivot_layout`). `localStorage` is still used to remember the *last active layout* for a view (the `psp_layout_<name>` key), but named layout definitions live in the DB so they persist across machines.

## File Structure

```
dataflow/
├── database/
│   ├── schema.sql            # Table definitions
│   └── functions.sql         # Import/transform functions
├── api/
│   ├── server.js             # Express server
│   └── routes/               # API endpoints
│       ├── sources.js
│       ├── rules.js
│       ├── mappings.js
│       └── records.js
├── ui/
│   ├── src/
│   │   ├── pages/            # One file per page
│   │   └── api.js            # API client
│   └── package.json
├── public/                   # Built UI (gitignored, generated by npm run build)
├── docs/
│   └── perspective-pivot.md  # Perspective API reference
├── examples/
│   ├── GETTING_STARTED.md    # Tutorial
│   └── bank_transactions.csv
├── .env.example              # Config template
├── package.json
└── README.md
```

## Comparison to Legacy TPS System

This project replaces an older system (in `/opt/tps`) that had:
- 2,150 lines of complex SQL with heavy duplication
- 5 nearly-identical 200+ line functions
- Confusing names and deep nested CTEs
- Complex trigger-based processing

Dataflow achieves the same functionality with:
- ~400 lines of simple SQL
- 4 focused functions
- Clear names and linear logic
- Explicit API-triggered processing

The simplification makes it easy to understand, modify, and maintain.

## Troubleshooting

**Database connection fails:**
- Check that the `.env` file exists and has correct credentials
- Verify PostgreSQL is running: `psql -U postgres -l`
- Check that the search path is set; it should default to the `dataflow` schema

**Import succeeds but transformation fails:**
- Check rules exist: `SELECT * FROM dataflow.rules WHERE source_name = 'xxx'`
- Verify field names match CSV columns
- Test regex pattern manually
- Check for SQL errors in logs

**All records marked as duplicates:**
- Verify `constraint_fields` match actual field names in data
- Check if data was already imported
- Use a different source name for testing

## Adding New Features

When adding features, follow these principles:
- Add ONE function that does ONE thing
- Keep functions under 100 lines if possible
- Write clear SQL, not clever SQL
- Add API endpoint that calls the function
- Document in README.md and update examples

## Notes for Claude

- This is a **simple** system by design - don't over-engineer it
- Keep functions focused and linear
- Use JSONB for flexibility, not as a crutch for bad design
- When confused, read the `examples/GETTING_STARTED.md` walkthrough
- The old TPS system is in `/opt/tps` - this is a clean rewrite, not a refactor