
Dataflow

A simple, understandable tool for ingesting, mapping, and transforming data from various sources.

What It Does

Dataflow helps you:

  1. Import data from CSV files (or other formats)
  2. Transform data using regex rules to extract meaningful information
  3. Map extracted values to standardized output
  4. Query the transformed data

Perfect for cleaning up messy data such as bank transactions, product lists, or any other repetitive data that needs normalization.

Core Concepts

1. Sources

Define where data comes from and how to deduplicate it.

Example: Bank transactions deduplicated by date + amount + description
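The idea behind dedup_fields can be sketched in plain JavaScript: records that agree on every configured field collapse to the same key. This is a sketch only; in Dataflow the actual deduplication happens in the PostgreSQL functions, and the helper name below is hypothetical.

```javascript
// Build a deduplication key from a record using the source's dedup_fields.
// Hypothetical helper — the real logic lives in database/functions.sql.
function dedupKey(record, dedupFields) {
  return dedupFields.map((f) => String(record[f] ?? '')).join('|');
}

const fields = ['date', 'amount', 'description'];
const a = { date: '2026-03-01', amount: '-12.99', description: 'DISCOUNT DRUG MART 32' };
const b = { date: '2026-03-01', amount: '-12.99', description: 'DISCOUNT DRUG MART 32' };

// Identical field values produce identical keys, so the second import is skipped.
console.log(dedupKey(a, fields) === dedupKey(b, fields)); // true
```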

2. Rules

Extract information using regex patterns.

Example: Extract merchant name from transaction description

3. Mappings

Map extracted values to clean, standardized output.

Example: "DISCOUNT DRUG MART 32" → {"vendor": "Discount Drug Mart", "category": "Healthcare"}
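Rules and mappings work together as an extract-then-map pipeline. The sketch below shows the shape of that flow in JavaScript; the names and fallback behavior are illustrative assumptions, since the real transformation runs inside the PostgreSQL functions.

```javascript
// Sketch of the extract-then-map flow. Hypothetical names and fallback —
// in Dataflow the real work happens in database/functions.sql.
const rule = { field: 'description', pattern: /^([A-Z][A-Z ]+)/ };

const mappings = {
  'DISCOUNT DRUG MART': { vendor: 'Discount Drug Mart', category: 'Healthcare' },
};

function transform(record) {
  const match = record[rule.field].match(rule.pattern);
  if (!match) return null;                      // rule did not apply
  const extracted = match[1].trim();            // e.g. "DISCOUNT DRUG MART"
  return mappings[extracted] ?? { vendor: extracted }; // unmapped values pass through
}

console.log(transform({ description: 'DISCOUNT DRUG MART 32' }));
// → { vendor: 'Discount Drug Mart', category: 'Healthcare' }
```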

Architecture

  • Database: PostgreSQL with JSONB for flexibility
  • API: Node.js/Express for REST endpoints
  • Storage: Raw data is preserved; transformations are computed and stored alongside it
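Concretely, a stored record might look like the object below. This is an illustrative shape only, not the exact schema from database/schema.sql: the point is that the raw input survives untouched while the computed result sits next to it as JSONB.

```javascript
// Illustrative record shape — field names are assumptions, not the real schema.
const record = {
  id: 1,
  source: 'bank_transactions',
  // Raw input, preserved exactly as imported:
  raw: { date: '2026-03-01', amount: '-12.99', description: 'DISCOUNT DRUG MART 32' },
  // Computed result, stored as JSONB so its shape can vary per source:
  transformed: { vendor: 'Discount Drug Mart', category: 'Healthcare' },
};

console.log(record.transformed.vendor);
```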

Design Principles

  • Simple & Clear - Easy to understand what's happening
  • Explicit - No hidden magic or complex triggers
  • Testable - Every function can be tested independently
  • Flexible - Handle varying data formats without schema changes

Getting Started

Prerequisites

  • PostgreSQL 12+
  • Node.js 16+

Installation

  1. Install dependencies:
npm install
  2. Configure database (copy .env.example to .env and edit):
cp .env.example .env
  3. Deploy database schema:
psql -U postgres -d dataflow -f database/schema.sql
psql -U postgres -d dataflow -f database/functions.sql
  4. Start the API server:
npm start

Quick Example

// 1. Define a source
POST /api/sources
{
  "name": "bank_transactions",
  "dedup_fields": ["date", "amount", "description"]
}

// 2. Create a transformation rule
POST /api/sources/bank_transactions/rules
{
  "name": "extract_merchant",
  "pattern": "^([A-Z][A-Z ]+)",
  "field": "description"
}

// 3. Import data
POST /api/sources/bank_transactions/import
[CSV file upload]

// 4. Query transformed data
GET /api/sources/bank_transactions/records

Project Structure

dataflow/
├── database/          # PostgreSQL schema and functions
│   ├── schema.sql     # Table definitions
│   └── functions.sql  # Import and transformation functions
├── api/               # Express REST API
│   ├── server.js      # Main server
│   └── routes/        # API route handlers
├── examples/          # Sample data and use cases
└── docs/              # Additional documentation

Status

Current Phase: Initial development - building core functionality

License

MIT