# Dataflow

A simple, understandable data transformation tool for ingesting, mapping, and transforming data from various sources.

## What It Does

Dataflow helps you:

1. **Import** data from CSV files (or other formats)
2. **Transform** data using regex rules to extract meaningful information
3. **Map** extracted values to standardized output
4. **Query** the transformed data

Perfect for cleaning up messy data like bank transactions, product lists, or any repetitive data that needs normalization.

## Core Concepts

### 1. Sources

Define where data comes from and how to deduplicate it.

**Example:** Bank transactions deduplicated by date + amount + description

### 2. Rules

Extract information using regex patterns.

**Example:** Extract the merchant name from the transaction description

### 3. Mappings

Map extracted values to clean, standardized output.

**Example:** `"DISCOUNT DRUG MART 32"` → `{"vendor": "Discount Drug Mart", "category": "Healthcare"}`

## Architecture

- **Database:** PostgreSQL with JSONB for flexibility
- **API:** Node.js/Express for REST endpoints
- **Storage:** Raw data is preserved; transformations are computed and stored

## Design Principles

- **Simple & Clear** - Easy to understand what's happening
- **Explicit** - No hidden magic or complex triggers
- **Testable** - Every function can be tested independently
- **Flexible** - Handle varying data formats without schema changes

## Getting Started

### Prerequisites

- PostgreSQL 12+
- Node.js 16+

### Installation

1. Install dependencies:
   ```bash
   npm install
   ```
2. Configure the database (copy `.env.example` to `.env` and edit it):
   ```bash
   cp .env.example .env
   ```
3. Deploy the database schema (the `dataflow` database must already exist):
   ```bash
   psql -U postgres -d dataflow -f database/schema.sql
   psql -U postgres -d dataflow -f database/functions.sql
   ```
4. Start the API server:
   ```bash
   npm start
   ```

## Quick Example

```http
# 1. Define a source
POST /api/sources
{
  "name": "bank_transactions",
  "dedup_fields": ["date", "amount", "description"]
}

# 2. Create a transformation rule
POST /api/sources/bank_transactions/rules
{
  "name": "extract_merchant",
  "pattern": "^([A-Z][A-Z ]+)",
  "field": "description"
}

# 3. Import data
POST /api/sources/bank_transactions/import
[CSV file upload]

# 4. Query transformed data
GET /api/sources/bank_transactions/records
```

## Project Structure

```
dataflow/
├── database/              # PostgreSQL schema and functions
│   ├── schema.sql         # Table definitions
│   └── functions.sql      # Import and transformation functions
├── api/                   # Express REST API
│   ├── server.js          # Main server
│   └── routes/            # API route handlers
├── examples/              # Sample data and use cases
└── docs/                  # Additional documentation
```

## Status

**Current Phase:** Initial development - building core functionality

## License

MIT
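
## Concept Sketch

To make the three core concepts (sources, rules, mappings) concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an illustration under stated assumptions. Dataflow itself performs import and transformation in PostgreSQL (`database/functions.sql`), and its actual behavior may differ. The function names (`dedupKey`, `dedupe`, `applyRule`, `transform`), the use of capture group 1, trimming the captured text, the `Map`-based mapping table, and the sample rows are all hypothetical. The source and rule objects reuse the shapes from the Quick Example. Because the example pattern `^([A-Z][A-Z ]+)` stops at the first digit, the sketch keys its mapping on `DISCOUNT DRUG MART`, not on `DISCOUNT DRUG MART 32`.

```javascript
// Hypothetical in-memory sketch of Source -> Rule -> Mapping.
// Not Dataflow's real implementation (that lives in PostgreSQL functions).

// Build a dedup key from the source's dedup_fields. JSON.stringify keeps
// field boundaries unambiguous, unlike naive string concatenation.
function dedupKey(record, dedupFields) {
  return JSON.stringify(dedupFields.map((field) => record[field]));
}

// Keep only the first record seen for each dedup key.
function dedupe(records, dedupFields) {
  const seen = new Set();
  return records.filter((record) => {
    const key = dedupKey(record, dedupFields);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Apply one regex rule to one record. Returns capture group 1, trimmed
// (an assumption), or null if the field is missing or nothing matches.
function applyRule(record, rule) {
  const value = record[rule.field];
  if (typeof value !== "string") return null;
  const match = new RegExp(rule.pattern).exec(value);
  return match && match[1] !== undefined ? match[1].trim() : null;
}

// Dedupe raw rows, then attach extracted and mapped values while keeping
// the raw record untouched. Unmapped or unmatched values map to null.
function transform(records, source, rule, mappings) {
  return dedupe(records, source.dedup_fields).map((raw) => {
    const extracted = applyRule(raw, rule);
    return { raw, extracted, mapped: mappings.get(extracted) ?? null };
  });
}

// Usage with hypothetical sample rows.
const source = { name: "bank_transactions", dedup_fields: ["date", "amount", "description"] };
const rule = { name: "extract_merchant", pattern: "^([A-Z][A-Z ]+)", field: "description" };
const mappings = new Map([
  ["DISCOUNT DRUG MART", { vendor: "Discount Drug Mart", category: "Healthcare" }],
]);

const rows = [
  { date: "2024-01-05", amount: "-12.34", description: "DISCOUNT DRUG MART 32" },
  { date: "2024-01-05", amount: "-12.34", description: "DISCOUNT DRUG MART 32" }, // duplicate
  { date: "2024-01-06", amount: "-40.00", description: "12345 UNKNOWN" }, // no regex match
];

const result = transform(rows, source, rule, mappings);
console.log(result.length); // 2
console.log(result[0].mapped.vendor); // Discount Drug Mart
console.log(result[1].extracted); // null
```

Keeping `raw` next to the derived fields mirrors the "raw data preserved" principle: rules and mappings can be changed and re-run without re-importing the source data.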