Dataflow
A simple, understandable data transformation tool for ingesting, mapping, and transforming data from various sources.
What It Does
Dataflow helps you:
- Import data from CSV files (or other formats)
- Transform data using regex rules to extract meaningful information
- Map extracted values to standardized output
- Query the transformed data
It is well suited to cleaning up messy data such as bank transactions, product lists, or any repetitive data that needs normalization.
Core Concepts
1. Sources
Define where data comes from and how to deduplicate it.
Example: Bank transactions deduplicated by date + amount + description
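As a rough illustration of field-based deduplication, the sketch below hashes the configured fields into a key and keeps only the first record per key. The helper names `dedupKey` and `dedupe` are hypothetical, and the project's real deduplication may be implemented differently (for example in its database functions).

```javascript
// Hypothetical helpers, not part of Dataflow: a sketch of dedup by field values.
const crypto = require('crypto');

// Build a stable key from the configured dedup fields, in the configured order.
function dedupKey(record, dedupFields) {
  const values = dedupFields.map((field) => String(record[field] ?? ''));
  return crypto.createHash('sha256').update(JSON.stringify(values)).digest('hex');
}

// Keep the first record seen for each key and drop later duplicates.
function dedupe(records, dedupFields) {
  const seen = new Set();
  return records.filter((record) => {
    const key = dedupKey(record, dedupFields);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const rows = [
  { date: '2024-01-05', amount: '-12.50', description: 'DISCOUNT DRUG MART 32' },
  { date: '2024-01-05', amount: '-12.50', description: 'DISCOUNT DRUG MART 32' },
  { date: '2024-01-06', amount: '-12.50', description: 'DISCOUNT DRUG MART 32' },
];
const unique = dedupe(rows, ['date', 'amount', 'description']);
console.log(unique.length); // 2
```

Hashing the JSON-encoded value list rather than a plain concatenation avoids collisions between records such as ("ab", "c") and ("a", "bc").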
2. Rules
Extract information using regex patterns.
Example: Extract merchant name from transaction description
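To make the idea concrete, here is a minimal sketch of applying one rule to one record, using the rule shape from the Quick Example (`name`, `pattern`, `field`). The `applyRule` function is hypothetical, and choosing the first capture group as the extracted value is an assumption, not confirmed behavior.

```javascript
// Hypothetical helper, not part of Dataflow: apply one regex rule to one record.
function applyRule(rule, record) {
  const value = record[rule.field];
  if (typeof value !== 'string') return null;

  const match = new RegExp(rule.pattern).exec(value);
  if (!match) return null;

  // Assumption: use the first capture group if present, else the whole match.
  return (match[1] ?? match[0]).trim();
}

const rule = {
  name: 'extract_merchant',
  pattern: '^([A-Z][A-Z ]+)',
  field: 'description',
};

const extracted = applyRule(rule, { description: 'DISCOUNT DRUG MART 32' });
console.log(extracted); // DISCOUNT DRUG MART
```

The pattern stops at the first character that is neither an uppercase letter nor a space, so the trailing store number is left out of the extracted value.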
3. Mappings
Map extracted values to clean, standardized output.
Example: "DISCOUNT DRUG MART 32" → {"vendor": "Discount Drug Mart", "category": "Healthcare"}
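A mapping can be pictured as a lookup table from an extracted value to a set of output fields. The sketch below uses an in-memory `Map` and a hypothetical `applyMapping` helper; how Dataflow actually stores and resolves mappings is not shown here.

```javascript
// Hypothetical lookup table, not part of Dataflow: extracted value -> output fields.
const mappings = new Map([
  ['DISCOUNT DRUG MART 32', { vendor: 'Discount Drug Mart', category: 'Healthcare' }],
]);

// Return the mapped output, or null so unmapped values can be flagged for review.
function applyMapping(table, extractedValue) {
  return table.get(extractedValue) ?? null;
}

const mapped = applyMapping(mappings, 'DISCOUNT DRUG MART 32');
console.log(mapped.vendor, '/', mapped.category); // Discount Drug Mart / Healthcare

const unmapped = applyMapping(mappings, 'UNKNOWN MERCHANT');
console.log(unmapped); // null
```

Returning null for unknown values, rather than a default category, keeps unmapped data visible instead of silently mislabeling it.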
Architecture
- Database: PostgreSQL with JSONB for flexibility
- API: Node.js/Express for REST endpoints
- Storage: Raw data preserved, transformations are computed and stored
Design Principles
- Simple & Clear - Easy to understand what's happening
- Explicit - No hidden magic or complex triggers
- Testable - Every function can be tested independently
- Flexible - Handle varying data formats without schema changes
Getting Started
Prerequisites
- PostgreSQL 12+
- Node.js 16+
Installation
- Install dependencies:
npm install
- Configure the database connection (copy .env.example to .env, then edit the values):
cp .env.example .env
- Deploy database schema:
psql -U postgres -d dataflow -f database/schema.sql
psql -U postgres -d dataflow -f database/functions.sql
- Start the API server:
npm start
Quick Example
// 1. Define a source
POST /api/sources
{
  "name": "bank_transactions",
  "dedup_fields": ["date", "amount", "description"]
}

// 2. Create a transformation rule
POST /api/sources/bank_transactions/rules
{
  "name": "extract_merchant",
  "pattern": "^([A-Z][A-Z ]+)",
  "field": "description"
}

// 3. Import data
POST /api/sources/bank_transactions/import
[CSV file upload]

// 4. Query transformed data
GET /api/sources/bank_transactions/records
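The JSON steps of this walkthrough (1, 2, and 4) could be scripted along the following lines. This is only a sketch: `quickExampleRequests` and `toFetchArgs` are hypothetical helpers, and the base URL `http://localhost:3000` is an assumption, so substitute the host and port your server actually uses. The CSV import step is left out because the upload format is not specified here.

```javascript
// Hypothetical helper: describes the JSON requests from the walkthrough.
function quickExampleRequests(baseUrl) {
  const source = 'bank_transactions';
  return [
    {
      method: 'POST',
      url: `${baseUrl}/api/sources`,
      body: { name: source, dedup_fields: ['date', 'amount', 'description'] },
    },
    {
      method: 'POST',
      url: `${baseUrl}/api/sources/${source}/rules`,
      body: { name: 'extract_merchant', pattern: '^([A-Z][A-Z ]+)', field: 'description' },
    },
    { method: 'GET', url: `${baseUrl}/api/sources/${source}/records` },
  ];
}

// Convert a descriptor into the (url, options) pair that fetch() accepts.
function toFetchArgs(request) {
  const options = { method: request.method };
  if (request.body !== undefined) {
    options.headers = { 'Content-Type': 'application/json' };
    options.body = JSON.stringify(request.body);
  }
  return [request.url, options];
}

// Assumed base URL; nothing is sent over the network in this sketch.
const requests = quickExampleRequests('http://localhost:3000');
for (const request of requests) {
  console.log(request.method, request.url);
}
```

Each descriptor can then be sent in order with `fetch(...toFetchArgs(request))` on a runtime that provides `fetch`, checking the response status before moving to the next step.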
Project Structure
dataflow/
├── database/ # PostgreSQL schema and functions
│ ├── schema.sql # Table definitions
│ └── functions.sql # Import and transformation functions
├── api/ # Express REST API
│ ├── server.js # Main server
│ └── routes/ # API route handlers
├── examples/ # Sample data and use cases
└── docs/ # Additional documentation
Status
Current Phase: Initial development - building core functionality
License
MIT