Update README to reflect current state of the project

Documents the `manage.py` menu, adds full API reference tables, fixes
an incorrect route in the quick example, and removes stale sections
(the `docs/` directory and the initial-development status note).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Paul Trowbridge 2026-04-05 22:42:49 -04:00
parent 291c665ed1
commit 3cc8bc635a

README.md
# Dataflow

A simple data transformation tool for importing, cleaning, and standardizing data from various sources.

## What It Does

Dataflow helps you:

1. **Import** CSV data with automatic deduplication
2. **Transform** data using regex rules to extract meaningful information
3. **Map** extracted values to standardized output
4. **Query** the transformed data via a web UI or REST API

Perfect for cleaning up messy data like bank transactions, product lists, or any repetitive data that needs normalization.
### 1. Sources

Define where data comes from and how to deduplicate it.

**Example:** Bank transactions deduplicated by date + amount + description
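The dedup idea can be pictured as keying each incoming row by its configured dedup fields. A minimal Node sketch, assuming rows are plain objects; the helper names here are illustrative, not the project's actual import code:

```javascript
// Hypothetical sketch: skip rows whose dedup key has already been seen.
const dedupFields = ["date", "amount", "description"];

function dedupKey(row) {
  // Join the configured fields into one composite key.
  return dedupFields.map((f) => String(row[f])).join("|");
}

function dedupe(rows) {
  const seen = new Set();
  return rows.filter((row) => {
    const key = dedupKey(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const rows = [
  { date: "2026-01-02", amount: "12.50", description: "DISCOUNT DRUG MART 32" },
  { date: "2026-01-02", amount: "12.50", description: "DISCOUNT DRUG MART 32" },
  { date: "2026-01-03", amount: "7.10", description: "COFFEE SHOP" },
];
console.log(dedupe(rows).length); // 2
```

In the real tool the key is checked in PostgreSQL at import time; this sketch only shows the concept.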
### 2. Rules

Extract information using regex patterns (`extract` or `replace` modes).

**Example:** Extract merchant name from transaction description
### 3. Mappings

Map extracted values to clean, standardized output.

**Example:** "DISCOUNT DRUG MART 32" → `{"vendor": "Discount Drug Mart", "category": "Healthcare"}`
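Conceptually a mapping is a lookup from an extracted value to a standardized output object. A minimal sketch; the real mappings live in PostgreSQL, so this in-memory shape is only an illustration:

```javascript
// Hypothetical in-memory stand-in for the mappings table.
const mappings = {
  "DISCOUNT DRUG MART": { vendor: "Discount Drug Mart", category: "Healthcare" },
  "COFFEE SHOP": { vendor: "Coffee Shop", category: "Dining" },
};

function applyMapping(extractedValue) {
  // Values with no mapping surface later via the "unmapped" endpoint.
  return mappings[extractedValue] ?? null;
}

console.log(applyMapping("DISCOUNT DRUG MART"));
```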
## Architecture

- **Database:** PostgreSQL with JSONB for flexible data storage
- **API:** Node.js/Express REST API
- **UI:** React SPA served from `public/`
- **Auth:** HTTP Basic auth (configured in `.env`)

## Design Principles

- **Simple & Clear** - Easy to understand what's happening
- **Explicit** - No hidden magic or complex triggers
- **Flexible** - Handle varying data formats without schema changes
## Getting Started

### Prerequisites

- PostgreSQL 12+
- Node.js 16+
- Python 3 (for `manage.py`)

### Installation

1. Install Node dependencies:
```bash
npm install
```
2. Run the management script to configure and deploy everything:
```bash
python3 manage.py
```

For development with auto-reload:
```bash
npm run dev
```

The UI is available at `http://localhost:3000`. The API is at `http://localhost:3000/api`.

## Management Script (`manage.py`)

`manage.py` is an interactive tool for configuring, deploying, and managing the service. Run it and choose from the numbered menu:

```
python3 manage.py
```
| # | Action |
|---|--------|
| 1 | **Database configuration** — create/update `.env`, optionally create the PostgreSQL user/database, and deploy schema + functions |
| 2 | Redeploy schema only (`database/schema.sql`) — drops and recreates all tables |
| 3 | Redeploy SQL functions only (`database/queries/`) |
| 4 | Build UI (`ui/` → `public/`) |
| 5 | Set up nginx reverse proxy (HTTP or HTTPS via certbot) |
| 6 | Install systemd service unit (`dataflow.service`) |
| 7 | Start / restart `dataflow.service` |
| 8 | Stop `dataflow.service` |
| 9 | Set login credentials (`LOGIN_USER` / `LOGIN_PASSWORD_HASH` in `.env`) |

The status screen at the top of the menu shows the current state of each component (database connection, schema, UI build, service, nginx).

**Typical first-time setup:** run options 1 → 4 → 9 → 6 → 7 (→ 5 if you want nginx).

## API Reference

All `/api` routes require HTTP Basic authentication.
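For scripted access, the Basic auth header is the base64 encoding of `user:password`. A small Node sketch; the credentials below are placeholders, not project defaults:

```javascript
// Build an HTTP Basic Authorization header for API calls.
function basicAuthHeader(user, password) {
  const token = Buffer.from(`${user}:${password}`).toString("base64");
  return `Basic ${token}`;
}

// "admin" / "secret" are placeholder credentials.
console.log(basicAuthHeader("admin", "secret")); // "Basic YWRtaW46c2VjcmV0"
```

A client then sends this as the `Authorization` header; with curl the equivalent is `curl -u user:password`.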
### Sources — `/api/sources`
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/sources` | List all sources |
| POST | `/api/sources` | Create a source |
| GET | `/api/sources/:name` | Get a source |
| PUT | `/api/sources/:name` | Update a source |
| DELETE | `/api/sources/:name` | Delete a source |
| POST | `/api/sources/suggest` | Suggest source definition from CSV upload |
| POST | `/api/sources/:name/import` | Import CSV data |
| GET | `/api/sources/:name/import-log` | View import history |
| POST | `/api/sources/:name/transform` | Apply rules and mappings to records |
| POST | `/api/sources/:name/reprocess` | Re-transform all records |
| GET | `/api/sources/:name/fields` | List all known field names |
| GET | `/api/sources/:name/stats` | Get record and mapping counts |
| POST | `/api/sources/:name/view` | Generate output view |
| GET | `/api/sources/:name/view-data` | Query output view (paginated, sortable) |
### Rules — `/api/rules`
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/rules/source/:source_name` | List rules for a source |
| POST | `/api/rules` | Create a rule |
| GET | `/api/rules/:id` | Get a rule |
| PUT | `/api/rules/:id` | Update a rule |
| DELETE | `/api/rules/:id` | Delete a rule |
| GET | `/api/rules/preview` | Preview a pattern against real records (ad-hoc) |
| GET | `/api/rules/:id/test` | Test a saved rule against real records |
### Mappings — `/api/mappings`
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/mappings/source/:source_name` | List mappings |
| POST | `/api/mappings` | Create a mapping |
| POST | `/api/mappings/bulk` | Bulk create/update mappings |
| GET | `/api/mappings/:id` | Get a mapping |
| PUT | `/api/mappings/:id` | Update a mapping |
| DELETE | `/api/mappings/:id` | Delete a mapping |
| GET | `/api/mappings/source/:source_name/unmapped` | Get values with no mapping yet |
| GET | `/api/mappings/source/:source_name/all-values` | All extracted values with counts |
| GET | `/api/mappings/source/:source_name/counts` | Record counts for existing mappings |
| GET | `/api/mappings/source/:source_name/export.tsv` | Export values as TSV |
| POST | `/api/mappings/source/:source_name/import-csv` | Import mappings from TSV |
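The TSV round trip (export, edit in a spreadsheet, re-import) amounts to simple tab-splitting. The column layout below, an extracted value followed by a JSON output object, is an assumption for illustration only, not the documented export format:

```javascript
// Hypothetical TSV layout: <extracted value> \t <JSON output object>
const tsv = [
  'DISCOUNT DRUG MART\t{"vendor":"Discount Drug Mart","category":"Healthcare"}',
  'COFFEE SHOP\t{"vendor":"Coffee Shop","category":"Dining"}',
].join("\n");

function parseMappingsTsv(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [value, json] = line.split("\t");
      return { value, output: JSON.parse(json) };
    });
}

console.log(parseMappingsTsv(tsv).length); // 2
```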
### Records — `/api/records`
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/records/source/:source_name` | List records (paginated) |
| GET | `/api/records/:id` | Get a single record |
| POST | `/api/records/search` | Search records |
| DELETE | `/api/records/:id` | Delete a record |
| DELETE | `/api/records/source/:source_name/all` | Delete all records for a source |
## Typical Workflow

1. Create a source (`POST /api/sources`)
2. Import CSV data (`POST /api/sources/:name/import`)
3. Create transformation rules (`POST /api/rules`)
4. Preview rules against real data (`GET /api/rules/preview`)
5. Apply transformations (`POST /api/sources/:name/transform`)
6. Review unmapped values (`GET /api/mappings/source/:name/unmapped`)
7. Add mappings (`POST /api/mappings` or bulk import via TSV)
8. Reprocess to apply new mappings (`POST /api/sources/:name/reprocess`)
9. Query results (`GET /api/sources/:name/view-data`)

See `examples/GETTING_STARTED.md` for a complete walkthrough with curl examples.
## Project Structure

```
dataflow/
├── database/
│   ├── schema.sql            # Table definitions
│   └── functions.sql         # Import/transform/query functions
├── api/
│   ├── server.js             # Express server
│   ├── middleware/
│   │   └── auth.js           # Basic auth middleware
│   ├── lib/
│   │   └── sql.js            # SQL literal helpers
│   └── routes/
│       ├── sources.js
│       ├── rules.js
│       ├── mappings.js
│       └── records.js
├── public/                   # Built React UI (served as static files)
├── examples/
│   ├── GETTING_STARTED.md
│   └── bank_transactions.csv
└── .env.example
```
## License

MIT