# Getting Started with Dataflow

This guide walks through a complete example using bank transaction data.

## Prerequisites

1. PostgreSQL database running
2. Database created: `CREATE DATABASE dataflow;`
3. `.env` file configured (copy from `.env.example`)

## Step 1: Deploy Database Schema

```bash
cd /opt/dataflow
psql -U postgres -d dataflow -f database/schema.sql
psql -U postgres -d dataflow -f database/functions.sql
```

You should see the tables created without errors.

## Step 2: Start the API Server

```bash
npm install
npm start
```

The server should start on port 3000 (or your configured port). Test it:

```bash
curl http://localhost:3000/health
# Should return: {"status":"ok","timestamp":"..."}
```

## Step 3: Create a Data Source

A source defines where data comes from and how to deduplicate it.

```bash
curl -X POST http://localhost:3000/api/sources \
  -H "Content-Type: application/json" \
  -d '{
    "name": "bank_transactions",
    "dedup_fields": ["date", "description", "amount"]
  }'
```

**What this does:** Records with the same date + description + amount will be considered duplicates.

## Step 4: Create Transformation Rules

Rules extract meaningful data using regex patterns.
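Before creating a rule, it can help to sanity-check a pattern locally. The snippet below is only an illustration: the sample description is invented, and it uses `grep` rather than whatever regex engine the API uses, so minor dialect differences are possible.

```shell
# Try the two patterns used by the rules below against a made-up
# description. grep -o prints the full match; an API rule stores
# only the captured group.
desc="TARGET 00032839 COLUMBUS OH"
echo "$desc" | grep -oE '^[A-Z][A-Z ]+'   # merchant-style pattern
echo "$desc" | grep -oE '[A-Z]+ OH'       # location-style pattern
```

The first command matches the leading run of uppercase letters and spaces (`TARGET `, stopping at the first digit); the second matches the city plus state suffix (`COLUMBUS OH`).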
### Rule 1: Extract merchant name (first part of description)

```bash
curl -X POST http://localhost:3000/api/rules \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "name": "extract_merchant",
    "field": "description",
    "pattern": "^([A-Z][A-Z ]+)",
    "output_field": "merchant",
    "sequence": 1
  }'
```

### Rule 2: Extract location (city + state pattern)

```bash
curl -X POST http://localhost:3000/api/rules \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "name": "extract_location",
    "field": "description",
    "pattern": "([A-Z]+) OH",
    "output_field": "location",
    "sequence": 2
  }'
```

## Step 5: Import Data

Import the example CSV file:

```bash
curl -X POST http://localhost:3000/api/sources/bank_transactions/import \
  -F "file=@examples/bank_transactions.csv"
```

Response:

```json
{
  "success": true,
  "imported": 14,
  "duplicates": 0,
  "log_id": 1
}
```

## Step 6: View Imported Records

```bash
curl "http://localhost:3000/api/records/source/bank_transactions?limit=5"
```

(Quote URLs that contain `?` so the shell doesn't treat it as a glob.) You'll see the raw imported data. Note that `transformed` is `null`: we haven't applied transformations yet.

## Step 7: Apply Transformations

```bash
curl -X POST http://localhost:3000/api/sources/bank_transactions/transform
```

Response:

```json
{
  "success": true,
  "transformed": 14
}
```

Now check the records again:

```bash
curl "http://localhost:3000/api/records/source/bank_transactions?limit=2"
```

You'll see that the `transformed` field now contains the original data plus extracted fields like `merchant` and `location`.
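Conceptually, the value mappings introduced in the next two steps behave like a lookup table keyed on the extracted value. A hypothetical shell sketch of the idea, using the same mappings this guide creates (the real lookup happens server-side; the `lookup` function here is invented purely for illustration):

```shell
# Invented illustration of a value mapping: extracted value in,
# standardized output fields out.
lookup() {
  case "$1" in
    GOOGLE)  echo '{"vendor": "Google", "category": "Technology"}' ;;
    TARGET)  echo '{"vendor": "Target", "category": "Retail"}' ;;
    WALMART) echo '{"vendor": "Walmart", "category": "Groceries"}' ;;
    *)       echo '{}' ;;   # no mapping: record keeps only extracted fields
  esac
}

lookup GOOGLE   # prints {"vendor": "Google", "category": "Technology"}
```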
## Step 8: View Extracted Values That Need Mapping

```bash
curl http://localhost:3000/api/mappings/source/bank_transactions/unmapped
```

The response shows extracted merchant names that aren't mapped yet:

```json
[
  {"rule_name": "extract_merchant", "extracted_value": "GOOGLE", "record_count": 2},
  {"rule_name": "extract_merchant", "extracted_value": "TARGET", "record_count": 2},
  {"rule_name": "extract_merchant", "extracted_value": "WALMART", "record_count": 1},
  ...
]
```

## Step 9: Create Value Mappings

Map extracted values to clean, standardized output:

```bash
curl -X POST http://localhost:3000/api/mappings \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "rule_name": "extract_merchant",
    "input_value": "GOOGLE",
    "output": { "vendor": "Google", "category": "Technology" }
  }'

curl -X POST http://localhost:3000/api/mappings \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "rule_name": "extract_merchant",
    "input_value": "TARGET",
    "output": { "vendor": "Target", "category": "Retail" }
  }'

curl -X POST http://localhost:3000/api/mappings \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "rule_name": "extract_merchant",
    "input_value": "WALMART",
    "output": { "vendor": "Walmart", "category": "Groceries" }
  }'
```

## Step 10: Reprocess With Mappings

Clear and reapply transformations to pick up the new mappings:

```bash
curl -X POST http://localhost:3000/api/sources/bank_transactions/reprocess
```

## Step 11: View Final Results

```bash
curl "http://localhost:3000/api/records/source/bank_transactions?limit=5"
```

Now the `transformed` field contains:

- Original fields (date, description, amount, category)
- Extracted fields (merchant, location)
- Mapped fields (vendor, plus category overridden by the mapping)

Example result:

```json
{
  "id": 1,
  "data": {
    "date": "2024-01-02",
    "description": "GOOGLE *YOUTUBE VIDEOS",
    "amount": "4.26",
    "category": "Services"
  },
  "transformed": {
    "date": "2024-01-02",
    "description": "GOOGLE *YOUTUBE VIDEOS",
    "amount": "4.26",
    "merchant": "GOOGLE",
    "vendor": "Google",
    "category": "Technology"
  }
}
```

Note that the mapped `category` ("Technology") replaces the original value ("Services") in `transformed`, since a JSON object can only hold the key once. The original value is still available in `data`.

## Step 12: Test Deduplication

Try importing the same file again:

```bash
curl -X POST http://localhost:3000/api/sources/bank_transactions/import \
  -F "file=@examples/bank_transactions.csv"
```

Response:

```json
{
  "success": true,
  "imported": 0,
  "duplicates": 14,
  "log_id": 2
}
```

All records were rejected as duplicates! ✓

## Summary

You've now:

- ✅ Created a data source with deduplication rules
- ✅ Defined transformation rules to extract data
- ✅ Imported CSV data
- ✅ Applied transformations
- ✅ Created value mappings for clean output
- ✅ Reprocessed data with mappings
- ✅ Tested deduplication

## Next Steps

- Add more rules for other extraction patterns
- Create more value mappings as needed
- Query the `transformed` data for reporting
- Import additional CSV files

## Useful Commands

```bash
# View all sources
curl http://localhost:3000/api/sources

# View source statistics
curl http://localhost:3000/api/sources/bank_transactions/stats

# View all rules for a source
curl http://localhost:3000/api/rules/source/bank_transactions

# View all mappings for a source
curl http://localhost:3000/api/mappings/source/bank_transactions

# Search for specific records
curl -X POST http://localhost:3000/api/records/search \
  -H "Content-Type: application/json" \
  -d '{
    "source_name": "bank_transactions",
    "query": {"vendor": "Google"},
    "limit": 10
  }'
```

## Troubleshooting

**API won't start:**

- Check that the `.env` file exists with correct database credentials
- Verify PostgreSQL is running: `psql -U postgres -l`
- Check the logs for error messages

**Import fails:**

- Verify the source exists: `curl http://localhost:3000/api/sources`
- Check that the CSV format matches expectations
- Ensure `dedup_fields` match the CSV column names

**Transformations not working:**

- Check that rules exist: `curl http://localhost:3000/api/rules/source/bank_transactions`
- Test the regex pattern manually
- Check that records have the specified field
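For the import checks above, one quick test is whether every dedup field actually appears in the CSV's header row. A standalone sketch of that check; it writes a sample header to a temp file so it runs on its own, but in practice you would point `csv` at your real file (for this guide, `examples/bank_transactions.csv`):

```shell
# Check that each dedup field is present in the CSV header row.
# Sample header written to a temp file so this runs standalone.
csv=$(mktemp)
printf 'date,description,amount,category\n' > "$csv"

for f in date description amount; do
  if head -1 "$csv" | tr -d '\r' | tr ',' '\n' | grep -qx "$f"; then
    echo "ok: $f"
  else
    echo "missing dedup field: $f"
  fi
done

rm -f "$csv"
```

The `tr -d '\r'` guards against Windows-style line endings, which are a common cause of a dedup field silently failing to match its column name.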