From ff18ce5bd3ff2707d498fe457f195cc8d2fb8c99 Mon Sep 17 00:00:00 2001 From: Paul Trowbridge Date: Mon, 5 Mar 2018 23:34:53 -0500 Subject: [PATCH 1/2] simplify readme --- readme.md | 148 +++++++----------------------------------------------- 1 file changed, 19 insertions(+), 129 deletions(-) diff --git a/readme.md b/readme.md index aea1511..32671bb 100644 --- a/readme.md +++ b/readme.md @@ -1,140 +1,30 @@ -Overview +Generic Data Transformation Tool ---------------------------------------------- - -``` - +--------------+ - |csv data | - +-----+--------+ - | - | - v -+----web ui----+ +----func+----+ +---table----+ -|import screen +------> |srce.sql +----------> |tps.srce | <-------------------+ -+--------------+ +-------------+ +------------+ | - |p1:srce | | - |p2:file path | | -+-----web ui---+ +-------------+ +----table---+ | -|create map | |tps.map_rm | +--+--db proc-----+ -|profile +---------------------------------> | | |update tps.trans | -+------+-------+ +-----+------+ |column allj to | - | ^ |contain map data | - | | +--+--------------+ - v foreign key ^ -+----web ui+----+ | | -|assign maps | + | -|for return | +---table----+ | -+values +--------------------------------> |tps.map_rv | | -+---------------+ | +---------------------+ - +------------+ - -``` - The goal is to: 1. house external data and prevent duplication on insert -2. apply mappings to the data to make it meaningful -3. 
be able to reference it from outside sources (no action required) - -There are 5 tables -* tps.srce : definition of source -* tps.trans : actual data -* tps.trans_log : log of inserts -* tps.map_rm : map profile -* tps.map_rv : profile associated values - -# tps.srce schema - { - "name": "WMPD", - "descr": "Williams Paid File", - "type":"csv", - "schema": [ - { - "key": "Carrier", - "type": "text" - }, - { - "key": "Pd Amt", - "type": "numeric" - }, - { - "key": "Pay Dt", - "type": "date" - } - ], - "unique_constraint": { - "fields":[ - "{Pay Dt}", - "{Carrier}" - ] - } - } - -# tps.map_rm schema - { - "name":"Strip Amount Commas", - "description":"the Amount field comes from PNC with commas embeded so it cannot be cast to numeric", - "defn": [ - { - "key": "{Amount}", /*this is a Postgres text array stored in json*/ - "field": "amount", /*key name assigned to result of regex/* - "regex": ",", /*regular expression/* - "flag":"g", - "retain":"y", - "map":"n" - } - ], - "function":"replace", - "where": [ - { - } - ] - } +2. facilitate regular expression operations to extract meaningful data +3. be able to reference it from outside sources (no action required) and maintain a reference to the original data +It is well suited for data from outside systems that +* requires complex transformation (parsing and mapping) +* needs the original data retained for reference + +Use cases: +* ongoing bank feeds +* jumbled product lists +* storing API results +The data is converted to json by the importing program and inserted into the database. +Regular expressions are applied to specified json components and the results can be mapped to other values. 
+Major Interactions +------------------------ - - - - - -Notes -====================================== - -pull various static files into postgres and do basic transformation without losing the original document -or getting into custom code for each scenario - -the is an in-between for an foreign data wrapper & custom programming - -## Storage -all records are jsonb -applied mappings are in associated jsonb documents - -## Import -`COPY` function utilized - -## Mappings -1. regular expressions are used to extract pieces of the json objects -2. the results of the regular expressions are bumped up against a list of basic mappings and written to an associated jsonb document - -each regex expression within a targeted pattern can be set to map or not. then the mapping items should be joined to map_rv with an `=` as opposed to `@>` to avoid duplication of rows - - -## Transformation tools -* `COPY` -* `regexp_matches()` - -## Difficulties -Non standard file formats will require additional logic -example: PNC loan balance and collateral CSV files -1. External: Anything not in CSV should be converted external to Postgres and then imported as CSV -2. 
Direct: Outside logic can be setup to push new records to tps.trans direct from non-csv fornmated sources or fdw sources - -## Interface -maybe start out in excel until it gets firmed up -* list existing mappings - * apply mappings to see what results come back -* experiment with new mappings \ No newline at end of file +* Source Definitions (Maint/Inquire) +* Regex Instructions (Maint/Inquire) +* Cross Reference List (Maint/Inquire) +* Run Import (Run Job) From 0badc9f403dcd7d1b92f2884b03d6eadffcff8a8 Mon Sep 17 00:00:00 2001 From: Paul Trowbridge Date: Tue, 6 Mar 2018 00:28:37 -0500 Subject: [PATCH 2/2] add interaction details --- readme.md | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/readme.md b/readme.md index 32671bb..3577ff5 100644 --- a/readme.md +++ b/readme.md @@ -28,3 +28,28 @@ Major Interactions * Regex Instructions (Maint/Inquire) * Cross Reference List (Maint/Inquire) * Run Import (Run Job) + + + +### Interaction Details +* Source Definitions (Maint/Inquire) + + * display a list of existing sources with display details/edit options + * create new option + * underlying function is `tps.srce_set(_name text, _defn jsonb)` + +* Regex Instructions (Maint/Inquire) + + * display a list of existing instruction sets with display details/edit options + * create new option + * underlying function is `tps.srce_map_def_set(_srce text, _map text, _defn jsonb, _seq int)` which takes a source "code" and a json instruction definition + +* Cross Reference List (Maint/Inquire) + + * the first step is to populate a list of values returned from the instructions (choose all or unmapped) via `tps.report_unmapped(_srce text)` + * the returned rows can be extended with additional named column(s), which are used to assign values whenever the result occurs + * the underlying function to set cross-reference values is `tps.srce_map_val_set_multi(_maps jsonb)` + +* Run Import + + * underlying function is `tps.srce_import(_path text, _srce text)`
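The regex-then-map flow the revised readme describes (extract a fragment from a json component, then translate the extracted value through a cross-reference list while leaving the stored record untouched) can be sketched outside Postgres. A minimal Python illustration follows; the sample record, instruction, and cross-reference contents are invented for the example, and only the `key`/`field`/`regex` keys mirror the tps.map_rm definition shown in the removed schema:

```python
import json
import re

# A stored record, kept as json; the original document is never modified.
record = {"Description": "ACH DEBIT  ACME CORP PAYROLL", "Amount": "1,234.56"}

# A regex "instruction": pull a counterparty token out of the Description key.
# (Hypothetical example data; key/field/regex echo the tps.map_rm shape.)
instruction = {"key": "Description", "field": "party", "regex": r"ACH DEBIT\s+(\w+)"}

# Apply the regex to the targeted json component.
m = re.search(instruction["regex"], record[instruction["key"]])
extracted = {instruction["field"]: m.group(1)} if m else {}

# Cross-reference list: map the regex result to additional named columns.
xref = {"ACME": {"party_name": "Acme Corporation", "category": "payroll"}}
mapped = xref.get(extracted.get("party"), {})

# The mapping lives alongside the retained original, as in tps.trans.
print(json.dumps({"rec": record, "parsed": extracted, "map": mapped}))
```

In the actual tool this step runs inside Postgres against jsonb columns; the sketch only shows why retaining the original record, the extracted values, and the mapped values as separate json documents keeps the source data reusable.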