Generic Data Transformation Tool
=======================================================
The goal is to:
1. house external data and prevent duplication on insert
2. facilitate regular expression operations to extract meaningful data
3. make the data referenceable from outside sources (no action required) while maintaining a reference to the original data

It is well suited for data from outside systems when:
* the data requires complex transformation (parsing and mapping)
* the original data must be retained for reference
* you don't feel like writing a map-reduce

use cases:
* on-going bank feeds
* jumbled product lists
* storing API results

The data is converted to JSON by the importing program and inserted into the database.
Regular expressions are applied to specified JSON components, and the results can be mapped to other values.

Major Interactions
------------------------
* Source Definitions (Maint/Inquire)
* Regex Instructions (Maint/Inquire)
* Cross Reference List (Maint/Inquire)
* Run Import (Run Job)
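To make the flow behind these interactions concrete, here is a minimal in-memory sketch of applying a regex instruction to an imported JSON record and mapping the result through a cross reference. It is an illustration only: the record fields, the shape of `instruction` and `cross_reference`, and the helper names `extract`, `transform`, and `unmapped_values` are all assumptions, not the tool's actual schema or functions.

```python
import re

# Hypothetical imported record; the field names are illustrative only.
record = {
    "Description": "POS PURCHASE WALMART #0123 CLEVELAND OH",
    "Amount": "-45.10",
}

# Hypothetical regex instruction: the JSON key to read, the pattern to
# apply, and the name under which the captured value is stored.
instruction = {
    "key": "Description",
    "pattern": r"(WALMART|TARGET)",
    "name": "merchant",
}

# Hypothetical cross reference: extracted value -> assigned values.
cross_reference = {"WALMART": {"party": "Walmart", "category": "groceries"}}


def extract(rec, instr):
    """Apply one regex instruction to one record; return the capture or None."""
    match = re.search(instr["pattern"], rec.get(instr["key"], ""))
    return match.group(1) if match else None


def transform(rec, instr, xref):
    """Keep the original record and attach the extracted and mapped values."""
    value = extract(rec, instr)
    return {
        "original": rec,
        "extracted": {instr["name"]: value},
        "mapped": xref.get(value),  # None means no cross reference exists yet
    }


def unmapped_values(records, instr, xref):
    """List extracted values that have no cross-reference entry yet."""
    values = {extract(rec, instr) for rec in records}
    return sorted(v for v in values if v is not None and v not in xref)


result = transform(record, instruction, cross_reference)
print(result["mapped"]["category"])  # groceries
```

The original record is carried along untouched, which mirrors the stated goal of maintaining a reference to the original data; values that come back from `unmapped_values` are the ones a cross reference list would still need to cover.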
### Interaction Details
* _Source Definitions (Maint/Inquire)_
    * display a list of existing sources with display details/edit options
    * create new option
    * underlying function is `tps.srce_set(_name text, _defn jsonb)`
    * the current definition of a source includes data based on bad presumptions:
        * how to load from a csv file using `COPY`
        * set up a Postgres type to reflect the associated columns (if applicable)
* _Regex Instructions (Maint/Inquire)_
    * display a list of existing instruction sets with display details/edit options
    * create new option
    * underlying function is `tps.srce_map_def_set(_srce text, _map text, _defn jsonb, _seq int)`, which takes a source "code", a map name, a json definition, and a sequence number
* _Cross Reference List (Maint/Inquire)_
    * the first step is to populate a list of values returned from the instructions (choose all or unmapped) using `tps.report_unmapped(_srce text)`
    * additional named column(s) can be added to the list of rows; these are used to assign values any time the result occurs
    * the function to set the values of the cross reference is `tps.srce_map_val_set_multi(_maps jsonb)`
* _Run Import_
    * underlying function is `tps.srce_import(_path text, _srce text)`

source definition
----------------------------------------------------------------------
* **load data**
    * parsing function reference
        * csv_from_client --> load file dialog, send content as post body to backend, backend sends array of json as argument to database?
        * custom_pnc_parse --> load file dialog, send content as post body to backend, parse out data to array of json and send to db as large json object?
        * _note_ : the database will have to store in the json a reference to these functions, which the browser will need to read in order to act on an import request
    * the browser's role is to extract the contents of a file and send them as a post body to the backend for processing under the target function (/parse_csv& q=source)
    * constraints
* **read data**
    * top level key to table as type?
    * function that returns a string that is a list of columns?
    * custom function to read and return the table?
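The `csv_from_client` idea above (file content posted to the backend, then passed on as an array of json) could look roughly like the following. This is a minimal sketch under assumptions: `csv_to_json_array` is a hypothetical helper name, the sample columns are made up, and how the array actually reaches the database is left open, as it is in the notes above.

```python
import csv
import io
import json


def csv_to_json_array(content):
    """Turn CSV text (header row first) into a JSON array of row objects."""
    reader = csv.DictReader(io.StringIO(content))
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


# Hypothetical post body, as the browser might send it.
post_body = "Date,Description,Amount\n2018-03-05,WALMART #0123,-45.10\n"
payload = csv_to_json_array(post_body)
print(payload)
```

Every value stays a string here, so nothing from the original file is reinterpreted before it is stored; any typing would happen later, on the read side.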