move everything into one folder

Paul Trowbridge 2018-06-25 00:01:05 -04:00
parent d291c52749
commit 13de00bdf2
55 changed files with 127 additions and 167 deletions

LICENSE

@@ -1,21 +0,0 @@
MIT License

Copyright (c) 2017

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


@@ -1,128 +1,128 @@
Generic Data Transformation Tool
=======================================================

The goal is to:

1. house external data and prevent duplication on insert
2. facilitate regular expression operations to extract meaningful data
3. be able to reference it from outside sources (no action required) and maintain reference to original data

It is well suited for data from outside systems that:

* require complex transformation (parsing and mapping)
* need the original data retained for reference
* don't warrant writing a dedicated map-reduce job

Use cases:

* on-going bank feeds
* jumbled product lists
* storing api results

The data is converted to json by the importing program and inserted into the database.
Regular expressions are applied to specified json components, and the results can be mapped to other values.
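As a minimal sketch of that flow, the query below applies a regular expression to one component of an imported json document directly in Postgres. The `tps.trans` table and `rec` column are hypothetical names used for illustration only.

```
-- minimal sketch: pull a check number out of one json component;
-- the table and column names here are hypothetical
SELECT
    rec #>> '{Description}' AS original_text,
    substring(rec #>> '{Description}' from 'CHECK #(\d+)') AS check_number
FROM
    tps.trans
WHERE
    rec #>> '{Description}' ~ 'CHECK #\d+';
```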
Major Interactions
------------------------

* Source Definitions (Maint/Inquire)
* Regex Instructions (Maint/Inquire)
* Cross Reference List (Maint/Inquire)
* Run Import (Run Job)

### Interaction Details
* _Source Definitions (Maint/Inquire)_
    * display a list of existing sources with display details/edit options
    * create new option
    * underlying function is `tps.srce_set(_name text, _defn jsonb)`
    * the current definition of a source includes data based on bad presumptions:
        * how to load from a csv file using `COPY`
        * setup of a Postgres type to reflect the associated columns (if applicable)
* _Regex Instructions (Maint/Inquire)_
    * display a list of existing instruction sets with display details/edit options
    * create new option
    * underlying function is `tps.srce_map_def_set(_srce text, _map text, _defn jsonb, _seq int)`, which takes a source "code" and a json definition
* _Cross Reference List (Maint/Inquire)_
    * the first step is to populate a list of values returned from the instructions (choose all or unmapped) via `tps.report_unmapped(_srce text)`
    * the list of rows facilitates additional named column(s) to be added, which are used to assign values any time the result occurs
    * function to set the values of the cross reference is `tps.srce_map_val_set_multi(_maps jsonb)`
* _Run Import_
    * underlying function is `tps.srce_import(_path text, _srce text)` (a hedged call sketch for all of the above follows this list)
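The function names and signatures above come from the list itself; the sequence below is only a sketch of how they might be strung together, and every json payload in it is a guessed shape rather than the real definition format.

```
-- hedged sketch of one pass through the interactions above;
-- function names/signatures are from the list, payload shapes are guesses
SELECT tps.srce_set('dcard', '{"name":"dcard","source":"client_file"}'::jsonb);

-- attach a regex instruction set to the source
SELECT tps.srce_map_def_set('dcard', 'check_no',
    '{"regex":"CHECK #(\\d+)"}'::jsonb, 1);

-- import a file for the source (_path comes first per the signature)
SELECT tps.srce_import('/tmp/dcard.csv', 'dcard');

-- list values the instructions returned that have no mapping yet
SELECT * FROM tps.report_unmapped('dcard');

-- assign cross-reference values to returned results
SELECT tps.srce_map_val_set_multi(
    '[{"srce":"dcard","map":"check_no","retval":{"check":"1234"},"mapval":{"party":"Acme"}}]'::jsonb
);
```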
source definition
----------------------------------------------------------------------

* **load data**
    * the browser's role is to extract the contents of a file and send them as a post body to the backend for processing under a target function `based on srce definition`
    * the backend builds a json array of all the rows to be added and sends it as an argument to a database insert function
    * build a constraint key `based on srce definition`
    * handle violations
    * increment the global key list (this may not be possible, depending on whether a json with variable-length arrays can be traversed)
    * build an import log
    * run maps (as opposed to relying on a trigger)
* **read data**
    * the `schema` key contains either a text element or a text array in curly braces
    * forcing everything to extract via `#>{}` would be cleaner but may be more expensive than `jsonb_populate_record` (a comparison sketch follows the example definition below)
        * it took 5.5 seconds to parse 1,000,000 rows of an identical google distance matrix json into a 5-column temp table
    * top-level key to table based on `jsonb_populate_record` extracting from a `tps.type` developed from `srce.defn->schema`
    * custom function parsing contents based on the `#>` operator and extracting from `srce.defn->schema`
    * a view that `uses the source definition` to extrapolate a table?
    * a materialized table is built `based on the source definition` plus any additional regex?
        * add regex = alter table add column with historic updates?
    * no primary key?
        * every document must work out to one row
```
{
    "name":"dcard",
    "source":"client_file",
    "loading_function":"csv",
    "constraint":[
        "{Trans. Date}",
        "{Post Date}"
    ],
    "schemas":{
        "default":[
            {
                "path":"{doc,origin_addresses,0}",
                "type":"text",
                "column_name":"origin_address"
            },
            {
                "path":"{doc,destination_addresses,0}",
                "type":"text",
                "column_name":"destination_address"
            },
            {
                "path":"{doc,status}",
                "type":"text",
                "column_name":"status"
            },
            {
                "path":"{doc,rows,0,elements,0,distance,value}",
                "type":"numeric",
                "column_name":"distance"
            },
            {
                "path":"{doc,rows,0,elements,0,duration,value}",
                "type":"numeric",
                "column_name":"duration"
            }
        ],
        "version2":[]
    }
}
```
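To make the `#>` versus `jsonb_populate_record` question concrete, here is a hedged sketch of both read styles against the example definition above. The `tps.trans` storage table and the `tps.dcard_default` composite type are assumptions made for illustration, not objects the project defines.

```
-- assumed storage: tps.trans(rec jsonb), one row per imported document

-- style 1: explicit extraction with #>> following the "path" entries above
SELECT
    rec #>> '{doc,origin_addresses,0}'      AS origin_address,
    rec #>> '{doc,destination_addresses,0}' AS destination_address,
    rec #>> '{doc,status}'                  AS status,
    (rec #>> '{doc,rows,0,elements,0,distance,value}')::numeric AS distance,
    (rec #>> '{doc,rows,0,elements,0,duration,value}')::numeric AS duration
FROM tps.trans;

-- style 2: jsonb_populate_record against a composite type derived from the
-- schema definition; the keys must be top-level, so nested paths have to be
-- flattened into an object first
CREATE TYPE tps.dcard_default AS (
    origin_address      text,
    destination_address text,
    status              text,
    distance            numeric,
    duration            numeric
);

SELECT r.*
FROM tps.trans t
CROSS JOIN LATERAL jsonb_populate_record(
    null::tps.dcard_default,
    jsonb_build_object(
        'origin_address', t.rec #>> '{doc,origin_addresses,0}',
        'status',         t.rec #>> '{doc,status}'
    )
) r;
```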


@@ -1,19 +0,0 @@
{
  "name": "tps_etl",
  "version": "1.0.0",
  "description": "third party source data transformation",
  "main": "index.js",
  "scripts": {
    "test": "uh"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/fleetside72/tps_etl.git"
  },
  "author": "",
  "license": "ISC",
  "bugs": {
    "url": "https://github.com/fleetside72/tps_etl/issues"
  },
  "homepage": "https://github.com/fleetside72/tps_etl#readme"
}