move everything into one folder
parent d291c52749
commit 13de00bdf2
LICENSE (21 deletions)
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2017
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
@@ -1,128 +1,128 @@
Generic Data Transformation Tool
=======================================================

The goal is to:
1. house external data and prevent duplication on insert
2. facilitate regular expression operations to extract meaningful data
3. be able to reference it from outside sources (no action required) and maintain reference to original data


It is well suited for data from outside systems when:
* the data requires complex transformation (parsing and mapping)
* the original data must be retained for reference
* you don't feel like writing a map-reduce

use cases:
* on-going bank feeds
* jumbled product lists
* storing api results


The data is converted to JSON by the importing program and inserted into the database.
Regular expressions are applied to specified JSON components, and the results can be mapped to other values.
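
As a rough illustration of that flow, here is a minimal Python sketch that runs outside the database. The row contents, the regex, and the cross-reference table are all hypothetical and are not taken from the project; the real work is described as happening in the `tps` functions listed further down.

```python
import json
import re

# Hypothetical imported row, already converted to JSON by the importer.
row = json.loads('{"Description": "GIANT EAGLE #4096 STOW OH", "Amount": "-12.40"}')

# Hypothetical regex instruction: which key to read and what to capture.
instruction = {"key": "Description", "regex": r"^([A-Z ]+?)\s+#\d+"}

# Hypothetical cross-reference list: extracted value -> assigned values.
cross_reference = {"GIANT EAGLE": {"party": "Giant Eagle", "category": "groceries"}}


def apply_instruction(row, instruction, cross_reference):
    """Return (extracted value, mapped values); both are None when the regex misses."""
    match = re.search(instruction["regex"], row[instruction["key"]])
    if match is None:
        return None, None
    value = match.group(1)
    # An extracted value without a cross-reference entry maps to None (unmapped).
    return value, cross_reference.get(value)


extracted, mapped = apply_instruction(row, instruction, cross_reference)
```

Here `extracted` holds the captured text and `mapped` holds whatever the cross-reference assigns to it. The original row is left untouched, which matches the goal of keeping a reference to the original data.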


Major Interactions
------------------------

* Source Definitions (Maint/Inquire)
* Regex Instructions (Maint/Inquire)
* Cross Reference List (Maint/Inquire)
* Run Import (Run Job)



### Interaction Details
* _Source Definitions (Maint/Inquire)_

    * display a list of existing sources with display details/edit options
    * create new option
    * underlying function is `tps.srce_set(_name text, _defn jsonb)`

    * the current definition of a source includes data based on bad presumptions:
        * how to load from a csv file using `COPY`
        * setup a Postgres type to reflect the associated columns (if applicable)


* _Regex Instructions (Maint/Inquire)_

    * display a list of existing instruction sets with display details/edit options
    * create new option
    * underlying function is `tps.srce_map_def_set(_srce text, _map text, _defn jsonb, _seq int)` which takes a source "code" and a json

* _Cross Reference List (Maint/Inquire)_

    * first step is to populate a list of values returned from the instructions (choose all or unmapped) `tps.report_unmapped(_srce text)`
    * the list of rows allows additional named column(s) to be added, which are used to assign values any time the result occurs
    * function to set the values of the cross reference `tps.srce_map_val_set_multi(_maps jsonb)`

* _Run Import_

    * underlying function is `tps.srce_import(_path text, _srce text)`



source definition
----------------------------------------------------------------------

* **load data**
    * the browser's role is to extract the contents of a file and send them as a POST body to the backend for processing under the target function `based on srce definition`
    * the backend builds a json array of all the rows to be added and sends it as an argument to a database insert function
        * build constraint key `based on srce definition`
        * handle violations
        * increment global key list (this may not be possible, depending on whether a json with variable-length arrays can be traversed)
        * build an import log
        * run maps (as opposed to relying on trigger)
* **read data**
    * the `schema` key contains either a text element or a text array in curly braces
        * forcing everything to extract via `#>{}` would be cleaner but may be more expensive than `jsonb_populate_record`
        * it took 5.5 seconds to parse 1,000,000 rows of an identical google distance matrix json to a 5 column temp table
    * top level key to table based on `jsonb_populate_record` extracting from `tps.type` developed from `srce.defn->schema`
    * custom function parsing contents based on `#>` operator and extracting from `srce.defn->schema`
    * view that `uses the source definition` to extrapolate a table?
    * a materialized table is built `based on the source definition` and any additional regex?
        * add regex = alter table add column with historic updates?
        * no primary key?
    * every document must work out to one row
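
The load steps above (build a constraint key, handle violations, build an import log) can be sketched in Python as follows. This is only one possible reading, kept in memory for illustration: the function names, the log layout, and the row-by-row skip policy are assumptions, and the project itself is described as doing this inside a database insert function.

```python
import json

# Hypothetical source definition; only the "constraint" paths matter here.
definition = {"name": "dcard", "constraint": ["{Trans. Date}", "{Post Date}"]}


def constraint_key(row, definition):
    """Build a canonical JSON key from the constraint fields of one row."""
    # Each constraint is written like a Postgres text-array path, e.g. "{Post Date}".
    # This sketch only handles single-element paths.
    fields = [path.strip("{}") for path in definition["constraint"]]
    return json.dumps({field: row.get(field) for field in fields}, sort_keys=True)


def import_rows(rows, definition, existing_keys):
    """Keep rows whose key is new; return a small import log (assumed layout)."""
    log = {"inserted": 0, "duplicates": 0}
    for row in rows:
        key = constraint_key(row, definition)
        if key in existing_keys:
            log["duplicates"] += 1  # constraint violation: skip the row
            continue
        existing_keys.add(key)
        log["inserted"] += 1
    return log
```

Calling `import_rows` twice with the same rows and the same `existing_keys` set reports every row as a duplicate the second time, which is the behaviour "prevent duplication on insert" suggests.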

```
{
    "name":"dcard",
    "source":"client_file",
    "loading_function":"csv",
    "constraint":[
        "{Trans. Date}",
        "{Post Date}"
    ],
    "schemas":{
        "default":[
            {
                "path":"{doc,origin_addresses,0}",
                "type":"text",
                "column_name":"origin_address"
            },
            {
                "path":"{doc,destination_addresses,0}",
                "type":"text",
                "column_name":"destination_address"
            },
            {
                "path":"{doc,status}",
                "type":"text",
                "column_name":"status"
            },
            {
                "path":"{doc,rows,0,elements,0,distance,value}",
                "type":"numeric",
                "column_name":"distance"
            },
            {
                "path":"{doc,rows,0,elements,0,duration,value}",
                "type":"numeric",
                "column_name":"duration"
            }
        ],
        "version2":[]
    }
}
```
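
The `path` entries in the definition above read like the text-array paths taken by the Postgres `#>` operator. As a hedged sketch of how one document could be flattened into one row from such a schema, here is a Python imitation of that lookup. The helper names, the sample record, and the cast table are assumptions for illustration, not the project's code.

```python
from decimal import Decimal

# Assumed mapping from schema "type" names to Python casts.
CASTS = {"text": str, "numeric": lambda value: Decimal(str(value))}


def extract_path(doc, path):
    """Follow a '{a,b,0}'-style path; return None if any step is missing."""
    current = doc
    for step in path.strip("{}").split(","):
        if isinstance(current, dict):
            current = current.get(step)
        elif isinstance(current, list) and step.isdigit() and int(step) < len(current):
            current = current[int(step)]
        else:
            return None
    return current


def to_row(record, schema):
    """Build one flat row (column_name -> typed value) from one document."""
    row = {}
    for column in schema:
        value = extract_path(record, column["path"])
        row[column["column_name"]] = None if value is None else CASTS[column["type"]](value)
    return row


# Hypothetical record shaped like a distance matrix response.
record = {"doc": {"origin_addresses": ["Stow, OH"], "status": "OK",
                  "rows": [{"elements": [{"distance": {"value": 1609}}]}]}}
schema = [
    {"path": "{doc,origin_addresses,0}", "type": "text", "column_name": "origin_address"},
    {"path": "{doc,rows,0,elements,0,distance,value}", "type": "numeric", "column_name": "distance"},
]
flat = to_row(record, schema)
```

A missing key or an out-of-range array index yields `None` for that column rather than an error, so every document still works out to exactly one row.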
package.json (19 deletions)
@@ -1,19 +0,0 @@
-{
-  "name": "tps_etl",
-  "version": "1.0.0",
-  "description": "third party source data transformation",
-  "main": "index.js",
-  "scripts": {
-    "test": "uh"
-  },
-  "repository": {
-    "type": "git",
-    "url": "git+https://github.com/fleetside72/tps_etl.git"
-  },
-  "author": "",
-  "license": "ISC",
-  "bugs": {
-    "url": "https://github.com/fleetside72/tps_etl/issues"
-  },
-  "homepage": "https://github.com/fleetside72/tps_etl#readme"
-}