Module trase.tools.etl
Extract-transform-load from a variety of sources into Pandas dataframes.
The ETL tool handles:
- Loading data into Pandas from a variety of sources (AWS S3, PostgreSQL database, etc.) and in a variety of formats (XLS, CSV)
- Only downloading the data once, yet still re-downloading it when the source data has changed
- Simple pre-processing, such as filtering out rows or altering some data
- Storing this data locally in a standard directory layout ("original", "processed", etc.)
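The download-once and directory-layout behaviour listed above can be illustrated with a minimal sketch. This is not the `trase.tools.etl` API: the function names `fetch_if_changed` and `process`, the `.sha256` marker file, and the use of a local file as the "source" are all assumptions made for illustration. A remote source such as S3 would more likely compare an ETag or a last-modified timestamp than hash the full content.

```python
import csv
import hashlib
from pathlib import Path
from typing import Callable, Dict, Tuple


def fetch_if_changed(source: Path, cache_dir: Path) -> Tuple[Path, bool]:
    """Copy `source` into <cache_dir>/original unless an identical copy exists.

    Returns the cached path and a flag saying whether a copy was made.
    (Hypothetical helper: change detection here is a SHA-256 content hash.)
    """
    original_dir = cache_dir / "original"
    original_dir.mkdir(parents=True, exist_ok=True)
    target = original_dir / source.name
    marker = original_dir / (source.name + ".sha256")

    content = source.read_bytes()
    digest = hashlib.sha256(content).hexdigest()

    # Skip the copy when the stored hash matches the current source content.
    if target.exists() and marker.exists() and marker.read_text() == digest:
        return target, False

    target.write_bytes(content)
    marker.write_text(digest)
    return target, True


def process(original: Path, cache_dir: Path,
            keep: Callable[[Dict[str, str]], bool]) -> Path:
    """Write the rows of a CSV for which `keep(row)` is true to <cache_dir>/processed."""
    processed_dir = cache_dir / "processed"
    processed_dir.mkdir(parents=True, exist_ok=True)
    out = processed_dir / original.name

    with original.open(newline="") as src, out.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(row for row in reader if keep(row))
    return out
```

Calling `fetch_if_changed` twice on an unchanged file copies it only the first time; editing the source file causes the next call to copy it again. The processed file can then be loaded with `pandas.read_csv` in the usual way.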
The ETL tool is primarily intended for use with SEI-PCS models; however, it is a standalone package.
Sub-modules
- trase.tools.etl.context
- trase.tools.etl.exceptions
- trase.tools.etl.pandas_wrapper
- trase.tools.etl.processors
- trase.tools.etl.source
- trase.tools.etl.utilities