Module trase.tools.etl

Extract-transform-load from a variety of sources into Pandas dataframes.

The ETL tool handles:

  • Loading data into Pandas from a variety of sources (AWS S3, PostgreSQL database, etc.) and in a variety of formats (XLS, CSV)
  • Downloading the data only once, but re-downloading it when the source data has changed
  • Applying simple pre-processing, such as filtering out rows or altering some data
  • Storing this data locally in a standard directory layout ("original", "processed", etc.)
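
The "download once, re-download on change" behaviour and the directory layout can be sketched in plain Python. This is a minimal illustration of the idea, not the trase.tools.etl API: the function name `fetch_cached`, the `version` argument (for example an S3 ETag or a last-modified timestamp), and the `.meta.json` sidecar file are all assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Callable


def fetch_cached(
    name: str,
    version: str,
    download: Callable[[], bytes],
    root: Path,
) -> Path:
    """Return the local path of `name`, downloading only if `version` changed.

    Hypothetical helper: `version` stands in for whatever fingerprint the
    source exposes (an ETag, a checksum, a modification time, ...).
    """
    # Raw downloads live under "<root>/original", mirroring the layout
    # described in the list above.
    original = root / "original"
    original.mkdir(parents=True, exist_ok=True)
    data_path = original / name
    meta_path = original / (name + ".meta.json")

    # Cache hit: the file exists and its recorded version matches the source.
    if data_path.exists() and meta_path.exists():
        stored = json.loads(meta_path.read_text())
        if stored.get("version") == version:
            return data_path

    # Cache miss or stale copy: download again and record the new version.
    data_path.write_bytes(download())
    meta_path.write_text(json.dumps({"version": version}))
    return data_path
```

The returned path could then be passed to a reader such as `pandas.read_csv`, with any filtered or altered result written to a sibling "processed" directory.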

The ETL tool is primarily intended for use with SEI-PCS models; however, it is a standalone package.

Sub-modules

trase.tools.etl.context
trase.tools.etl.exceptions
trase.tools.etl.pandas_wrapper
trase.tools.etl.processors
trase.tools.etl.source
trase.tools.etl.utilities