Module trase.tools.sei_pcs.dataframe_container
How we store and update Pandas dataframes once they have been loaded from disk into memory.
Functions
def construct_export_dataframe(dataset: str, container: DataFrameContainer, export_columns: List[Export]) -> pandas.core.frame.DataFrame
def flow_report_by_attribute(dataframes: DataFrameContainer, name, *args, **kwargs)
Classes
class DataFrameContainer (dataframes: Dict[str, pandas.core.frame.DataFrame], links: Dict[str, Dict[str, Link]], defaults: dict = <factory>, validation: dict = <factory>)
This class serves a few purposes. On the face of it, it is just a dictionary-like container that lets you access Pandas dataframes by name.
Beyond that, it also executes left joins recursively according to its "links". This takes some of the effort out of constantly doing those joins yourself (see the trase.tools.sei_pcs.recursive_join module).
However, the biggest value it brings is the safety checks it adds when you modify one of the dataframes in the container. These ensure that the dataframe keeps its original columns: no column can be added or deleted, and no column can change dtype. See DataFrameContainer.update() for more on how this works.
Class variables
var dataframes : Dict[str, pandas.core.frame.DataFrame]
var defaults : dict
var links : Dict[str, Dict[str, Link]]
var validation : dict
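The column safety checks described for this class can be sketched as follows. This is a minimal illustration under our own assumptions, not the code DataFrameContainer actually runs: the helper name find_column_changes is hypothetical, and it simply lists every column that was added, removed, or re-typed, so that a container could refuse a modification whenever the list is non-empty.

```python
from typing import List

import pandas as pd


def find_column_changes(original: pd.DataFrame, new: pd.DataFrame) -> List[str]:
    """Describe every way `new` differs from `original` in columns or dtypes.

    Hypothetical helper for illustration only; an empty list means the
    column schema (names and dtypes) was conserved.
    """
    problems = []
    # Columns that disappeared or whose dtype changed.
    for col in original.columns:
        if col not in new.columns:
            problems.append(f"removed column {col!r}")
        elif new[col].dtype != original[col].dtype:
            problems.append(
                f"dtype of {col!r} changed from {original[col].dtype} to {new[col].dtype}"
            )
    # Columns that were not there before.
    for col in new.columns:
        if col not in original.columns:
            problems.append(f"added column {col!r}")
    return problems


original = pd.DataFrame({"country": ["BR", "PY"], "volume": [1.5, 2.0]})
# Same columns, same dtypes, new values: no problems.
print(find_column_changes(original, original.assign(volume=[3.0, 4.0])))  # []
# Strings written into a float column: reported as a dtype change.
print(find_column_changes(original, original.assign(volume=["3", "4"])))
# A dropped column and an extra column are both reported.
print(find_column_changes(original, original.drop(columns="volume").assign(year=[2020, 2021])))
```

Comparing dtypes column by column, rather than the whole dtypes Series at once, keeps each reported problem tied to the column that caused it.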
Methods
def get(self, name, copy=True, ids=True)
Get a dataframe in the container by name. By default you should assume that the returned object is a copy, not a reference: altering it alters only the copy, not the original stored in this class.
All joins specified in the links recipe when this class was created will be executed (see the trase.tools.sei_pcs.recursive_join module).
Args
copy : bool - If you know that you are not going to alter the returned object and you want extra performance, pass copy=False; this may avoid an in-memory copy. If you do this, never alter the object you receive, or you may corrupt the data stored in this class.
ids : bool - Adds a column called "_id", which is required if you ever want to call DataFrameContainer.update with this dataframe.
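The behaviour of get described above (recursive left joins, the copy flag, and the "_id" column) might look roughly like the following sketch. Everything in it is hypothetical: get_joined is not part of this module, and the real links hold Link objects whose fields are not documented on this page, so the sketch assumes a simplified format where links[name][column] = other_name means that column is a join key shared with dataframe other_name. It also assumes the links contain no cycles.

```python
from typing import Dict

import pandas as pd


def get_joined(
    dataframes: Dict[str, pd.DataFrame],
    links: Dict[str, Dict[str, str]],
    name: str,
    copy: bool = True,
    ids: bool = True,
) -> pd.DataFrame:
    """Hypothetical get-style lookup that left-joins linked dataframes recursively.

    Assumed link format: links[name][column] = other_name, meaning that
    `column` in dataframe `name` is also a column of dataframe `other_name`
    and is used as the join key. Links are assumed to contain no cycles.
    """
    df = dataframes[name]
    if ids:
        # Positional row id; assign() returns a new frame, so the stored one is untouched.
        df = df.assign(_id=range(len(df)))
    elif copy:
        df = df.copy()
    for column, other_name in links.get(name, {}).items():
        # Recurse first so the linked dataframe already carries its own joins.
        other = get_joined(dataframes, links, other_name, copy=False, ids=False)
        # A left join keeps every row of df, even when no match is found.
        df = df.merge(other, on=column, how="left")
    return df


# Example: flows -> ports -> states, two levels of links.
dataframes = {
    "flows": pd.DataFrame({"port": ["SANTOS", "PARANAGUA"], "volume": [10.0, 5.0]}),
    "ports": pd.DataFrame({"port": ["SANTOS", "PARANAGUA"], "state": ["SP", "PR"]}),
    "states": pd.DataFrame({"state": ["SP", "PR"], "region": ["SOUTHEAST", "SOUTH"]}),
}
links = {"flows": {"port": "ports"}, "ports": {"state": "states"}}

flows = get_joined(dataframes, links, "flows")
print(flows)
```

Recursing into the linked dataframe before merging is what makes the join recursive: the "ports" table already carries "region" from "states" by the time it is joined onto "flows". Note that with copy=False, ids=False and no links, the sketch hands back the stored object itself, which is exactly the case the copy argument warns about.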
def replace(self, name, df: pandas.core.frame.DataFrame, conserved_columns=None, missing_value_columns='warn', extra_columns='raise')
def update(self, name, df: pandas.core.frame.DataFrame, columns, conserved_columns=None)