Module `trase.tools.sei_pcs.flows_container`

Functions

def read_dataset(location, columns: List[Column], column_types: Dict[str, Type])

Load a CSV file into a Pandas dataframe.

Columns and types must be explicitly provided. If a column is specified with a default value, then it will not be read from the file but rather created after load and populated with the default value. If a column is specified without a default value then it is assumed to be in the file.

If one or more columns have key=True, then together they must contain unique values.

Raises

ValueError: if a value cannot be cast to its declared type, or if columns marked as "key" contain duplicated values
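
The behaviour described above can be sketched roughly as follows. This is an illustration only, not the module's implementation: `SketchColumn`, its `key` and `default` attributes, the `_NO_DEFAULT` sentinel, and `read_dataset_sketch` are hypothetical stand-ins, and the real `Column` class may be shaped differently.

```python
import io
from dataclasses import dataclass
from typing import Any, Dict, List, Type

import pandas as pd

_NO_DEFAULT = object()  # hypothetical sentinel meaning "no default given"


@dataclass
class SketchColumn:
    """Hypothetical stand-in for Column; attribute names are assumptions."""
    name: str
    key: bool = False
    default: Any = _NO_DEFAULT


def read_dataset_sketch(
    location, columns: List[SketchColumn], column_types: Dict[str, Type]
) -> pd.DataFrame:
    # Columns without a default are assumed to be present in the file.
    from_file = [c.name for c in columns if c.default is _NO_DEFAULT]
    df = pd.read_csv(location, usecols=from_file, dtype=str, keep_default_na=False)

    # Columns with a default are created after load, not read from the file.
    for c in columns:
        if c.default is not _NO_DEFAULT:
            df[c.name] = c.default

    # Cast every column explicitly; report failures as ValueError.
    for name, type_ in column_types.items():
        try:
            df[name] = df[name].astype(type_)
        except (ValueError, TypeError) as error:
            raise ValueError(f"Cannot cast column {name!r} to {type_.__name__}") from error

    # Key columns must be unique when taken together.
    keys = [c.name for c in columns if c.key]
    if keys and df.duplicated(subset=keys).any():
        raise ValueError(f"Key columns {keys} contain duplicated values")
    return df


# Usage: "country" is not in the file, so it is filled with its default.
example_columns = [
    SketchColumn("id", key=True),
    SketchColumn("volume"),
    SketchColumn("country", default="BR"),
]
example_types = {"id": int, "volume": float, "country": str}
example = read_dataset_sketch(
    io.StringIO("id,volume\n1,2.5\n2,4.0\n"), example_columns, example_types
)
```

Reading everything as strings first and casting afterwards is a choice made for this sketch so that casting failures surface in one place; the actual function may cast during the read instead.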

def resolve_column_types(datasets: Dict[str, List[Column]]) ‑> Dict[str, Dict[str, Type]]

Given a dictionary of dataset definitions, return a dictionary of column types. For example::

{"my_dataset": [Column("my_column")]}

will become::

{"my_dataset": {"my_column": str}}

This does more than rearrange data structures: a linked column is resolved to the type of the column it links to.
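
A minimal sketch of how linked columns might be resolved, under stated assumptions: `SketchColumn`, its `type` attribute (defaulting to `str`, matching the example above), and the `link` attribute written as `"dataset.column"` are all hypothetical. The real `Column` class may represent links differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Type


@dataclass
class SketchColumn:
    """Hypothetical stand-in for Column; attribute names are assumptions."""
    name: str
    type: Type = str
    link: Optional[str] = None  # assumed format: "target_dataset.target_column"


def resolve_column_types_sketch(
    datasets: Dict[str, List[SketchColumn]]
) -> Dict[str, Dict[str, Type]]:
    def resolve(dataset: str, column: str, seen: Tuple = ()) -> Type:
        if (dataset, column) in seen:
            raise ValueError(f"Circular link at {dataset}.{column}")
        matches = [c for c in datasets[dataset] if c.name == column]
        if not matches:
            raise KeyError(f"No column {column!r} in dataset {dataset!r}")
        col = matches[0]
        if col.link is None:
            return col.type  # plain column: use its own type
        # Linked column: follow the link and take the target's type.
        target_dataset, target_column = col.link.split(".", 1)
        return resolve(target_dataset, target_column, seen + ((dataset, column),))

    return {
        name: {c.name: resolve(name, c.name) for c in cols}
        for name, cols in datasets.items()
    }


# Usage: "asset_id" declares no type of its own and inherits int via its link.
linked = resolve_column_types_sketch(
    {
        "assets": [SketchColumn("id", int)],
        "flows": [SketchColumn("asset_id", link="assets.id")],
    }
)
```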

Classes

class FlowsContainer (dataframes: DataFrameContainer, flows_columns: List[Column])

FlowsContainer(dataframes: trase.tools.sei_pcs.dataframe_container.DataFrameContainer, flows_columns: List[trase.tools.sei_pcs.definition.Column])

Class variables

var dataframes : DataFrameContainer
var flows_columns : List[Column]

Static methods

def load(data_directory: str, flows_columns: List[Column], datasets: Dict[str, Dataset] = None)

Methods

def get(self, name, subset=None, prefix='', copy=True, ids=True) ‑> pandas.core.frame.DataFrame
def replace(self, df_flows, missing_value_columns='warn', extra_columns='raise', skip_conservation_check_for=None)
def update(self, df_flows, columns, skip_conservation_check_for=None)