Module trase.tools.sei_pcs.definition

The "definition" of an SEI-PCS model. It gives a quick overview of the datasets the model requires, the "topology" of the model (auxiliary datasets, nodes, constraints, etc.), how their columns relate to one another, and how the model results are exported to CSV.

Functions

def determine_load_order(datasets: Dict[str, Dataset])
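
determine_load_order has no docstring here. Judging only by its name and by the link syntax documented for Column, it presumably orders datasets so that each link target is loaded before the datasets that link to it. The following is a guess along those lines, not the library's implementation. It uses the standard library's graphlib.TopologicalSorter (Python 3.9+), and load_order and its input format are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional


def load_order(links: Dict[str, List[Optional[str]]]) -> List[str]:
    """Order dataset names so every link target comes before the datasets linking to it.

    `links` (hypothetical format) maps each dataset name to the `link` value of
    each of its columns: either "target_dataset.target_column" or None.
    """
    sorter = TopologicalSorter()
    for name, column_links in links.items():
        # A dataset depends on every dataset named before the dot in its links.
        targets = {link.split(".", 1)[0] for link in column_links if link}
        sorter.add(name, *targets)
    # static_order() raises graphlib.CycleError if the links form a cycle.
    return list(sorter.static_order())


order = load_order({"asset": ["state.code"], "state": [None, None]})
print(order)  # ['state', 'asset']
```

A topological sort is the natural fit here because links form a dependency graph, and a cycle of links would make any load order impossible.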
def e(header: str, flow_attribute: Optional[str] = None)

Construct an export definition for a column.

The returned object holds the information used by the functions that build the results file and the ingest metadata.

Args

header
the name of the header in the resulting CSV file. It should follow the standard Trase conventions, for example COUNTRY_OF_ORIGIN.
flow_attribute
the name of the flow column to export. It must refer to a column in the flow definition; linked columns can be referenced with dot syntax, for example "country.trase_id".
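
As a rough illustration of how header and flow_attribute pair up when the results CSV is written, here is a sketch using a hypothetical ExportSpec tuple and export_flows function (neither is the library's API). It assumes flows are rows keyed by column name, with linked columns stored under their dotted name such as "country.trase_id".

```python
import csv
import io
from typing import Dict, List, NamedTuple


class ExportSpec(NamedTuple):
    # Hypothetical stand-in for an export definition: CSV header + flow column.
    header: str
    flow_attribute: str


def export_flows(flows: List[Dict[str, object]], exports: List[ExportSpec]) -> str:
    """Write one CSV column per export definition, in the order given."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([export.header for export in exports])
    for flow in flows:
        # Assumption: a dotted name such as "country.trase_id" is stored as a
        # literal key, so a plain lookup resolves linked columns.
        writer.writerow([flow[export.flow_attribute] for export in exports])
    return buffer.getvalue()


flows = [{"country.trase_id": "BR", "volume": 1200.0}]
exports = [
    ExportSpec("COUNTRY_OF_ORIGIN", "country.trase_id"),
    ExportSpec("VOLUME", "volume"),
]
csv_text = export_flows(flows, exports)
print(csv_text)
```

The export list alone decides which flow columns reach the CSV and in what order, which keeps internal flow column names separate from the published Trase headers.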
def load_definition_from_module(module: module) ‑> Definition
def reload_definition_at_path(path_to_definition_py: str) ‑> Definition
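
Neither loader is documented further here. Assuming a definition lives in an ordinary Python file with module-level names such as the datasets dict used in the Column examples, a fresh load from a path can be done with the standard importlib machinery. load_module_from_path is a hypothetical helper, and how the real functions turn the module into a Definition is not shown.

```python
import importlib.util
import os
import tempfile
from types import ModuleType


def load_module_from_path(path: str, name: str = "definition") -> ModuleType:
    """Execute the Python file at `path` as a new module object.

    The file is executed afresh on every call instead of being returned from
    the import cache (sys.modules), as a plain `import` would do.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level statements
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "definition.py")
    with open(path, "w") as f:
        f.write('datasets = {"state": ["name", "code"]}\n')
    module = load_module_from_path(path)

print(module.datasets)  # {'state': ['name', 'code']}
```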

Classes

class Column (name: str, type: Type = builtins.str, key: bool = False, link: str = None, value: Optional[Any] = None, conserve: bool = False, validate: Optional[Validation] = None, only_validate_link: bool = False, non_negative: bool = None)

Define a column of a dataset.

Args

name
the name of the column as it appears in the file.
type
one of int, float, str, bool, List[int], etc.
key
indicates that the column should be considered to be part of the "primary key" of the dataset; in particular, that the values (among all key columns) should be unique.
link
of the form "target_dataset.target_column", indicating that this column should be left-joined onto the "target_column" column in "target_dataset".
value
a default value that the column should be populated with.
conserve
whether this column should conserve its total sum throughout the model.
validate
a class from the trase.tools.sei_pcs.validation module which performs column-level validation. For example, validate=Code(6) checks that every value in the column is a six-digit code.
only_validate_link

by default, if you link a target dataset, all columns of that dataset are added as part of the merge. For large datasets this can significantly increase memory usage. By setting only_validate_link=True, only the target column is added.

For example, suppose that we have this definition:

from trase.tools.sei_pcs.definition import Column, Dataset

datasets = {
    "state": Dataset([
        Column("name"),
        Column("code"),
    ]),
    "asset": Dataset([
        Column("state", link="state.code"),
    ]),
}

Then, the "asset" dataset will have the following columns:

  • state.name
  • state.code

If, however, we pass only_validate_link=True to the link:

datasets = {
    # ...
    "asset": Dataset([
        Column("state", link="state.code", only_validate_link=True),
    ]),
}

then the "asset" dataset will have only one column, the target of the link:

  • state.code

However, the usual link validation will still occur.

non_negative
add a validation that values are not negative. This defaults to True for numeric types and False otherwise.
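
The link and only_validate_link behaviour described above can be mimicked on plain lists of dicts. This is a sketch, not the library's merge: apply_link is hypothetical, and it assumes that "link validation" means every value in the linking column must appear in the target column. Unmatched rows are kept with None values, as in a left join, and reported separately.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def apply_link(
    rows: List[Row],
    column: str,
    target_name: str,
    target_columns: List[str],
    target_rows: List[Row],
    target_column: str,
    only_validate_link: bool = False,
) -> Tuple[List[Row], List[Any]]:
    """Left-join a target dataset onto `rows` where rows[column] == target[target_column].

    Returns the joined rows and the sorted values with no match in the target.
    """
    index = {target[target_column]: target for target in target_rows}
    # With only_validate_link=True, only the link's target column is brought in.
    added = [target_column] if only_validate_link else target_columns
    joined = []
    for row in rows:
        match = index.get(row[column], {})  # left join: no match gives None values
        new_row = {name: value for name, value in row.items() if name != column}
        for name in added:
            new_row[f"{target_name}.{name}"] = match.get(name)
        joined.append(new_row)
    # The validation step runs either way: report values missing from the target.
    missing = sorted({row[column] for row in rows} - set(index))
    return joined, missing


states = [{"name": "Mato Grosso", "code": "MT"}, {"name": "Bahia", "code": "BA"}]
assets = [{"state": "MT"}, {"state": "XX"}]

full, missing = apply_link(assets, "state", "state", ["name", "code"], states, "code")
slim, _ = apply_link(
    assets, "state", "state", ["name", "code"], states, "code", only_validate_link=True
)
print(full)     # [{'state.name': 'Mato Grosso', 'state.code': 'MT'}, {'state.name': None, 'state.code': None}]
print(slim)     # [{'state.code': 'MT'}, {'state.code': None}]
print(missing)  # ['XX']
```

The slim variant carries one prefixed column per row instead of every target column, which is where the memory saving for large target datasets comes from.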

Class variables

var conserve : bool
var key : bool
var name : str
var non_negative : bool
var type : Type
var validate : Optional[Validation]
var value : Optional[Any]

Instance variables

var is_present_in_file

Methods

def all_validators(self) ‑> List[Validation]
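
all_validators is not documented further. One plausible reading, given the validate and non_negative arguments, is that it returns the explicit validator together with implied ones. The sketch below models that with plain functions. code, check_non_negative and collect_validators are hypothetical and are not the trase.tools.sei_pcs.validation API, and treating codes as digit strings is an assumption.

```python
from typing import Any, Callable, List, Optional, Sequence

# A validator takes a column's values and returns a list of error messages.
Validator = Callable[[Sequence[Any]], List[str]]


def code(length: int) -> Validator:
    """Hypothetical analogue of validate=Code(6): each value is a string of `length` digits."""
    def check(values: Sequence[Any]) -> List[str]:
        return [
            f"not a {length}-digit code: {value!r}"
            for value in values
            if not (isinstance(value, str) and len(value) == length and value.isdigit())
        ]
    return check


def check_non_negative(values: Sequence[Any]) -> List[str]:
    return [f"negative value: {value!r}" for value in values if value < 0]


def collect_validators(
    column_type: type,
    validate: Optional[Validator] = None,
    non_negative: Optional[bool] = None,
) -> List[Validator]:
    """Gather the explicit validator plus the implied non-negativity check."""
    if non_negative is None:
        # Documented default: on for numeric types, off otherwise. Using `in`
        # (equality, not issubclass) keeps bool out of the numeric case.
        non_negative = column_type in (int, float)
    validators = [validate] if validate is not None else []
    if non_negative:
        validators.append(check_non_negative)
    return validators


code_validators = collect_validators(str, validate=code(6))
code_errors = [msg for validator in code_validators for msg in validator(["510101", "5101"])]
print(code_errors)  # ["not a 6-digit code: '5101'"]

volume_validators = collect_validators(float)
volume_errors = [msg for validator in volume_validators for msg in validator([3.5, -1.0])]
print(volume_errors)  # ['negative value: -1.0']
```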
class c (name: str, type: Type = builtins.str, key: bool = False, link: str = None, value: Optional[Any] = None, conserve: bool = False, validate: Optional[Validation] = None, only_validate_link: bool = False, non_negative: bool = None)

Documented identically to Column (same signature, arguments, class variables, instance variables, and methods); see Column above.
class Dataset (columns: List[Column])

Class variables

var columns : List[Column]
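
The key flag on Column asks for uniqueness among all key columns taken together, not within each key column separately. A minimal sketch of that check on plain dicts follows. duplicate_keys is a hypothetical helper, not part of this module.

```python
from typing import Any, Dict, List, Sequence, Tuple


def duplicate_keys(rows: List[Dict[str, Any]], key_columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Return each combined key that appears more than once, in order of first repeat."""
    seen = set()
    duplicates: List[Tuple[Any, ...]] = []
    for row in rows:
        key = tuple(row[name] for name in key_columns)  # one value per key column
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


rows = [
    {"year": 2020, "code": "510101"},
    {"year": 2021, "code": "510101"},
    {"year": 2020, "code": "510102"},
]
print(duplicate_keys(rows, ["year", "code"]))  # []: each (year, code) pair is unique
print(duplicate_keys(rows, ["year"]))          # [(2020,)]
```

Both year and code repeat on their own, yet the pair is unique, so a dataset keyed on both columns passes.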
class Definition (description: str, commodity_equivalence_group_name: str = '', years: List[int] = <factory>, version: str = '1', country: str = 'unknown_country', commodity: str = 'unknown_commodity', datasets: Dict[str, Dataset] = <factory>, constraints: Dict[str, Dataset] = <factory>, flows: List[Column] = <factory>, flows_export: List[Export] = <factory>)

Class variables

var commodity : str
var commodity_equivalence_group_name : str
var constraints : Dict[str, Dataset]
var country : str
var datasets : Dict[str, Dataset]
var description : str
var flows : List[Column]
var flows_export : List[Export]
var version : str
var years : List[int]
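
In the Definition signature, <factory> is how Python renders a dataclass field declared with default_factory. Each new Definition builds its own fresh value instead of sharing one mutable default. The sketch uses a hypothetical DefinitionSketch with a few of the same field names. The real factories are not shown in this reference, so the empty containers here are an assumption.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Dict, List


@dataclass
class DefinitionSketch:
    # Hypothetical stand-in mirroring a few of Definition's fields.
    description: str  # no default: must always be passed
    version: str = "1"
    years: List[int] = field(default_factory=list)  # rendered as <factory>
    datasets: Dict[str, object] = field(default_factory=dict)  # rendered as <factory>


soy = DefinitionSketch("soy model")
beef = DefinitionSketch("beef model")
soy.years.append(2020)
print(beef.years)  # []: each instance gets its own list from the factory

factory_fields = [f.name for f in fields(DefinitionSketch) if f.default_factory is not MISSING]
print(factory_fields)  # ['years', 'datasets']
```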
class Export (header: str, flow_attribute: str)

Construct an export definition for a column.

This object holds the information used by the functions that build the results file and the ingest metadata.

Args

header
the name of the header in the resulting CSV file. It should follow the standard Trase conventions, for example COUNTRY_OF_ORIGIN.
flow_attribute
the name of the flow column to export. It must refer to a column in the flow definition; linked columns can be referenced with dot syntax, for example "country.trase_id".

Class variables

var flow_attribute : str
var header : str