Module trase.tools.pandasdb.find
Clean a Pandas dataframe containing node labels.
Functions
def find_biomes_by_trase_id(df, returning: Iterable[str], node_attributes_ref_id: int, trase_id=Identifier('trase_id'), on_extra_columns='error', cur=None, cnx=None)def find_default_name_by_node_id(df, returning: Iterable[str], node_id: psycopg2.sql.Composable = Identifier('node_id'), on_extra_columns='error', cur=None, cnx=None)def find_economic_blocs_by_trase_id(df, returning: Iterable[str], trase_id=Identifier('trase_id'), on_extra_columns='error', cur=None, cnx=None)def find_nodes_by_code(df, returning: Iterable[str], code_id=Identifier('code_id'), value=Identifier('value'), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]def find_nodes_by_name(df, returning: Iterable[str], name=Identifier('name'), level: psycopg2.sql.Composable = Literal(None), parent_id: psycopg2.sql.Composable = Literal(None), sub_type_id: psycopg2.sql.Composable = Literal(None), type_id: psycopg2.sql.Composable = Literal(None), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]-
Performs a lookup into the
main.nodesandmain.node_namestable given some search criteria.The following search terms are required:
- name (str)
The following search terms are optional:
- parent_id (int), which should be an id of the
main.nodestable - sub_type_id (int), which should be an id of the
main.node_sub_typestable - type_id (int)
- level (int)
- year (int)
The function requires a Pandas dataframe. Each row of the dataframe represents one "lookup" in the main schema table.
The search terms themselves can come from one of two sources:
- A column in the dataframe, indicated using
sql.Identifier - A value that is constant for all rows of the dataframe, indicated using
sql.Literal
For example, the following call would get the name search terms using the "name" column in the dataframe, and would use the level=4 for every search:
from psycopg2 import sql find_nodes_by_name( df, name=sql.Identifier("name"), level=sql.Literal(4), ...The function additionally requires you to specify which information should be returned from the search via the "returning" argument. Supported values are:
- default_name
- lookup_value
- parent_id
- type_id
- sub_type_id
- level
- year
- node_id
- trase_id
- count
Example usage:
import pandas as pd from psycopg2 import sql from trase.tools.pandasdb.find import find_nodes_by_name df = pd.DataFrame({"country_label": ["COOK ISLAS", "ZIMBAWE", "SAMOA AMERICANA"]}) returning = ["default_name", "node_id"] result = find_nodes_by_name( df, returning, name=sql.Identifier("country_label"), level=sql.Literal(1), ) print(result) # [('COOK ISLANDS', 55), ('ZIMBABWE', 211), ('AMERICAN SAMOA', 7)] df_result = pd.Dataframe(result, columns=returning)Let's break down the call to
find_nodes_by_name():-
First we pass the data containing the search terms we want to look up, in the form of a dataframe. This function is designed to take large dataframes:
find_nodes_by_name( df, -
We say that the function should return the (default) name of the node that it identifies, along with the Node ID:
returning=["default_name", "node_id"] -
Next, we say that the data used to match the node name should come from the "country_label" column in the dataframe. Here
sql.Identifermeans a column in the dataframe:name=sql.Identifier("country_label"), -
Then we say that the level should be 1, which is the level for a country. Here
sql.Literalmeans a value that should be the same for all rows:level=sql.Literal(1), -
The result of the function is a list of tuples, which we can convert to a dataframe
df_result = pd.Dataframe(result, columns=returning)
TODO:
- Test that columns adhere to postgres standards (or get around this with a hack!)
Args
df- a non-empty Pandas DataFrame
returning- a list of strings
name- a column in the dataframe or literal value (defaults to the column "name")
on_extra_columns- behaviour if the dataframe contains columns which are not used by this function. One of "error", "warn" or "ignore" (defaults to "error").
Returns
A list of tuples containing the search results. The list will be the same length as the input dataframe, and will be in the same order (i.e. the fifth entry in the return value is the result of the fifth row in the input dataframe). If a search result returned zero or not matches then it will contain None values. The tuples will be in the same order as the order of the "returning" argument.
def find_nodes_by_trase_id(df, returning: Iterable[str], trase_id=Identifier('trase_id'), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]def find_parent_regions_by_trase_id(df, returning: Iterable[str], parent_region_level_name: psycopg2.sql.Composable, trase_id=Identifier('trase_id'), on_extra_columns='error', cur=None, cnx=None)def find_trader_groups_by_name(df, returning: Iterable[str], group_name=Identifier('group_name'), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]def find_trader_groups_by_trader_id(df, returning: Iterable[str], trader_id=Identifier('trader_id'), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]-
Example
>>> df = pd.DataFrame({"group_name": ["H B EMPREENDIMENTOS"]}) >>> df_result = find_trader_groups_by_name( ... df, ... returning=["node_id"], ... ) >>> print(df_result) [(1828696,)] def find_traders_and_groups_by_label(df, returning: Iterable[str], trader_label=Identifier('trader_label'), country_id: psycopg2.sql.Composable = Literal(None), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None)-
TODO: unclear if the "count" return type is from the trader lookup or the group lookup. Make this clearer.
def find_traders_and_groups_by_trase_id(df, returning: Iterable[str], trase_id=Identifier('trase_id'), year: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None)def find_traders_by_label(df, returning: Iterable[str], trader_label=Identifier('trader_label'), country_id: psycopg2.sql.Composable = Literal(None), on_extra_columns='error', cur=None, cnx=None) ‑> List[Tuple]-
Query the traders hierarchy in the database to clean the trader name and identify the trader group.
The following search terms are required:
- trader_label (str)
The following search terms are optional:
- country_id (int), which should be an id of the
main.nodestable for a country node (type 'REGION', level 1) when specified
The function requires a Pandas dataframe. Each row of the dataframe represents one "lookup" in the main schema table.
The search terms themselves can come from one of two sources:
- A column in the dataframe, indicated using
sql.Identifier - A value that is constant for all rows of the dataframe, indicated using
sql.Literal
When applying the function to the exporter, the "country_id" can be provided as a
sql.Literalargument as it is the same for all rows:from psycopg2 import sql find_nodes_by_name( df, trader_label=sql.Identifier("exporter"), country_id=sql.Literal(27), ...When applying the function to the importer, each row will have its own country of import, so the "country_id" should be supplied in the dataframe alongside with "trader_label" as
sql.Identifier:from psycopg2 import sql find_nodes_by_name( df, trader_label=sql.Identifier("importer"), country_id=sql.Identifier("country_of_import_id"), ...The function additionally requires you to specify which information should be returned from the search via the "returning" argument. Supported values are:
- trader_id
- trader_name
- trase_id
- lookup_value
- count
Example usage:
import pandas as pd from psycopg2 import sql from trase.tools.pandasdb.find import find_traders_by_label returning = ["trader_name", "trase_id", "count"] df = pd.DataFrame({"trader_label": ["CARGILL"]}) result = find_traders_by_label(df, returning) print(result) # [ # ( # 'CARGILL', # 'BR-TRADER-53404893', # 1, # ), # ] df = pd.DataFrame({"trader_label": ["BAYER SA"], "country_id": [12]}) result = find_traders_by_label(df, returning, country_id=sql.Identifier("country_id")) print(result) # [ # ( # 'BAYER S.A.', # 'AR-TRADER-3050381106', # 1, # ), # ] df = pd.DataFrame({"trader_label": ["JOSE GARCIA"], "country_id": [27]}) result = find_traders_by_label(df, returning, country_id=sql.Identifier("country_id")) print(result) # [ # ( # None, # None, # 8, # ), # ]Args
df- a non-empty Pandas DataFrame
returning- a list of strings
trader_label- a column in the dataframe (defaults to the column "trader_label")
country_id- a column in the dataframe or literal value (defaults to the column "country_id")
on_extra_columns- behaviour if the dataframe contains columns which are not used by this function. One of "error", "warn" or "ignore" (defaults to "error").
Returns
A list of tuples containing the search results. The list will be the same length as the input dataframe, and will be in the same order (i.e. the fifth entry in the return value is the result of the fifth row in the input dataframe). If a search result returned zero or not matches then it will contain None values. The tuples will be in the same order as the order of the "returning" argument.
def none_factory()
Classes
class LookupBase-
Subclasses
- LookupBiomesByTraseId
- LookupByNodeCode
- LookupByNodeName
- LookupByNodeTraseId
- LookupDefaultNameByNodeId
- LookupEconomicBlocsByTraseId
- LookupParentRegionsByTraseId
- LookupTraderGroupsByName
- LookupTraderGroupsByTraderId
- LookupTradersByLabel
Methods
def find(self, df, returning, on_extra_columns, cur, cnx)def report(self) ‑> str
class LookupBiomesByTraseId (trase_id: psycopg2.sql.Composable, ref_id: psycopg2.sql.Literal)-
LookupBiomesByTraseId(trase_id: psycopg2.sql.Composable, ref_id: psycopg2.sql.Literal)
Ancestors
Class variables
var SUPPORTS_RETURNINGvar ref_id : psycopg2.sql.Literalvar trase_id : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupByNodeCode (code_id: psycopg2.sql.Composable, value: psycopg2.sql.Composable, year: psycopg2.sql.Composable = <factory>)-
LookupByNodeCode(code_id: psycopg2.sql.Composable, value: psycopg2.sql.Composable, year: psycopg2.sql.Composable =
) Ancestors
Class variables
var SUPPORTS_RETURNINGvar code_id : psycopg2.sql.Composablevar value : psycopg2.sql.Composablevar year : psycopg2.sql.Composable
Methods
def query(self, table_identifier: psycopg2.sql.Identifier)
class LookupByNodeName (name: psycopg2.sql.Composable, level: psycopg2.sql.Composable = <factory>, parent_id: psycopg2.sql.Composable = <factory>, sub_type_id: psycopg2.sql.Composable = <factory>, type_id: psycopg2.sql.Composable = <factory>, year: psycopg2.sql.Composable = <factory>)-
LookupByNodeName(name: psycopg2.sql.Composable, level: psycopg2.sql.Composable =
, parent_id: psycopg2.sql.Composable = , sub_type_id: psycopg2.sql.Composable = , type_id: psycopg2.sql.Composable = , year: psycopg2.sql.Composable = ) Ancestors
Class variables
var SUPPORTS_RETURNINGvar level : psycopg2.sql.Composablevar name : psycopg2.sql.Composablevar parent_id : psycopg2.sql.Composablevar sub_type_id : psycopg2.sql.Composablevar type_id : psycopg2.sql.Composablevar year : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupByNodeTraseId (trase_id: psycopg2.sql.Composable, year: psycopg2.sql.Composable = <factory>)-
LookupByNodeTraseId(trase_id: psycopg2.sql.Composable, year: psycopg2.sql.Composable =
) Ancestors
Class variables
var SUPPORTS_RETURNINGvar trase_id : psycopg2.sql.Composablevar year : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupDefaultNameByNodeId (node_id: psycopg2.sql.Composable)-
LookupDefaultNameByNodeId(node_id: psycopg2.sql.Composable)
Ancestors
Class variables
var SUPPORTS_RETURNINGvar node_id : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupEconomicBlocsByTraseId (trase_id: psycopg2.sql.Composable)-
LookupEconomicBlocsByTraseId(trase_id: psycopg2.sql.Composable)
Ancestors
Class variables
var SUPPORTS_RETURNINGvar trase_id : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupParentRegionsByTraseId (parent_region_level_name: psycopg2.sql.Composable, trase_id: psycopg2.sql.Composable)-
LookupParentRegionsByTraseId(parent_region_level_name: psycopg2.sql.Composable, trase_id: psycopg2.sql.Composable)
Ancestors
Class variables
var SUPPORTS_RETURNINGvar parent_region_level_name : psycopg2.sql.Composablevar trase_id : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupTraderGroupsByName (group_name: psycopg2.sql.Composable, year: psycopg2.sql.Composable = <factory>)-
LookupTraderGroupsByName(group_name: psycopg2.sql.Composable, year: psycopg2.sql.Composable =
) Ancestors
Class variables
var SUPPORTS_RETURNINGvar group_name : psycopg2.sql.Composablevar year : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupTraderGroupsByTraderId (trader_id: psycopg2.sql.Composable, year: psycopg2.sql.Composable = <factory>)-
LookupTraderGroupsByTraderId(trader_id: psycopg2.sql.Composable, year: psycopg2.sql.Composable =
) Ancestors
Class variables
var SUPPORTS_RETURNINGvar trader_id : psycopg2.sql.Composablevar year : psycopg2.sql.Composable
Methods
def query(self, table_identifier)
class LookupTradersByLabel (trader_label: psycopg2.sql.Composable, country_id: psycopg2.sql.Composable)-
LookupTradersByLabel(trader_label: psycopg2.sql.Composable, country_id: psycopg2.sql.Composable)
Ancestors
Class variables
var SUPPORTS_RETURNINGvar country_id : psycopg2.sql.Composablevar trader_label : psycopg2.sql.Composable
Methods
def query(self, table_identifier)