Module trase.tools.sei_pcs.pandas_lp

Pandas-friendly wrapper functions for common linear programming problems that we encounter in SEI-PCS.

Functions

def make_variable_name(df, columns, truncate_after=20, check=True) ‑> pandas.core.series.Series

Make PuLP variable names from a dataframe.

The return value is a pd.Series object in which each row consists of the provided columns concatenated together. To ensure uniqueness we also prepend the index of the dataframe. Since the columns can contain arbitrary strings, we slugify their contents and truncate them if they are too long.

Example

>>> df = pd.DataFrame([
...     {"port": "ACRE", "company": "PIMEX"},
...     {"port": "AIOI", "company": "SODRU"},
...     {"port": "COGO", "company": "ABC SHIPPING"},
... ])
>>> names = make_variable_name(df, ["port", "company"], truncate_after=10)
>>> names
0         0_acre_pimex
1         1_aioi_sodru
2    2_cogo_abc-shippi
dtype: object
>>> names.apply(pulp.LpVariable) # constructing PuLP variables using these names

Args

df
dataframe containing the necessary data to construct the variable names
columns
the columns of the dataframe that will be included in the name. All values will be cast to strings. Note that the index of the dataframe will also be included.
truncate_after : int, optional
maximum number of characters kept from each value (default: 20)
check : bool, optional
if True, raise an error when the names are not unique (default: True)

Returns

A pd.Series object containing strings
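The naming scheme described above can be sketched with pandas and the standard library alone. This is a minimal sketch, not the module's implementation: slugify and make_variable_name_sketch are hypothetical helpers, and the real function may rely on a dedicated slugify library whose rules differ (for example for accents or punctuation).

```python
import re

import pandas as pd


def slugify(value, truncate_after):
    # Lower-case, collapse every run of non-alphanumeric characters into "-",
    # strip leading/trailing dashes, then keep at most truncate_after characters
    slug = re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")
    return slug[:truncate_after]


def make_variable_name_sketch(df, columns, truncate_after=20, check=True):
    # Hypothetical re-implementation: index first, then one slug per column
    pieces = [df.index.astype(str)]
    for column in columns:
        pieces.append([slugify(v, truncate_after) for v in df[column]])
    # Join row by row with "_" and keep the dataframe's index
    names = pd.Series(
        ["_".join(parts) for parts in zip(*pieces)], index=df.index, dtype=object
    )
    if check and names.duplicated().any():
        raise ValueError("variable names are not unique")
    return names
```

Applied to the example dataframe above with truncate_after=10, this sketch yields the same three names. Duplicates can only appear when the index itself is not unique (or when string index values collide with slugs), which is what the check guards against.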

def solve_2_step_transportation_problem(supply: pandas.core.series.Series, output: pandas.core.series.Series, demand: pandas.core.series.Series, costs_1: pandas.core.series.Series, costs_2: pandas.core.series.Series, output_deviation_cost_factor=None, solver=None, on_missing: str = 'raise', eq_constraints_1=None, geq_constraints_1=None, leq_constraints_1=None, eq_constraints_2=None, geq_constraints_2=None, leq_constraints_2=None, commodity_ratios=None, solve_kwargs=None)

Solve a 2-step transportation problem using linear programming. Suppose a country has M farms ("sources"), N processing facilities ("via" points) and P ports ("sinks"). Commodities are transported from the farms to the processing facilities, and from there to the ports. Each farm has a maximum quantity of commodity it can supply, each processing facility has a maximum quantity of commodity it can process, and each port has a demand quantity.

We are given the transportation costs between every pair of farms and processing facilities, and between every pair of processing facilities and ports, and these costs are assumed to be linear in the quantity of commodity.

The problem is to meet the demand at each sink at minimum cost without exhausting the supply at any source, and without exceeding the output at each via point.

If a cost is missing for any (source, via) or (via, sink) pair, then delivery along that pair is assumed to be impossible. If the supply is missing for any source, it is assumed to be zero. If the demand quantity is missing for any sink, it is assumed to be zero.

It is possible to construct a problem in which one or more sinks and/or via points cannot be satisfied. This happens when there are no reachable sources with a defined capacity: either there are no cost entries for any via-sink pair, or there are no cost entries for any source-via pair connected to the sink, or there are cost entries but none of the connected sources has a supply constraint. In this scenario the function will, by default, raise an error. However, you can control the behaviour using the on_missing argument.
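Before calling the function, one way to spot sinks that would trigger this error is to walk the two cost tables outwards from the sources with positive supply. The following is a hypothetical pre-check, not part of this module: find_unreachable_sinks is an illustrative name, it assumes costs_1 and costs_2 use (source, via) and (via, sink) multi-indexes as documented under Args, and it ignores via-point output and any additional constraints.

```python
import pandas as pd


def find_unreachable_sinks(supply, demand, costs_1, costs_2):
    """List sinks with positive demand that no source with supply can reach.

    Hypothetical pre-check: ignores via-point output and extra constraints.
    """
    # Missing supply is treated as zero, so only positive supply counts
    sources = set(supply[supply.fillna(0) > 0].index)
    # Via points linked to at least one such source by a defined (non-NaN) cost
    vias = {via for source, via in costs_1.dropna().index if source in sources}
    # Sinks linked to at least one of those via points by a defined cost
    reachable = {sink for via, sink in costs_2.dropna().index if via in vias}
    # Keep the demand order so the output is easy to compare with the input
    needed = demand[demand.fillna(0) > 0].index
    return [sink for sink in needed if sink not in reachable]
```

In this sketch a sink is reported both when it has no cost entries at all and when every path to it starts at a source whose supply is missing or zero, mirroring the cases listed above.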

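Any number of additional constraints (equal to, greater than or equal to, or less than or equal to) can be specified for either step of the path with the eq_constraints_*, geq_constraints_* and leq_constraints_* arguments (suffix _1 for the source-via step, _2 for the via-sink step).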

Args

supply
a pd.Series of supply capacities, one row per source
demand
a pd.Series of demand quantities, one row per sink
output
a pd.Series of capacities, one row per via point
costs_1
a pd.Series of costs with a multi-index of (source, via) pairs
costs_2
a pd.Series of costs with a multi-index of (via, sink) pairs
output_deviation_cost_factor
when None, the output of every "via" point must be met exactly. When not None, any quantity in excess of or short of a via point's output adds a cost to the objective function equal to that quantity multiplied by the cost factor. One way to interpret the factor: when it has value X, sourcing through a via point V1 without available output is as costly as sourcing through another via point V2 with available output located "further" by X units of transportation cost. For instance, if transportation costs are in minutes, a cost factor of 120 means that sourcing through any via point with available output located less than 2 hours further from the final destination is cheaper than sourcing through a via point with no available output.
eq_constraints_1
a pd.Series of required (fixed) allocations between sources and via points.
geq_constraints_1
a pd.Series of lower limits on the allocation between sources and via points.
leq_constraints_1
a pd.Series of upper limits on the allocation between sources and via points.
eq_constraints_2
a pd.Series of required (fixed) allocations between via points and sinks.
geq_constraints_2
a pd.Series of lower limits on the allocation between via points and sinks.
leq_constraints_2
a pd.Series of upper limits on the allocation between via points and sinks.
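For reference, one way to write the problem described above as a linear program is sketched below, using path variables $x_{ijk}$ that mirror the (source, via, sink) index of the returned allocation. The notation is ours and the function may build its model differently (for example with separate flow variables per step); the deviation terms $u_j, v_j$ follow the literal description of output_deviation_cost_factor, written $\lambda$. $S_i$, $O_j$ and $D_k$ are supply, output and demand, and $c^{(1)}_{ij}$, $c^{(2)}_{jk}$ are costs_1 and costs_2.

```latex
\begin{aligned}
\min_{x,\,u,\,v \ge 0}\quad & \sum_{(i,j,k)} \left(c^{(1)}_{ij} + c^{(2)}_{jk}\right) x_{ijk}
  \;+\; \lambda \sum_{j} \left(u_j + v_j\right) \\
\text{s.t.}\quad & \sum_{j,k} x_{ijk} \le S_i && \forall i \quad \text{(supply)} \\
& \sum_{i,j} x_{ijk} = D_k && \forall k \quad \text{(demand)} \\
& \sum_{i,k} x_{ijk} - u_j + v_j = O_j && \forall j \quad \text{(output)}
\end{aligned}
```

Here $x_{ijk}$ exists only when both $c^{(1)}_{ij}$ and $c^{(2)}_{jk}$ are defined, $u_j$ and $v_j$ are the excess and the deficit at via point $j$, and when output_deviation_cost_factor is None we fix $u_j = v_j = 0$ so that throughput equals $O_j$ exactly. The eq/geq/leq constraint arguments would then bound $\sum_k x_{ijk}$ (step 1) or $\sum_i x_{ijk}$ (step 2).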

Returns

allocation
a pd.Series of quantities with a multi-index of (source, via, sink) pairs
leftover_supply
a pd.Series of unallocated supply, one row per source
leftover_output
a pd.Series of unallocated output, one row per via point
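Because allocation is indexed by (source, via, sink), per-step flows and totals can be recovered with groupby. This sketch uses made-up data in the documented shape; whether leftover_supply is computed exactly like unused_supply below is an assumption, not something stated above.

```python
import pandas as pd

# Made-up solver output in the documented shape: (source, via, sink) -> quantity
allocation = pd.Series({
    ("farm_a", "mill_1", "port_x"): 4.0,
    ("farm_a", "mill_2", "port_x"): 1.0,
    ("farm_b", "mill_1", "port_y"): 3.0,
}).rename_axis(["source", "via", "sink"])
supply = pd.Series({"farm_a": 6.0, "farm_b": 3.0})

# Flows for each step of the path, summing over the level not in that step
flows_1 = allocation.groupby(level=["source", "via"]).sum()  # (source, via)
flows_2 = allocation.groupby(level=["via", "sink"]).sum()    # (via, sink)

# Total quantity passing through each via point
throughput = allocation.groupby(level="via").sum()

# Supply not used by any path; sources absent from allocation keep all of it
used = allocation.groupby(level="source").sum()
unused_supply = supply.sub(used, fill_value=0)
```

The same pattern with level="via" compared against the output Series gives the via points whose output was not fully used.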

def solve_transportation_problem(supply: pandas.core.series.Series, demand: pandas.core.series.Series, costs: pandas.core.series.Series, solver=None, on_missing: str = 'raise', on_lp_error: str = 'raise', allow_deviation=False, eq_constraints=None, geq_constraints=None, leq_constraints=None, commodity_ratios=None, solve_kwargs=None)

def warn_if_supply_sheds_demand_exceeds_supply(supply: pandas.core.series.Series, demand: pandas.core.series.Series, costs: pandas.core.series.Series)

Sometimes the total LP supply appears sufficient to meet the total demand, but the paths available in the cost matrix make it impossible for some of the supply to reach the demand. When this happens, the LP simply fails to find a solution and gives no hint about the cause. This function identifies supply sheds (groups of sources connected to the same group of sinks) and flags the supply sheds in which demand exceeds supply.
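The description does not say how sheds are found. One plausible reading, sketched here as an assumption, treats each connected component of the bipartite source-sink graph (an edge for every pair with a defined cost) as a shed. flag_supply_sheds is a hypothetical name, and the real function may group or report sheds differently.

```python
import pandas as pd


def flag_supply_sheds(supply, demand, costs):
    """Return supply sheds whose demand exceeds their supply (hypothetical sketch)."""
    parent = {}

    def find(node):
        # Union-find root lookup with path halving
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Each (source, sink) pair with a defined cost connects the two nodes.
    # Nodes are tagged so a source and a sink with the same name stay distinct.
    for source, sink in costs.dropna().index:
        union(("source", source), ("sink", sink))

    sheds = {}

    def shed_of(node):
        return sheds.setdefault(
            find(node), {"sources": [], "sinks": [], "supply": 0.0, "demand": 0.0}
        )

    # Missing supply or demand counts as zero
    for source, quantity in supply.fillna(0).items():
        shed = shed_of(("source", source))
        shed["sources"].append(source)
        shed["supply"] += quantity
    for sink, quantity in demand.fillna(0).items():
        shed = shed_of(("sink", sink))
        shed["sinks"].append(sink)
        shed["demand"] += quantity

    result = pd.DataFrame(
        list(sheds.values()), columns=["sources", "sinks", "supply", "demand"]
    )
    return result[result["demand"] > result["supply"]].reset_index(drop=True)
```

For example, with supplies a=10, b=5, demands x=8, y=7 and costs only for (a, x) and (b, y), total supply equals total demand, yet the shed {b, y} is flagged because 7 > 5. A sink with no cost entries forms its own shed with zero supply and is flagged as well.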

Classes

class TraseGlpkCommand (logPath='glpk.log', msg=False, **kwargs)

This is PuLP's GLPK_CMD, with a trick to capture stdout.

Usually GLPK_CMD has two modes:

  1. msg=True, in which case glpk is run as a fork of the current Python process
  2. msg=False, in which case the output of glpk is sent to /dev/null

The intention of mode 1 is that the stdout of glpk is shown to the user. However, we have an odd problem, very specific to DeforestationFree, where the output is simply lost.

You can reproduce this problem by running the following code:

import os
os.spawnvp(os.P_WAIT, "echo", ["echo", "hi"])

If you run this locally you will see the output "hi". However if you run it in a Jupyter notebook on DeforestationFree you see nothing. I suspect that the output is being sent to the terminal rather than the notebook, perhaps the same problem as https://github.com/jupyterlab/jupyterlab/issues/9668. I hope that an upgrade or re-deployment of JupyterHub will resolve this issue.

Until then I have resorted to this clever trick: patch os.devnull with a filename and pass msg=False. The PuLP code runs open(os.devnull, "w"), which actually results in it opening the text file!
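The trick can be illustrated outside PuLP. In this minimal sketch, devnull_redirected_to and run_silenced are hypothetical helpers: run_silenced only imitates the open(os.devnull, "w") call described above and is not PuLP's code.

```python
import contextlib
import os
import subprocess
import sys
import tempfile


@contextlib.contextmanager
def devnull_redirected_to(log_path):
    """Temporarily point os.devnull at log_path (hypothetical helper)."""
    original = os.devnull
    os.devnull = log_path
    try:
        yield log_path
    finally:
        # Always restore the real null device, even if the child process fails
        os.devnull = original


def run_silenced(args):
    """Stand-in for library code that silences a child process via os.devnull."""
    with open(os.devnull, "w") as pipe:
        subprocess.run(args, stdout=pipe, stderr=pipe, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "glpk.log")
        with devnull_redirected_to(log_path):
            run_silenced([sys.executable, "-c", "print('hi')"])
        with open(log_path) as f:
            print(f.read().strip())  # the child's output ended up in the log
```

Because os.devnull is a process-wide module attribute, patching it affects every thread that opens os.devnull while the patch is active; the context manager keeps that window as short as possible and restores the original value on exit.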

To use this class do the following:

solver = TraseGlpkCommand()
solve_transportation_problem(supply, demand, costs, solver=solver)

:param bool mip: if False, assume LP even if integer variables
:param bool msg: if False, no log is shown
:param float timeLimit: maximum time for solver (in seconds)
:param list options: list of additional options to pass to solver
:param bool keepFiles: if True, files are saved in the current directory and not deleted after solving
:param str path: path to the solver binary

Ancestors

  • pulp.apis.glpk_api.GLPK_CMD
  • pulp.apis.core.LpSolver_CMD
  • pulp.apis.core.LpSolver

Methods

def actualSolve(self, lp)

Solve a well-formulated LP problem