Module trase.tools.sei_pcs.pandas_lp
Pandas-friendly wrapper functions for common linear programming problems that we encounter in SEI-PCS.
Functions
def make_variable_name(df, columns, truncate_after=20, check=True) -> pandas.core.series.Series
Make a Pulp variable name from a dataframe.
The return value will be a pd.Series object where each row consists of the provided columns concatenated together. To ensure uniqueness we also prepend the index of the dataframe. Since the columns can contain arbitrary strings, we slugify their contents and truncate them if they are too long.
Example
>>> df = pd.DataFrame([
...     {"port": "ACRE", "company": "PIMEX"},
...     {"port": "AIOI", "company": "SODRU"},
...     {"port": "COGO", "company": "ABC SHIPPING"},
... ])
>>> names = make_variable_name(df, ["port", "company"], truncate_after=10)
>>> names
0         0_acre_pimex
1         1_aioi_sodru
2    2_cogo_abc-shippi
dtype: object
>>> names.apply(pulp.LpVariable)  # constructing PuLP variables using these names
Args
df- dataframe containing the necessary data to construct the variable names
columns- the columns of the dataframe that will be included in the name. All values will be cast to strings. Note that the index of the dataframe will also be included.
truncate_after:int, optional- when to truncate values (default: 20 characters)
check:bool, optional- optionally raise an error if the names are not unique (default: True)
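The naming scheme described above (index prefix, slugified and truncated column values, optional uniqueness check) can be sketched as follows. This is an illustrative re-implementation, not the module's actual code: the helper names slugify and sketch_variable_names are hypothetical, and the slugification rule (lower-case, runs of non-alphanumeric characters replaced by a hyphen) is an assumption chosen to match the documented example.

```python
import re

import pandas as pd


def slugify(value):
    # Assumed rule: lower-case, collapse non-alphanumeric runs into "-".
    return re.sub(r"[^a-z0-9]+", "-", str(value).lower()).strip("-")


def sketch_variable_names(df, columns, truncate_after=20, check=True):
    # Hypothetical stand-in for make_variable_name; expects at least one column.
    # Start with the dataframe index, cast to strings, to help uniqueness.
    parts = [df.index.to_series().astype(str)]
    for column in columns:
        # Slugify first, then truncate each value independently.
        parts.append(df[column].map(lambda v: slugify(v)[:truncate_after]))
    names = parts[0].str.cat(parts[1:], sep="_")
    if check and not names.is_unique:
        raise ValueError("variable names are not unique")
    return names


df = pd.DataFrame([
    {"port": "ACRE", "company": "PIMEX"},
    {"port": "AIOI", "company": "SODRU"},
    {"port": "COGO", "company": "ABC SHIPPING"},
])
names = sketch_variable_names(df, ["port", "company"], truncate_after=10)
```

Truncation is applied per column value rather than to the whole name, which is why "abc-shipping" is cut to ten characters while the index and port parts are left intact.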
Returns
A pd.Series object containing strings
def solve_2_step_transportation_problem(supply: pandas.core.series.Series, output: pandas.core.series.Series, demand: pandas.core.series.Series, costs_1: pandas.core.series.Series, costs_2: pandas.core.series.Series, output_deviation_cost_factor=None, solver=None, on_missing: str = 'raise', eq_constraints_1=None, geq_constraints_1=None, leq_constraints_1=None, eq_constraints_2=None, geq_constraints_2=None, leq_constraints_2=None, commodity_ratios=None, solve_kwargs=None)
Solve a 2-step transportation problem using linear programming.
Suppose a country has M farms ("sources"), N processing facilities ("via") and P ports ("sinks"). Commodities are transported from the farms to the processing facilities, and then on to the ports. Each farm has a maximum quantity of commodity it can supply, each processing facility has a maximum quantity of commodity it can process, and each port has a demand quantity.
We are given the transportation costs between every pair of farms and processing facilities, and between every pair of processing facilities and ports, and these costs are assumed to be linear in the quantity of commodity.
The problem is to meet the demand at each sink at minimum cost without exceeding the supply at any source, and without exceeding the output at each via point.
If a cost is missing for any (source, via) or (via, sink) pair then it is assumed that delivery is not possible. If the supply quantity is missing for any source it is assumed to be zero. If the demand quantity is missing for any sink it is assumed to be zero.
It is possible to construct a problem where it is impossible to satisfy one or more sinks and/or via points. This happens when there are no reachable sources with a defined capacity. That is to say, there are either no cost entries for any via-sink pair, or no cost entries for any source-via pair connected to the sink, or there are cost entries but there are no supply constraints for any of the sources. In this scenario the function will, by default, raise an error. However, you can control the behaviour using the on_missing argument.
Any number of additional constraints (equality, greater than or equal, or less than or equal) can be specified for either step of the path with the …constraints… arguments.
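For orientation, the problem described above can be written as a linear program. The notation is introduced here for illustration and is not taken from the source: s_i is the supply of source i, o_j the output of via point j, d_k the demand of sink k, c^{(1)}_{ij} and c^{(2)}_{jk} the two cost tables, and x_{ij}, y_{jk} the quantities shipped on each step. The function itself may formulate the problem differently (for instance with one variable per source-via-sink path), so treat this as a textbook sketch under those assumptions.

```latex
\[
\begin{aligned}
\min_{x,\,y \ge 0} \quad & \sum_{i,j} c^{(1)}_{ij}\, x_{ij} + \sum_{j,k} c^{(2)}_{jk}\, y_{jk} \\
\text{subject to} \quad
& \sum_{j} x_{ij} \le s_i && \text{for every source } i, \\
& \sum_{i} x_{ij} = \sum_{k} y_{jk} && \text{for every via point } j, \\
& \sum_{k} y_{jk} \le o_j && \text{for every via point } j, \\
& \sum_{j} y_{jk} = d_k && \text{for every sink } k.
\end{aligned}
\]
```

The output constraint is written as an upper bound, following the problem statement. The description of output_deviation_cost_factor suggests that it may instead be enforced as an equality when that argument is None. Pairs with no cost entry simply have no variable, which is how "delivery is not possible" is expressed.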
Args
supply- a pd.Series of supply capacities, one row per source
demand- a pd.Series of demand quantities, one row per sink
output- a pd.Series of capacities, one row per via point
costs_1- a pd.Series of costs with a multi-index of (source, via) pairs
costs_2- a pd.Series of costs with a multi-index of (via, sink) pairs
output_deviation_cost_factor- when the value is None, the output for all "via" points needs to be met exactly. When the value is non-null, any quantity in excess or deficit compared with the output adds a cost to the objective function, equal to that quantity multiplied by the cost factor. One way to understand the meaning of the factor is the following: when the factor has value X, sourcing through a via point V1 without available output is as costly as sourcing through another via point V2 with available output located "further" by X units of transportation cost. For instance, if transportation costs are in minutes, a cost factor of 120 means that it is cheaper to source through any via point with available output located less than 2 hours further from the final destination than to source through a via point that has no available output.
eq_constraints_1- a pd.Series of expected allocation between sources and via points.
geq_constraints_1- a pd.Series of lower limit of allocation between sources and via points.
leq_constraints_1- a pd.Series of upper limit of allocation between sources and via points.
eq_constraints_2- a pd.Series of expected allocation between via points and sinks.
geq_constraints_2- a pd.Series of lower limit of allocation between via points and sinks.
leq_constraints_2- a pd.Series of upper limit of allocation between via points and sinks.
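The "120 minutes" reading of output_deviation_cost_factor can be checked with a small piece of arithmetic. The sketch below is a simplification made for illustration: it looks only at the cost of routing one extra unit, charges the factor once when the via point has no available output, and ignores any deficit penalty elsewhere. The function name marginal_cost and the travel times are hypothetical.

```python
def marginal_cost(transport_cost, has_available_output, deviation_cost_factor):
    # Cost of one extra unit: transport cost, plus the deviation penalty
    # if the via point has no output left (assumed simplification).
    penalty = 0 if has_available_output else deviation_cost_factor
    return transport_cost + penalty


factor = 120  # minutes, as in the example in the argument description

# V1 is 300 minutes away but has no available output.
cost_via_v1 = marginal_cost(300, False, factor)

# A via point with available output that is 100 minutes further is cheaper...
cost_near_v2 = marginal_cost(400, True, factor)

# ...but one that is 130 minutes further is not.
cost_far_v2 = marginal_cost(430, True, factor)
```

Under these assumptions the break-even point sits exactly at a detour equal to the factor, which matches the interpretation given for the argument.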
Returns
allocation- a pd.Series of quantities with a multi-index of (source, via, sink) triples
leftover_supply- a pd.Series of unallocated supply, one row per source
leftover_output- a pd.Series of unallocated output, one row per via point
def solve_transportation_problem(supply: pandas.core.series.Series, demand: pandas.core.series.Series, costs: pandas.core.series.Series, solver=None, on_missing: str = 'raise', on_lp_error: str = 'raise', allow_deviation=False, eq_constraints=None, geq_constraints=None, leq_constraints=None, commodity_ratios=None, solve_kwargs=None)
def warn_if_supply_sheds_demand_exceeds_supply(supply: pandas.core.series.Series, demand: pandas.core.series.Series, costs: pandas.core.series.Series)
The total LP supply sometimes appears sufficient to meet the total demand, but the available paths in the cost matrix make it impossible for some of the supply to reach the demand. When this happens, the LP solver simply fails to find a solution and gives no hint about the cause. This function identifies supply sheds (groups of sources connected to the same group of sinks) and flags the supply sheds for which there is excess demand.
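The idea of a supply shed can be illustrated with a small connected-components search over the (source, sink) pairs that have a cost entry. The sketch below is not the function's implementation: it uses plain dictionaries instead of pd.Series, the helper names find_supply_sheds and sheds_with_excess_demand are hypothetical, and missing supply or demand is assumed to be zero.

```python
from collections import defaultdict


def find_supply_sheds(cost_pairs):
    # Union-find over the bipartite graph whose edges are the cost entries.
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, sink in cost_pairs:
        parent[find(("source", source))] = find(("sink", sink))

    sheds = defaultdict(lambda: {"sources": set(), "sinks": set()})
    for kind, name in list(parent):
        key = "sources" if kind == "source" else "sinks"
        sheds[find((kind, name))][key].add(name)
    return list(sheds.values())


def sheds_with_excess_demand(supply, demand, cost_pairs):
    # Flag every shed whose total demand exceeds its total supply.
    flagged = []
    for shed in find_supply_sheds(cost_pairs):
        total_supply = sum(supply.get(s, 0) for s in shed["sources"])
        total_demand = sum(demand.get(k, 0) for k in shed["sinks"])
        if total_demand > total_supply:
            flagged.append((shed, total_supply, total_demand))
    return flagged


# Total supply (110) exceeds total demand (80), yet port_x cannot be served.
supply = {"farm_a": 10, "farm_b": 100}
demand = {"port_x": 30, "port_y": 50}
cost_pairs = [("farm_a", "port_x"), ("farm_b", "port_y")]
flagged = sheds_with_excess_demand(supply, demand, cost_pairs)
```

In this toy data farm_b's surplus cannot reach port_x because no cost entry links them, which is the kind of situation the warning is meant to surface.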
Classes
class TraseGlpkCommand (logPath='glpk.log', msg=False, **kwargs)
This is Pulp's GLPK_CMD but with a trick to capture stdout.
Usually GLPK_CMD has two modes:
1. msg=True, in which case glpk is run as a fork of the current Python process
2. msg=False, in which case the output of glpk is sent to /dev/null
The intention of 1. is that the stdout of glpk is sent to the user. However, we have an odd problem that is very specific to DeforestationFree where the output is simply lost.
You can reproduce this problem by running the following code:
import os
os.spawnvp(os.P_WAIT, "echo", ["echo", "hi"])
If you run this locally you will see the output "hi". However, if you run it in a Jupyter notebook on DeforestationFree you see nothing. I suspect that the output is being sent to the terminal rather than the notebook, perhaps the same problem as https://github.com/jupyterlab/jupyterlab/issues/9668. I hope that an upgrade or re-deployment of JupyterHub will resolve this issue.
Until then I have resorted to this clever trick, which is to patch os.devnull with a filename and pass msg=False. The pulp code runs open(os.devnull, "w"), which actually results in it opening the text file!
To use this class do the following:
solver = TraseGlpkCommand()
solve_transportation_problem(supply, demand, costs, solver=solver)
:param bool mip: if False, assume LP even if integer variables
:param bool msg: if False, no log is shown
:param float timeLimit: maximum time for solver (in seconds)
:param list options: list of additional options to pass to solver
:param bool keepFiles: if True, files are saved in the current directory and not deleted after solving
:param str path: path to the solver binary
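The os.devnull patching trick can be demonstrated in isolation using only the standard library. The sketch below is not the class's code: fake_solver_run is a hypothetical stand-in for the part of the solver wrapper that calls open(os.devnull, "w"), and run_with_patched_devnull is an illustrative helper built on unittest.mock.

```python
import os
import tempfile
from unittest import mock


def run_with_patched_devnull(log_path, action):
    # Temporarily point os.devnull at a real file, then restore it.
    with mock.patch.object(os, "devnull", log_path):
        return action()


def fake_solver_run():
    # Hypothetical stand-in for code that discards solver output
    # by writing it to os.devnull.
    with open(os.devnull, "w") as sink:
        sink.write("solver output\n")


log_path = os.path.join(tempfile.mkdtemp(), "glpk.log")
run_with_patched_devnull(log_path, fake_solver_run)

# The "discarded" output has been captured in the log file.
with open(log_path) as handle:
    captured = handle.read()
```

Because the patch is applied through a context manager, os.devnull is restored as soon as the call returns, so other code that relies on the real null device is unaffected.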
Ancestors
- pulp.apis.glpk_api.GLPK_CMD
- pulp.apis.core.LpSolver_CMD
- pulp.apis.core.LpSolver
Methods
def actualSolve(self, lp)
Solve a well formulated lp problem