Introduction to Pywr

Pywr is a generalised network resource allocation model written in Python.

It aims to be fast, free, and extendable.

by James Tomlinson

Overview

This presentation covers:

  • A quick background to Pywr
  • A simple example model using the JSON format
  • An overview of how to extend Pywr

Background to Pywr

  • Pywr is a tool for solving network resource allocation problems at discrete timesteps using a linear programming approach.
  • Its principal application is resource allocation in water supply networks, although other uses are conceivable.
  • Nodes in the network can be given constraints (e.g. minimum/maximum flows) and costs, and can be connected as required.
  • Parameters in the model can vary in time according to boundary conditions (e.g. an inflow timeseries) or based on the state of the model (e.g. the current volume of a reservoir).
  • Models can be developed using the Python API, either in a script or interactively using IPython/Jupyter.
  • Alternatively, models can be defined in a rich JSON-based document format.

Networks networks networks

Design goals

  • Pywr is a tool for solving network resource allocation problems.
    • It has some similarities with other software packages such as WEAP, Wathnet and Aquator, but also some significant differences.
  • Pywr’s principal design goals are that it is:
    • Fast enough to handle large datasets, and the large numbers of scenarios and function evaluations required by advanced decision-making methodologies;
    • Free to use without restriction – licensed under the GNU General Public Licence;
    • Extendable – using the Python programming language to define complex operational rules and control model runs.

Conceptual Pywr model run

The following is a pseudo-code conceptualisation of a Pywr model run. The actual code is a little (but not much!) more complicated.

model = Model.load('mymodel.json')  # Load a model from a JSON definition.
model.setup()  # Do some initial setup.
for timestep in model.timesteps:
    model.before()  # Update Nodes etc. before the solve this time-step.
    model.solve()   # Allocate the resource around the network.
    model.after()   # Update Nodes etc. after the solve this time-step.
model.finish()  # Finalise anything (e.g. close files).

Technical overview

  • Native support for multiple scenarios.
    • Scenarios are not executed in parallel, but multiple scenarios can be run during a single simulation.
    • It is easy to define which data vary in which scenarios (most of your data does not vary!).
    • Scenarios can be sliced for different runs and can therefore exploit batch running (multiple processors).
  • Resource agnostic.
    • Primarily used for water networks.
    • However, any resource flow could be modelled, including multi-resource models.
  • JSON input file format.
    • Models can be defined using a JSON format.
    • This allows models to be shared (e.g. over the web), templated and manipulated by existing JSON tools.
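
To make the idea of combining and slicing scenarios concrete, the sketch below enumerates every combination of two hypothetical scenarios ("climate" with 3 members and "demand" with 2) and then splits the climate scenario between two runs, as separate processes might. The scenario names and sizes are invented, and treating the combinations as a full factorial product is an assumption of this sketch; it is plain Python, not the Pywr Scenario API.

```python
from itertools import product

# Hypothetical scenarios: name -> number of members (invented for illustration).
scenario_sizes = {"climate": 3, "demand": 2}

def combinations(sizes, slices=None):
    """Full-factorial combinations of scenario members, optionally sliced per scenario."""
    slices = slices or {}
    # Slicing a range keeps only the selected members of that scenario.
    members = [range(size)[slices.get(name, slice(None))] for name, size in sizes.items()]
    return list(product(*members))

# Every climate member paired with every demand member: 3 * 2 = 6 combinations.
all_runs = combinations(scenario_sizes)

# Two runs that split the climate scenario between them (e.g. one per processor).
run_a = combinations(scenario_sizes, {"climate": slice(0, 2)})
run_b = combinations(scenario_sizes, {"climate": slice(2, 3)})
```

Between them, run_a (4 combinations) and run_b (2 combinations) cover exactly the 6 combinations of the unsliced run.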

The most basic example

Let's work through the most basic Pywr example.

This example uses the Python API (as opposed to a JSON input file).

First import the required classes from Pywr ...

from pywr.core import Model
from pywr.nodes import Input, Output, Link
from pywr.notebook import draw_graph

... create a new Model instance ...

model = Model()

... add some nodes ...

input_node = Input(model, name='Input')
link_node = Link(model, name='Link')
output_node = Output(model, name='Output')

... connect them together ...

input_node.connect(link_node)
link_node.connect(output_node)

... and finally draw a representation of the network.

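draw_graph(model, height=100)

Run the model and inspect the flow through the output node. No costs have been set yet, so there is no incentive to move any flow:

model.run()
output_node.flow

which returns:

array([ 0.])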

Update the input flow

  • Maximum flow of 10 Mm³/day.
  • A negative cost induces a flow through the only route in the model.

input_node.max_flow = 10
input_node.cost = -1

model.run()
output_node.flow

which now returns:

array([ 10.])
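
Why does a negative cost produce flow? Pywr allocates flow by minimising the total cost of the allocation, and for a network with a single route that minimisation can be worked out by hand. The sketch below is a hypothetical stand-in for the solver, not Pywr's code: for one route with a total cost per unit of flow (here just the input node's cost) and an upper bound, the optimum sits at one of the bounds.

```python
def single_route_flow(cost, max_flow):
    """Flow that minimises cost * flow subject to 0 <= flow <= max_flow.

    A hypothetical helper for a network with exactly one route; a real LP
    solver handles many routes and constraints at once.
    """
    if cost < 0:
        # Every extra unit of flow lowers the objective, so send as much as allowed.
        return max_flow
    # A zero or positive cost gives no incentive to move anything; ties go to zero here.
    return 0.0

print(single_route_flow(cost=0, max_flow=10))   # 0.0
print(single_route_flow(cost=-1, max_flow=10))  # 10
```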

The most interesting behaviour comes where,

  • There are complex networks of interconnected nodes, and
  • The properties of the network (flows and costs) change with time and with the state of the model.

More on this later!

JSON Format

Models can also be defined in an external file format rather than using the Python API. What is JSON?

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate.

This is an example JSON document describing information about a person:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  }
}
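
Python's standard-library json module maps a document like this onto dictionaries, lists, strings and numbers, which is part of what makes the format easy to generate and manipulate with existing tools. A minimal sketch using a shortened copy of the person example:

```python
import json

document = """
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {"city": "New York", "postalCode": "10021"}
}
"""

person = json.loads(document)     # JSON objects become Python dicts
print(person["age"] + 1)          # numbers become int/float: 26
print(person["address"]["city"])  # nested objects become nested dicts: New York

# And back again: dumps produces a JSON string from Python objects.
print(json.dumps({"firstName": person["firstName"], "age": person["age"]}))
```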

The overall structure of a Pywr model is given below.

  • The most important sections are "nodes" and "edges".

{
    "metadata": {},
    "timestepper": {},
    "solver": {},
    "scenarios": [],
    "includes": [], 
    "nodes": [],
    "edges": [],
    "parameters": {},
    "recorders": {}
}
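
Because a model file is just JSON, it can be templated in code. As a sketch (not part of Pywr), the snippet below builds the skeleton above as a Python dictionary, fills in the nodes and edges sections for a hypothetical two-node model, and serialises it. The node names and values are invented for illustration.

```python
import json

# Start from the empty skeleton, mirroring the section names listed above.
model_definition = {
    "metadata": {},
    "timestepper": {},
    "solver": {},
    "scenarios": [],
    "includes": [],
    "nodes": [],
    "edges": [],
    "parameters": {},
    "recorders": {},
}

# Hypothetical nodes and the edge connecting them (direction of flow: river -> town).
model_definition["nodes"] += [
    {"name": "river", "type": "input", "max_flow": 5.0},
    {"name": "town", "type": "output", "max_flow": 3.0, "cost": -10.0},
]
model_definition["edges"].append(["river", "town"])

# A JSON string that could be written to a .json file.
text = json.dumps(model_definition, indent=4)
```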

Metadata

The metadata section includes information about the model as key-value pairs. It is expected, at a minimum, to include a "title" and a "description", and may additionally include keys such as "author".

{"metadata": {
    "title": "Example",
    "description": "An example for the documentation",
    "author": "John Smith"
}}
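
A hypothetical helper (not part of Pywr) that checks a model definition for the minimum metadata described above might look like this:

```python
import json

REQUIRED_METADATA = ("title", "description")

def missing_metadata(model_definition):
    """Return the required metadata keys absent from a model definition dict."""
    metadata = model_definition.get("metadata", {})
    return [key for key in REQUIRED_METADATA if key not in metadata]

example = json.loads('{"metadata": {"title": "Example", "author": "John Smith"}}')
print(missing_metadata(example))  # ['description']
```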

Timestepper

The timestepper defines the period a model is run for and the timestep used. It corresponds directly to the pywr.core.Timestepper instance on the model. It has three properties: the start date, end date and timestep.

The example below describes a model that will run from 1st January 2016 to 31st December 2016 using a 7-day timestep.

{"timestepper": {
    "start": "2016-01-01",
    "end": "2016-12-31",
    "timestep": 7
}}
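
To see what this timestepper implies, the sketch below enumerates the start date of each timestep using only the standard library. It assumes steps advance in fixed increments from the start date and that a step is included if it begins on or before the end date; how Pywr itself treats a final, partial step is not covered by this sketch.

```python
from datetime import date, timedelta

def timestep_starts(start, end, days):
    """Start date of each fixed-length timestep beginning on or before `end`."""
    current, step = start, timedelta(days=days)
    starts = []
    while current <= end:
        starts.append(current)
        current += step
    return starts

steps = timestep_starts(date(2016, 1, 1), date(2016, 12, 31), days=7)
print(len(steps), steps[-1])  # 53 2016-12-30
```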

Nodes

The nodes section describes the nodes in the model. At a minimum, a node must have a name and a type.

There are two fundamental types of node in Pywr, which have different properties:

  • pywr.core.Node
  • pywr.core.Storage

Non-storage nodes

There are three fundamental non-storage nodes:

  • input nodes for adding flow to the network,
  • link nodes for transporting and constraining flow around the network, and
  • output nodes for removing flow from the network.

The Node type and its subtypes have max_flow and cost properties, both of which have default values.

{"nodes": [
    {
        "name": "groundwater",
        "type": "input",
        "max_flow": 23.0,
        "cost": 10.0
    }
]}

In addition to the basic input, output and link types, nodes of more specialised subtypes can be created by specifying the appropriate type name.

  • Some subtypes provide additional properties; often these correspond directly to the keyword arguments of the class.
  • See the example below of a Catchment node, which has a flow property rather than max_flow:

{"nodes": [
    {
        "name": "my_catchment",
        "type": "catchment",
        "flow": 23.0,
        "cost": 10.0
    }
]}

A second example, a river gauge with a soft minimum residual flow (MRF) constraint, is demonstrated below. The mrf property is the minimum residual flow required, mrf_cost is the cost applied to flow up to that minimum, and cost is the cost applied to any residual flow above it.

{"nodes": [
    {
        "name": "Teddington GS",
        "type": "rivergauge",
        "mrf": 200.0,
        "cost": 0.0,
        "mrf_cost": -1000.0
    }
]}
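
The constraint is soft because it works through costs rather than a hard bound: flow up to the MRF attracts the large negative mrf_cost, so the solver strongly prefers to meet it, while any flow beyond it attracts the ordinary cost. The sketch below computes that cost contribution for a given flow through the gauge; it is a hypothetical helper that follows the description above, not Pywr's implementation.

```python
def gauge_cost(flow, mrf, mrf_cost, cost):
    """Cost contribution of a river gauge with a soft minimum residual flow.

    Flow up to `mrf` is charged at `mrf_cost`; any residual above it at `cost`.
    """
    mrf_part = min(flow, mrf)
    residual = max(flow - mrf, 0.0)
    return mrf_part * mrf_cost + residual * cost

# Values from the "Teddington GS" example: meeting the MRF is strongly rewarded.
print(gauge_cost(150.0, mrf=200.0, mrf_cost=-1000.0, cost=0.0))  # -150000.0
print(gauge_cost(250.0, mrf=200.0, mrf_cost=-1000.0, cost=0.0))  # -200000.0
```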

Storage nodes

The Storage type and its subtypes have max_volume, min_volume and initial_volume properties, as well as num_inputs and num_outputs. The maximum and initial volumes must be specified, whereas the others have default values.

{"nodes": [
    {
        "name": "Big Wet Lake",
        "type": "storage",
        "max_volume": 1000,
        "initial_volume": 700,
        "min_volume": 0,
        "num_inputs": 1,
        "num_outputs": 1,
        "cost": -10.0
    }
]}

When a storage node has multiple inputs or outputs, connections to it must be made using the slot notation (discussed in the Edges section).
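
From one timestep to the next, a storage node's volume follows a simple mass balance. The sketch below is a simplified, hypothetical illustration of that bookkeeping for the "Big Wet Lake" example; it assumes inflow and outflow are already totals over the timestep (not rates), and the range check only documents the min_volume/max_volume invariant rather than how Pywr enforces it.

```python
def next_volume(volume, inflow, outflow, min_volume=0.0, max_volume=1000.0):
    """Volume at the end of a timestep, given total inflow and outflow over it."""
    new_volume = volume + inflow - outflow
    if not min_volume <= new_volume <= max_volume:
        raise ValueError(f"volume {new_volume} outside [{min_volume}, {max_volume}]")
    return new_volume

volume = 700.0  # initial_volume of "Big Wet Lake"
for inflow, outflow in [(50.0, 20.0), (10.0, 40.0)]:
    volume = next_volume(volume, inflow, outflow)
print(volume)  # 700.0
```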

Edges

The edges section describes the connections between nodes. At a minimum, an edge is defined as a two-item list containing the names of the nodes to connect, given in the direction of flow, e.g.:

{"edges": [
    ["supply", "intermediate"],
    ["intermediate", "demand"]
]}

Additionally, the from and to slots can be specified. For example, the code below connects slot 2 of reservoirA to slot 3 of reservoirB.

{"edges": [
    ["reservoirA", "reservoirB", 2, 3]
]}
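
A loader has to accept both edge forms. The sketch below normalises each edge to a (from_node, to_node, from_slot, to_slot) tuple, using None where no slot is given; the helper and the tuple layout are illustrative assumptions, not Pywr's internal representation.

```python
def normalise_edge(edge):
    """Return (from_node, to_node, from_slot, to_slot) for a 2- or 4-item edge."""
    if len(edge) == 2:
        from_node, to_node = edge
        return from_node, to_node, None, None
    if len(edge) == 4:
        return tuple(edge)
    raise ValueError(f"expected 2 or 4 items, got {len(edge)}: {edge!r}")

edges = [
    ["supply", "intermediate"],
    ["intermediate", "demand"],
    ["reservoirA", "reservoirB", 2, 3],
]
for edge in edges:
    print(normalise_edge(edge))
```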

Parameters

Sometimes it is convenient to define a Parameter used in the model in the "parameters" section instead of inside a node, for instance if the parameter is needed by more than one node.

{
    "nodes": [
        {
            "name": "groundwater",
            "type": "input",
            "max_flow": "gw_flow"
        }
    ],
    "parameters": [
        {
            "name": "gw_flow",
            "type": "constant",
            "value": 23.0
        }
    ]
}
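
Conceptually, a string value such as "gw_flow" is a reference to an entry in the parameters section. The sketch below resolves such references for constant parameters only; the resolve helper is a hypothetical illustration of the idea, not how Pywr loads parameters internally.

```python
import json

definition = json.loads("""
{
    "nodes": [{"name": "groundwater", "type": "input", "max_flow": "gw_flow"}],
    "parameters": [{"name": "gw_flow", "type": "constant", "value": 23.0}]
}
""")

# Index the parameters by name so several nodes can share one definition.
parameters = {p["name"]: p for p in definition["parameters"]}

def resolve(value):
    """Replace a parameter name with its constant value; pass numbers through."""
    if isinstance(value, str):
        parameter = parameters[value]
        if parameter["type"] != "constant":
            raise NotImplementedError(parameter["type"])  # only constants in this sketch
        return parameter["value"]
    return value

node = definition["nodes"][0]
print(resolve(node["max_flow"]))  # 23.0
```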

Parameters can be more complicated than simple scalar values. For instance, a time varying parameter can be defined using a monthly or daily profile which repeats each year.

{"parameters": [
    {
        "name": "mrf_profile",
        "type": "monthlyprofile",
        "values": [10, 10, 10, 10, 50, 50, 50, 50, 20, 20, 10, 10]
    }
]}
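
A monthly profile amounts to a lookup by calendar month, repeated every year. A minimal sketch of that lookup, using a hypothetical helper and assuming the 12 values run from January to December:

```python
from datetime import date

MRF_PROFILE = [10, 10, 10, 10, 50, 50, 50, 50, 20, 20, 10, 10]  # January..December

def monthly_value(values, when):
    """Value of a 12-entry monthly profile on a given date (same every year)."""
    if len(values) != 12:
        raise ValueError("a monthly profile needs exactly 12 values")
    return values[when.month - 1]

print(monthly_value(MRF_PROFILE, date(2016, 5, 15)))  # 50 (May)
print(monthly_value(MRF_PROFILE, date(2017, 5, 15)))  # 50 again: the profile repeats
```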

External data

Instead of defining the data inline using the "values" property, external data can be referenced as below. The URL is interpreted relative to the JSON document, not the current working directory.

{"parameters": [
    {
        "name": "catchment_inflow",
        "type": "dataframe",
        "url": "data/catchmod_outputs_v2.csv",
        "column": "Flow",
        "index_col": "Date",
        "parse_dates": true
    }
]}
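
Resolving a relative "url" against the JSON document's own location, rather than the working directory, can be sketched with pathlib. The model path below is hypothetical, and PurePosixPath is used only so the example behaves the same on every operating system.

```python
from pathlib import PurePosixPath

def resolve_data_path(json_path, url):
    """Resolve a data URL relative to the directory containing the JSON document."""
    return PurePosixPath(json_path).parent / url

path = resolve_data_path("/projects/example/model.json", "data/catchmod_outputs_v2.csv")
print(path)  # /projects/example/data/catchmod_outputs_v2.csv
```

Note that joining an absolute path with "/" discards the parent directory, so an absolute URL would pass through unchanged.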

Putting it all together

Model.loads("""
{
    "metadata": {"title": "Simple 1", "description": "A very simple example.", "minimum_version": "0.1"},
    "timestepper": {"start": "2015-01-01", "end": "2015-12-31", "timestep": 1},
    "nodes": [
        {"name": "supply1", "type": "Input", "max_flow": 15},
        {"name": "link1",   "type": "Link"},
        {"name": "demand1", "type": "Output","max_flow": 10, "cost": -10}
    ],
    "edges": [
        ["supply1", "link1"],
        ["link1", "demand1"]
    ]
}
""").run();