About the notebooks

Background

The notebooks in this folder demonstrate the core functionality of the patientflow package. They have been written by me, Dr Zella King, the primary author of this repository. My aim is to introduce, in a step-by-step approach, how to structure your data for use with the package, and how to use the functions. I conclude with a fully worked example of how we use these functions at University College London Hospital (UCLH) to predict emergency demand for beds.

Outline of the notebooks

The first notebook explains how to set up your environment to run the notebooks that follow. Instructions are also provided at the bottom of this README.

I then explain who the intended users of predictive models of patient flow are.

There is then a series of notebooks on preparing patient snapshots, training models on them, and evaluating the performance of those models. I also introduce the real data provided by UCLH in a summary notebook.

Next is a series of notebooks on preparing group snapshots, generating predictions for group snapshots, and evaluating the predictions.

A set of notebooks follows, showing how we assembled the building blocks from the 3x_ notebooks into a production system at UCLH to predict demand for beds.

  • 4_Specify_demand_model: Specifies the operational requirements for demand predictions at UCLH, bridges from the 3x_ notebooks, and provides an overview of the notebooks that follow.
  • 4a_Organise_predictions_for_a_production_pipeline: Introduces the structured data classes (FlowInputs, ServicePredictionInputs, DemandPredictor, FlowSelection, PredictionBundle) that organise predictions for production use.
  • 4b_Stratify_predictions_by_patient_subgroups: Shows how to stratify predictions by observable patient characteristics (e.g. children vs adults vs older adults, men vs women) using MultiSubgroupPredictor.
  • 4c_Predict_demand: Shows the full prediction pipeline, combining patients currently in the ED with those yet to arrive, to predict demand at UCLH.
  • 4d_Evaluate_demand_predictions: Evaluates all production model components systematically across the test set, using the evaluation methods introduced in the 3x_ notebooks.
  • 4e_Generate_predictions_using_hierarchy: Shows the use of a hierarchical approach to generate demand predictions at different levels of a hospital's reporting hierarchy.

Data used in the notebooks

The early notebooks (2a, 2b, 3a, 3e) generate fake data on-the-fly so you can run them immediately without any external files. From notebook 2c onwards, most notebooks use real data from University College London Hospital (UCLH), available on Zenodo. If you don't have the public data, change data_folder_name from 'data-public' to 'data-synthetic' to use the bundled synthetic dataset instead. Notebook 4a also generates fake data on-the-fly, to introduce the production data classes without requiring external data.
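If you would rather not edit data_folder_name by hand, the switch described above can be sketched as a small check. This is a minimal sketch, not part of the patientflow package: the helper name choose_data_folder_name is hypothetical, and it assumes that the data folders sit directly under your project root.

```python
from pathlib import Path


def choose_data_folder_name(project_root, preferred="data-public", fallback="data-synthetic"):
    """Return `preferred` if that folder exists under project_root, else `fallback`.

    Hypothetical helper: assumes data folders live directly under project_root.
    """
    if (Path(project_root) / preferred).is_dir():
        return preferred
    return fallback


# Example: use the current working directory as a stand-in for the project root.
data_folder_name = choose_data_folder_name(Path.cwd())
print(data_folder_name)
```

In a notebook you would pass your own project root instead of Path.cwd().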

Preparing your notebook environment

Installation

You can install the patientflow package directly from PyPI:

pip install patientflow
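To confirm that the install worked in the environment your notebook kernel uses, you can query the installed version from Python. This sketch uses only the standard library; the helper name installed_version is my own and is not part of patientflow.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if patientflow is installed in this environment, else None.
print(installed_version("patientflow"))
```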

For development purposes or to run these notebooks with the latest code, you may still want to use the GitHub repository directly. In that case, the PATH_TO_PATIENTFLOW environment variable needs to be set so that the notebooks know where the patientflow repository resides on your computer. You have various options:

  • use a virtual environment and set PATH_TO_PATIENTFLOW up within that
  • set PATH_TO_PATIENTFLOW globally on your computer
  • let each notebook infer PATH_TO_PATIENTFLOW from the location of the notebook file, or specify it within the notebook

To set the PATH_TO_PATIENTFLOW environment variable within your virtual environment

Conda environments

Add PATH_TO_PATIENTFLOW to the environment.yml file:

variables:
  PATH_TO_PATIENTFLOW: /path/to/patientflow

venv environment

Add PATH_TO_PATIENTFLOW to the venv activation script:

echo 'export PATH_TO_PATIENTFLOW=/path/to/patientflow' >> venv/bin/activate  # Linux/Mac
echo 'set PATH_TO_PATIENTFLOW=/path/to/patientflow' >> venv/Scripts/activate.bat  # Windows

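The environment variable will be set whenever you activate the virtual environment. A conda environment also unsets it when you deactivate; with venv, the appended line only runs on activation, so the variable stays set in that shell session after you deactivate. Replace /path/to/patientflow with your repository path.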

To set project_root from within each notebook

A function called set_project_root() can be run in each notebook. If you pass the name of an environment variable as shown below, the function will look in your global environment for a variable of this name.

Alternatively, if you call the function without any arguments, the function will try to infer the location of the patientflow repo from your currently active path.

# to specify an environment variable that has been set elsewhere
project_root = set_project_root(env_var="PATH_TO_PATIENTFLOW")

# to let the notebook infer the path
project_root = set_project_root()
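To make the two lookup modes described above concrete, here is an illustrative sketch of how such a function could behave. It is not the implementation of set_project_root() in patientflow, which may differ. The name resolve_project_root, its parameters, and the rule of walking up the directory tree until a folder named patientflow is found are all assumptions made for illustration.

```python
import os
from pathlib import Path


def resolve_project_root(env_var=None, start=None, repo_name="patientflow"):
    """Illustrative only: resolve a project root from an env var or by inference.

    Not the patientflow implementation; names and search rule are assumptions.
    """
    if env_var is not None:
        # Mode 1: read the path from a named environment variable.
        value = os.environ.get(env_var)
        if not value:
            raise EnvironmentError(f"Environment variable {env_var} is not set")
        return Path(value)

    # Mode 2: walk up from the starting directory looking for the repo folder.
    current = (Path(start) if start is not None else Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if candidate.name == repo_name:
            return candidate
    raise FileNotFoundError(f"No '{repo_name}' directory found at or above {current}")
```

Raising an error when the variable is unset or no matching folder is found, rather than guessing, keeps a misconfigured path from going unnoticed in later cells.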

You can also set an environment variable from within a notebook cell:

Linux/Mac:

%env PATH_TO_PATIENTFLOW=/path/to/patientflow

Windows:

%env PATH_TO_PATIENTFLOW=C:\path\to\patientflow

Replace /path/to/patientflow with the actual path to your cloned repository.
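Whichever method you use, a quick check in a notebook cell can confirm that the variable is visible to the kernel and points at a real directory. This is a minimal sketch using only the standard library; the helper name check_repo_path is hypothetical and not part of patientflow.

```python
import os
from pathlib import Path


def check_repo_path(env_var="PATH_TO_PATIENTFLOW"):
    """Return (ok, message) describing whether env_var points at an existing directory."""
    value = os.environ.get(env_var)
    if not value:
        return False, f"{env_var} is not set"
    if not Path(value).is_dir():
        return False, f"{env_var} is set to {value}, which is not a directory"
    return True, f"{env_var} points to {value}"


ok, message = check_repo_path()
print(message)
```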

To set the PATH_TO_PATIENTFLOW environment variable permanently on your system

Linux/Mac:

# Add to ~/.bashrc or ~/.zshrc:
export PATH_TO_PATIENTFLOW=/path/to/patientflow

Windows:

Open System Properties > Advanced > Environment Variables
Under User Variables, click New
Variable name: PATH_TO_PATIENTFLOW
Variable value: C:\path\to\patientflow
Click OK

Replace /path/to/patientflow (or C:\path\to\patientflow on Windows) with your repository path. Restart your terminal or IDE after setting the variable so that the change takes effect.