3c. Predict bed counts without using patient snapshots

There are situations where we might want to predict bed count distributions without having details of the patients, for example when predicting the number of beds needed for incoming patients such as:

Incoming patients who haven't yet arrived at the Emergency Department, but who will need a bed within a prediction window
Emergency patients who arrive via other routes than the ED, and become inpatients (such as emergency transfers from other hospitals)
Elective admissions of patients. Planned elective admissions are recorded on a 'To Come In' (TCI) list by patient number. In data, the hospital visit (known as an encounter) for the elective procedure begins at the moment a patient arrives, but may not be recorded in data prior to that point. In a simple case without making use of any data on TCI lists, we might want to predict bed requirements for such patients based on past patterns of such arrivals between a prediction time (eg 12:00) and the end of a prediction window (eg 8 hours later).

For these situations, we can use patientflow to learn patterns from past data, and use these to predict a bed count distribution at the aggregate level.

In this notebook, I'll use the example of predicting the number of beds needed for incoming patients (patients yet to arrive to the Emergency Department who will need a bed within a prediction window). I'll show three approaches:

a Poisson model trained on past arrival rates of patients who both arrived and were admitted within a prediction window
a weighted Poisson model using an empirical survival curve; arrival rates of patients who were admitted (at some point), are weighted by their probability of being admitted within a prediction window, calculated from a survival curve learned form past data
a weighted Poisson model using an aspirational approach; instead of using a survival curve learned form past data, it is assumed that the ED is meeting 4-hour targets for time to admission.

I also demonstrate making predictions by specialty for demand from incoming patients.

# Reload functions every time
%load_ext autoreload 
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

Create fake arrival times

I will generate some fake data on patients in an Emergency Department (ED) using the same method as in previous notebooks. This time, I only need patients who were admitted after the ED visit.

from patientflow.generate import create_fake_finished_visits
visits_df, _, _ = create_fake_finished_visits('2023-01-01', '2023-04-01', mean_patients_per_day=50, admitted_only=True)
visits_df.head()

	patient_id	visit_number	arrival_datetime	departure_datetime	age	is_admitted	specialty
0	3031	33	2023-01-01 06:12:03	2023-01-01 15:51:03	29	1	medical
1	8829	15	2023-01-01 06:55:24	2023-01-01 21:28:24	34	1	paediatric
2	10479	27	2023-01-01 07:53:54	2023-01-01 11:56:54	61	1	medical
3	10856	8	2023-01-01 08:02:10	2023-01-01 13:16:10	31	1	medical
4	6011	43	2023-01-01 10:05:20	2023-01-01 20:16:20	40	1	medical

I'll rename the departure_datetime column to admitted_to_ward_datetime since all admitted patients depart to a ward.

Note - if you follow my approach using your own data, make sure you start with a dataset of arrival and admitted-to-ward times that has only one row per visit. Do not use your snapshot dataset for this, if it includes multiple rows per visit.

import pandas as pd

inpatient_arrivals = visits_df.rename(columns = {'departure_datetime': 'admitted_to_ward_datetime'}).drop(columns = 'is_admitted')
inpatient_arrivals['arrival_datetime'] = pd.to_datetime(inpatient_arrivals['arrival_datetime'])

I will generate an array of dates covered by the data I've loaded. I'm calling these snapshot_dates for consistency.

from datetime import timedelta, date

# Create date range
snapshot_dates = []
start_date = date(2023, 1, 1)
end_date = date(2023, 4, 1)

current_date = start_date
while current_date < end_date:
    snapshot_dates.append(current_date)
    current_date += timedelta(days=1)

print('First ten snapshot dates')
snapshot_dates[0:10]

First ten snapshot dates





[datetime.date(2023, 1, 1),
 datetime.date(2023, 1, 2),
 datetime.date(2023, 1, 3),
 datetime.date(2023, 1, 4),
 datetime.date(2023, 1, 5),
 datetime.date(2023, 1, 6),
 datetime.date(2023, 1, 7),
 datetime.date(2023, 1, 8),
 datetime.date(2023, 1, 9),
 datetime.date(2023, 1, 10)]

Predicting future arrivals, with model trained on past data.

When predicting how many patients will arrive after a prediction time, and be admitted before the end of a prediction window, we need some way to decide how long it takes admitted patients to be processed by the ED.

The first two approaches, shown in this section, use past data to determine how long a patient will be in the ED before admission.

A Poisson model based on counts of patients who arrived and were admitted within in a prediction window (eg arrived after 06:00 and were admitted before 14:00) during the training set period
A survival curve based on empirical data from the training set period

As in previous notebooks, I'll apply a temporal split to the data.

from datetime import date   
from patientflow.prepare import create_temporal_splits

# set the temporal split
start_training_set = date(2023, 1, 1) 
start_validation_set = date(2023, 2, 15) # 6 week training set 
start_test_set = date(2023, 3, 1) # 2 week validation set 
end_test_set = date(2023, 4, 1) # 1 month test set

# create the temporal splits
train_visits, valid_visits, test_visits = create_temporal_splits(
    inpatient_arrivals,
    start_training_set,
    start_validation_set,
    start_test_set,
    end_test_set,
    col_name="arrival_datetime", # states which column contains the date to use when making the splits 

)

Split sizes: [2214, 710, 1584]

Approach 1 using past data: Poisson model trained on past arrival rates of patients who both arrived and were admitted within a prediction window

The function below generates counts by snapshot date of patients who arrived after the prediction time and were admitted before the end of the prediction window.

After applying the function, the count data is shown below.

from patientflow.calculate.arrival_rates import count_yet_to_arrive

prediction_times = [(6,0), (9,30), (12, 0), (15,30), (22,0)] 

yet_to_arrive_counts = count_yet_to_arrive(inpatient_arrivals, 
                                           snapshot_dates, 
                                           prediction_times, 
                                           prediction_window_hours=8)
yet_to_arrive_counts.head()

	snapshot_date	prediction_time	count
0	2023-01-01	(6, 0)	2
1	2023-01-01	(9, 30)	1
2	2023-01-01	(12, 0)	6
3	2023-01-01	(15, 30)	3
4	2023-01-01	(22, 0)	1

Here I use the mean daily count as the mean of a Poisson distribution.

from scipy import stats
poisson_mean = yet_to_arrive_counts[yet_to_arrive_counts['prediction_time'] == (12, 0)]['count'].mean()
poisson_model = stats.poisson(poisson_mean)

I use the Poisson model to predict a bed count distribution for the patients yet-to-arrive.

prob_dist_data = [poisson_model.pmf(k) for k in range(20)]

from patientflow.viz.probability_distribution import plot_prob_dist
from patientflow.viz.utils import format_prediction_time
title = (
    f'Probability distribution for number of beds needed for patients'
    f'\nwho will arrive after {format_prediction_time(prediction_times[0])} on {snapshot_dates[0]} and need a bed before 20:00'
)
plot_prob_dist(prob_dist_data, title,  
    include_titles=True, truncate_at_beds=40)

png

Approach 2: Using an empirical survival curve

Another approach is to learn from past data about how long it takes patients to be admitted. To illustrate this, the plot_admission_time_survival_curve function will take a dataset of start and end times, and draw a survival curve.

(Note - survival curves are often used for time to mortality or an adverse event; here 'survival' here means how long the patient remained in the ED prior to admission.)

In the fake data I'm using here, I've deliberately set the mean length of stay to be longer than 4 hours, to illustrate the differences between learning from the past data and (later in the notebook) taking an aspirational approach and assuming the ED meets its 4-hour targets.

from patientflow.viz.survival_curve import plot_admission_time_survival_curve
title = 'Survival curve showing probability of still being in the ED after a given elapsed time since arrival'
survival_df = plot_admission_time_survival_curve(train_visits, 
                                   start_time_col="arrival_datetime",
                                   end_time_col="admitted_to_ward_datetime",
                                   title=title,
                                   ylabel='Proportion of patients not yet admitted',
                                   xlabel='Elapsed time since arrival',
                                   target_hours=[4, 8],
                                   annotation_string="{:.1%} of patients admitted\nby {:.0f} hours",
                                   return_df=True
)

png

The function has an option to return the survival curve.

survival_df.head()

	time_hours	survival_probability	event_probability
0	0.000000	1.000000	0.000000
1	1.283333	0.999548	0.000452
2	1.333333	0.999097	0.000903
3	1.350000	0.998645	0.001355
4	1.400000	0.998193	0.001807

From the curve, we can read the probability of still being in the ED seven hours after arrival. The probability of being admitted within seven hours is one minus this number.

from patientflow.calculate.admission_in_prediction_window import get_survival_probability
print(f'Probability of still being in the ED seven hours after arrival: {get_survival_probability(survival_df, 7):.2f}')
print(f'Probability of being admitted within seven hours of arrival: {1-get_survival_probability(survival_df, 7):.2f}')

Probability of still being in the ED seven hours after arrival: 0.52
Probability of being admitted within seven hours of arrival: 0.48

If I'm making a prediction at 12:00 for number of admissions by 20:00, I could divide that 8-hour window into 8 slices of one hour, calculate a mean arrival rate for each hour, and use the survival curve to get the probability of the patients who arrive in each hour being admitted by the end of the prediction window.

First I calculate the arrival rate for a one hour slice:

num_days_in_training_data = (start_validation_set - start_training_set).days
print(f'Number of days in training data: {num_days_in_training_data}')

print(f'Number of admitted patients arriving between 19:00 and 20:00 in training data: {len(train_visits[(train_visits.arrival_datetime.dt.hour == 19) ])}')
print(f'Average number of admitted patients arriving between 19:00 and 20:00 in training data: {(len(train_visits[(train_visits.arrival_datetime.dt.hour == 19) ])/num_days_in_training_data):.2f}')

Number of days in training data: 45
Number of admitted patients arriving between 19:00 and 20:00 in training data: 50
Average number of admitted patients arriving between 19:00 and 20:00 in training data: 1.11

The arrival rate can be multiplied by the probability of admission within the prediction window and used as a weighted mean in a Poisson distribution for each hour. The eight Poisson distributions (one for each hour) can then be convolved into a single distribution. This is shown below.

(As noted above this may seem complicated. I show it because a similar approach informs the aspirational approach that follows.)

from scipy import stats
import numpy as np

from patientflow.calculate.admission_in_prediction_window import get_survival_probability

# Initialize arrays to store probabilities and rates
hours = range(12, 20)
probabilities = []
rates = []

# Calculate probabilities and rates for each hour
for _hour in hours:
    probability_of_admission = 1-get_survival_probability(survival_df, 20-_hour)
    arrival_rate = len(train_visits[(train_visits.arrival_datetime.dt.hour == _hour)])/num_days_in_training_data
    probabilities.append(probability_of_admission)
    rates.append(arrival_rate)

# Create individual Poisson distributions
poisson_dists = [stats.poisson(r*p) for r, p in zip(rates, probabilities)]

# Get PMF for each distribution 
max_value = 20  
x = np.arange(max_value)
pmfs = [dist.pmf(x) for dist in poisson_dists]

# Convolve all distributions together
combined_pmf = pmfs[0]
for pmf in pmfs[1:]:
    combined_pmf = np.convolve(combined_pmf, pmf)

# Create final distribution
combined_dist = stats.rv_discrete(values=(range(len(combined_pmf)), combined_pmf))

The result is a very similar distribution to that calculated by the more simple approach above.

from patientflow.viz.probability_distribution import plot_prob_dist
from patientflow.viz.utils import format_prediction_time
title = (
    f'Probability distribution for number of beds needed for patients'
    f'\nwho will arrive after {format_prediction_time((12,0))} on {snapshot_dates[0]} and need a bed before 20:00.'
    f'\nTime to admission is estimated using an empirical survival curve'
)
plot_prob_dist(combined_dist, title,  
    include_titles=True, truncate_at_beds=40)

png

Using the `EmpiricalIncomingAdmissionPredictor` class

A custom class EmpiricalIncomingAdmissionPredictor has been created, for this purpose, that follows the same logic as the code snippet above.

A survival curve is generated from the training data, and used determine the probability of admission before the end of the prediction window, for a patient who arrives at a particular moment in the window.

To predict how many patients will arrive and need admission within a prediction window (e.g., the next 8 hours), the class breaks this window into smaller time segments based on the yta_time_interval parameter. For example, with 15-minute intervals, an 8-hour window becomes 32 segments.

For each segment, the class calculates two key values: the expected patient arrival rate and the probability that patients arriving in that segment will be admitted by the end of the full window. These values are multiplied together to create a weighted mean for a Poisson distribution representing that segment.

Finally, all 32 segment distributions are combined to produce an overall probability distribution showing the total number of beds likely needed during the prediction window.

I demonstrate how you fit the EmpiricalIncomingAdmissionPredictor below.

from patientflow.predictors.incoming_admission_predictors import EmpiricalIncomingAdmissionPredictor

train_visits_copy = train_visits.copy(deep=True)

yta_model_empirical =  EmpiricalIncomingAdmissionPredictor(verbose=True)
num_days = (start_validation_set - start_training_set).days

# the arrival_datetime column needs to be set as the index of the dataframe
if 'arrival_datetime' in train_visits_copy.columns:
    train_visits_copy.set_index('arrival_datetime', inplace=True)

yta_model_empirical.fit(train_visits_copy, 
                        prediction_window=timedelta(hours=8), 
                        yta_time_interval=timedelta(minutes=15), 
                        prediction_times=prediction_times, 
                        num_days=num_days,
                        start_time_col='arrival_datetime',
                        end_time_col='admitted_to_ward_datetime')

Calculating time-varying arrival rates for data provided, which spans 45 unique dates
EmpiricalIncomingAdmissionPredictor trained for these times: [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
using prediction window of 8:00:00 after the time of prediction
and time interval of 0:15:00 within the prediction window.
The error value for prediction will be 1e-07
To see the weights saved by this model, used the get_weights() method
EmpiricalIncomingAdmissionPredictor has been fitted with survival curve containing 881 time points

EmpiricalIncomingAdmissionPredictor(filters={}, verbose=True)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

The survival curve that was calculated from the training set is saved with the object returned

yta_model_empirical.survival_df

	time_hours	survival_probability	event_probability
0	0.000000	1.000000	0.000000
1	1.283333	0.999548	0.000452
2	1.333333	0.999097	0.000903
3	1.350000	0.998645	0.001355
4	1.400000	0.998193	0.001807
...	...	...	...
876	40.333333	0.001807	0.998193
877	40.650000	0.001355	0.998645
878	41.050000	0.000903	0.999097
879	42.666667	0.000452	0.999548
880	44.383333	0.000000	1.000000

881 rows × 3 columns

arrival_rates_by_time_interval = yta_model_empirical.weights['unfiltered'][(12,0)]['arrival_rates']
print(
    f'The calculated arrival rates for the first 10 discrete time intervals '
    f'for the 12:00 prediction time are: {[round(v, 3) for v in arrival_rates_by_time_interval[0:10]]}')

The calculated arrival rates for the first 10 discrete time intervals for the 12:00 prediction time are: [1.289, 1.067, 1.356, 1.2, 1.289, 1.356, 1.2, 1.178, 0.933, 0.889]

prediction_context = {
    'unfiltered': {
        'prediction_time': tuple([12,0])
    }
}
weighted_poisson_empirical = yta_model_empirical.predict(prediction_context)

from patientflow.viz.probability_distribution import plot_prob_dist
from patientflow.viz.utils import format_prediction_time
title = (
    f'Probability distribution for number of beds needed for patients'
    f'\nwho will arrive after {format_prediction_time((12,0))} and need a bed before 20:00.'
    f'\nTime to admission is estimated using an empirical survival curve'
)
plot_prob_dist(weighted_poisson_empirical['unfiltered'], title,  
    include_titles=True, truncate_at_beds=40)

png

It is also possible to generate predictions by specialty, by passing a dictionary comprised of the required subgroups (the key) and a nested dictionary (the value) specifying how to identify them, with a column name (the nested key) and the values to filter from that column.

from patientflow.predictors.incoming_admission_predictors import EmpiricalIncomingAdmissionPredictor

train_visits_copy = train_visits.copy(deep=True)
num_days = (start_validation_set - start_training_set).days
if 'arrival_datetime' in train_visits_copy.columns:
    train_visits_copy.set_index('arrival_datetime', inplace=True)

specialty_filters = filters={
    'medical': {'specialty': 'medical'},
    'surgical': {'specialty': 'surgical'},
    'haem/onc': {'specialty': 'haem/onc'},
    'paediatric': {'specialty': 'paediatric'}
    }
yta_model_by_spec_empirical =  EmpiricalIncomingAdmissionPredictor(filters = specialty_filters, verbose=True)

yta_model_by_spec_empirical.fit(train_visits_copy, 
                        prediction_window=timedelta(hours=8), 
                        yta_time_interval=timedelta(minutes=15), 
                        prediction_times=prediction_times, 
                        num_days=num_days,
                        start_time_col='arrival_datetime',
                        end_time_col='admitted_to_ward_datetime')

Calculating time-varying arrival rates for data provided, which spans 45 unique dates
Calculating time-varying arrival rates for data provided, which spans 45 unique dates
Calculating time-varying arrival rates for data provided, which spans 45 unique dates
Calculating time-varying arrival rates for data provided, which spans 45 unique dates
EmpiricalIncomingAdmissionPredictor trained for these times: [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
using prediction window of 8:00:00 after the time of prediction
and time interval of 0:15:00 within the prediction window.
The error value for prediction will be 1e-07
To see the weights saved by this model, used the get_weights() method
EmpiricalIncomingAdmissionPredictor has been fitted with survival curve containing 881 time points

EmpiricalIncomingAdmissionPredictor(filters={'haem/onc': {'specialty': 'haem/onc'},
                                             'medical': {'specialty': 'medical'},
                                             'paediatric': {'specialty': 'paediatric'},
                                             'surgical': {'specialty': 'surgical'}},
                                    verbose=True)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

from patientflow.viz.probability_distribution import plot_prob_dist
from patientflow.viz.utils import format_prediction_time

for specialty in train_visits.specialty.unique():
    prediction_context = {
        specialty: {
            'prediction_time': tuple([12,0])
        }
}
    weighted_poisson_empirical = yta_model_by_spec_empirical.predict(prediction_context)

    title = (
        f'Probability distribution for number of {specialty} beds needed for patients'
        f'\nwho will arrive after {format_prediction_time((12,0))} and need a bed before 20:00.'
        f'\nTime to admission is estimated using an empirical survival curve'
    )
    plot_prob_dist(weighted_poisson_empirical[specialty], title,  
        include_titles=True, truncate_at_beds=20)

png

Train a weighted Poisson model using an aspirational approach

The model above has learned the rates of arrivals of patients who are later admitted within a prediction window from past data.

A problem with this approach is that rates are learned from periods of poor performance. Currently, in England Emergency Departments have a target of processing all patients within four hours of their arrival time. However, EDs across the country have not hit targets since the end of the Covid pandemic.

The poor performance is illustrated by the chart below, which shows how performance within Emergency Departments (also known as Accident & Emergency Departments, or A&E) has deteriorated over time. The chart is based on NHS England data on A&E Attendances and Emergency Admissions, and was copied from the King's Fund website on 9 June 2025.

Performance is worse for Type 1 A&E attendances; Type 1 refers to patients attending what are considered major emergency departments in hospitals.

from IPython.display import Image
Image(filename='img/thumbnail_KingsFund_AE_performance.png')

png

patientflow offers an aspirational weighted Poisson model, that will calculate each patient's probability of being admitted within the prediction window, if targets are met. Targets are set using the parameters set in config.yaml, as shown below.

from patientflow.load import load_config_file, set_file_paths, set_project_root
project_root = set_project_root()

_, _, _, config_path = set_file_paths(project_root, data_folder_name = 'data-public', verbose = False)
params = load_config_file(config_path)

x1, y1, x2, y2 = params["x1"], params["y1"], params["x2"], params["y2"]

print(f'The aspiration is that within {str(x1)} hours of arrival, {str(y1*100)}% of patients will have been admitted, and that witin {str(x2)} hours of arrival, {str(y2*100)}% of patients will have been admitted')

Inferred project root: /Users/zellaking/Repos/patientflow
The aspiration is that within 4.0 hours of arrival, 76.0% of patients will have been admitted, and that witin 12.0 hours of arrival, 99.0% of patients will have been admitted

The aspiration can be plotted as a parameterised curve, as shown below. It is the equivalent of a survival curve that has been inverted. However, unless the empirical survival curve above, this curve is defined by parameters that can be changed according to the aspirational targets set.

To change the targets, you can vary the values of x1, y1, x2 and y2.

from patientflow.viz.aspirational_curve import plot_curve

figsize = (6,3)

plot_curve(
    title = 'Aspirational curve reflecting a ' + str(int(x1)) + ' hour target for ' + str(int(y1*100)) + \
        '% of patients\nand a '+ str(int(x2)) + ' hour target for ' + str(int(y2*100)) + '% of patients',
    x1 = x1,
    y1 = y1,
    x2 = x2,
    y2 = y2,
    include_titles=True,
    annotate_points=True,
)

png

Below I demonstrate the use of a Weighted Poisson predictor.

Its fit() method will, for each prediction time:

filter the dataframe if a filtering criteria is given (more detail below)
calculate arrival rates for a series of discrete time intervals (where the duration of each time interval is specified as yta_time_interval minutes) within a 24 hour period; yta_time_interval must divide evenly into a 24 hour period (ie be a factor of 24* 60)
return the arrival rates for the intervals between the prediction time and the end of the prediction window in a dictionary; if the data is unfiltered it will use a generic key of 'unfiltered'; if the data is filtered, it will use the filters as keys

The predict() method will:

retrieve the arrival rates saved for the prediction window
for each discrete time interval, using the aspirational curve introduced above, and taking into account the time remaining before end of window, calculate a probability of admission in prediction window
weight the arrival rates for each time interval by this probability
generate a Poisson distribution for each time interval
convolute the distributions to return a single distribution for admissions within the prediction window of patients yet-to-arrive

from patientflow.predictors.incoming_admission_predictors import ParametricIncomingAdmissionPredictor

train_visits_copy = train_visits.copy(deep=True)

yta_model_parametric =  ParametricIncomingAdmissionPredictor(verbose=True)
num_days = (start_validation_set - start_training_set).days
if 'arrival_datetime' in train_visits_copy.columns:
    train_visits_copy.set_index('arrival_datetime', inplace=True)

yta_model_parametric.fit(train_visits_copy, 
              prediction_window=timedelta(hours=8), 
              yta_time_interval=timedelta(minutes=15), 
              prediction_times=prediction_times, 
              num_days=num_days)

Calculating time-varying arrival rates for data provided, which spans 45 unique dates
ParametricIncomingAdmissionPredictor trained for these times: [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
using prediction window of 8:00:00 after the time of prediction
and time interval of 0:15:00 within the prediction window.
The error value for prediction will be 1e-07
To see the weights saved by this model, used the get_weights() method

ParametricIncomingAdmissionPredictor(filters={}, verbose=True)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Below we view the results of the fit method for the 12:00 prediction time.

arrival_rates_by_time_interval = yta_model_parametric.weights['unfiltered'][(12,0)]['arrival_rates']
print(
    f'The calculated arrival rates for the first 10 discrete time intervals '
    f'for the 12:00 prediction time are: {[round(v, 3) for v in arrival_rates_by_time_interval[0:10]]}')

The calculated arrival rates for the first 10 discrete time intervals for the 12:00 prediction time are: [1.289, 1.067, 1.356, 1.2, 1.289, 1.356, 1.2, 1.178, 0.933, 0.889]

To use the weighted poisson for prediction, a prediction_context argument specifies the required prediction time and filtering. The aspirations for time to admission can be changed at any point. Here, I'm going to set the target at 95% within 4 hours.

from patientflow.viz.probability_distribution import plot_prob_dist
from patientflow.viz.utils import format_prediction_time

prediction_context = {
    'unfiltered': {
        'prediction_time': tuple([12,0])
    }
}

aspirational_prediction = yta_model_parametric.predict(prediction_context, x1=x1, y1=y1, x2=x2, y2=y2)

The charts below compare the results of using this weighted predictor to generate a prediction for bed needed for patients yet-to-arrive, if the ED meets the 4-hour target for 95% of patients. The numbers are higher than the equivalent chart above.

title = (
    f'Probability distribution for number of beds needed for patients '
    f'who will arrive after {format_prediction_time((12,0))} on {snapshot_dates[0]} '
    f'\nand need a bed before 20:00. Time to admission estimated using an empirical survival curve'
)
plot_prob_dist(combined_dist, title,  
    include_titles=True, truncate_at_beds=40)

title = (
    f'Probability distribution for number of beds needed for patients '
    f'who will arrive after {format_prediction_time((12,0))} on {snapshot_dates[0]} '
    f'\nand need a bed before 20:00 '
    f'if the ED is meeting the target of {int(x1)} hours for {y1*100}% of patients'
)
plot_prob_dist(aspirational_prediction['unfiltered'], title,  
    include_titles=True,
    truncate_at_beds=40)

png

Summary

Here I have demonstrated the use of patientflow to generate bed counts for groups of patients for which patient-level data is not yet available.

If you have count data on past visits that approximate to a statistical distribution, preparing a model to predict a bed count distribution is simple to do with standard libraries like scipy. You don't need patientflow functions for that.

However there might be cases where the historical data don't reflect the desired performance of the ED, as in the example shown here. In that case, the users of your predictions might be more interested in understanding demand if their ED were performing per their aspirations. patientflow provides functions that enable you to produce such predictions.

Postscript - how to evaluate bed counts based on survival curves

The survival curve plot function can be used with multiple datasets. This may be useful to check the alignment of training set and test set survival curves. If the ED has become slower to process patients, this difference may show up in a difference between survival curves. We encountered this issue in our own work, and showed how to mitigate it using a sliding window approach for the survival curve in our Nature Digital Medicine paper. The problem does not show up below because these curves are based on synthetic data, but it might in your dataset.

from patientflow.viz.survival_curve import plot_admission_time_survival_curve
title = 'Compare survival curves for train, valid and test sets'
survival_df = plot_admission_time_survival_curve([train_visits, valid_visits, test_visits], 
                                                 labels=['train', 'valid', 'test'],
                                   start_time_col="arrival_datetime",
                                   end_time_col="admitted_to_ward_datetime",
                                   title=title,
                                   ylabel='Proportion of patients not yet admitted',
                                   xlabel='Elapsed time since arrival',
                                   return_df=False
)

png

The function below can be used to compare the predicted with the observed bed counts. This function will retrieve the observed counts and save them alongside their predicted distribution from the model.

from patientflow.aggregate import get_prob_dist_using_survival_curve
from datetime import timedelta
from patientflow.load import get_model_key

prob_dist_dict_all = {}
test_set_dates = [dt for dt in snapshot_dates if dt >= start_test_set]

for prediction_time in prediction_times:
    model_key = get_model_key('yet_to_arrive', prediction_time)
    prob_dist_dict_all[model_key]= get_prob_dist_using_survival_curve(
        snapshot_dates=test_set_dates,
        test_visits=test_visits,
        category='unfiltered',
        prediction_time=prediction_time,
        prediction_window=timedelta(minutes=8*60),
        start_time_col='arrival_datetime',
        end_time_col='admitted_to_ward_datetime',
        model=yta_model_empirical,
        verbose=False,
    )

The result can be plotted using the same functions. The model appears as a series of vertical lines because the EmpiricalIncomingAdmissionPredictor is trained only on a time of day so there is minimal variation in the predicted distributions. I have included it here as a placeholder, to show how modelling of yet-to-arrive patients using past data on time to admission could be done, and how it could be evaluated. You could modify the function to include a weekday/weekend variable, or replace it with a different approach based on moving averages (such as ARIMA).

from patientflow.viz.epudd import plot_epudd

plot_epudd(prediction_times,
                 prob_dist_dict_all,
                 model_name='yet_to_arrive',
                 suptitle="EPUDD plots for yet-to-arrive patients for each prediction time",
                 plot_all_bounds=False)

png

3c. Predict bed counts without using patient snapshots

Create fake arrival times

Predicting future arrivals, with model trained on past data.

Approach 1 using past data: Poisson model trained on past arrival rates of patients who both arrived and were admitted within a prediction window

Approach 2: Using an empirical survival curve

Using the EmpiricalIncomingAdmissionPredictor class

Train a weighted Poisson model using an aspirational approach

Summary

Postscript - how to evaluate bed counts based on survival curves

Using the `EmpiricalIncomingAdmissionPredictor` class