API reference
PatientFlow: A package for predicting short-term hospital bed demand.
This package provides tools and models for analysing patient flow data and making predictions about emergency demand, elective demand, and hospital discharges.
aggregate
Aggregate Prediction From Patient-Level Probabilities
This submodule provides functions to aggregate patient-level predicted probabilities into a probability distribution. The module uses symbolic mathematics to generate and manipulate expressions, enabling the computation of aggregate probabilities based on individual patient-level predictions.
Functions:
Name | Description |
---|---|
create_symbols : function | Generate a sequence of symbolic objects intended for use in mathematical expressions. |
compute_core_expression : function | Compute a symbolic expression involving a basic mathematical operation with a symbol and a constant. |
build_expression : function | Construct a cumulative product expression by combining individual symbolic expressions. |
expression_subs : function | Substitute values into a symbolic expression based on a mapping from symbols to predictions. |
return_coeff : function | Extract the coefficient of a specified power from an expanded symbolic expression. |
model_input_to_pred_proba : function | Use a predictive model to convert model input data into predicted probabilities. |
pred_proba_to_agg_predicted : function | Convert individual probability predictions into aggregate predicted probability distribution using optional weights. |
get_prob_dist_for_prediction_moment : function | Calculate both predicted distributions and observed values for a given date using test data. |
get_prob_dist : function | Calculate probability distributions for each snapshot date based on given model predictions. |
get_prob_dist_using_survival_curve : function | Calculate probability distributions for each snapshot date using an EmpiricalSurvivalPredictor. |
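As a rough illustration of the idea behind this submodule, the sketch below turns patient-level probabilities into a distribution over the number of positive outcomes. It does not call the package's symbolic functions. Instead it multiplies out one small polynomial per patient using plain Python lists, which is assumed here to be equivalent to expanding a symbolic product and reading off its coefficients. The name `aggregate_probabilities` is hypothetical.

```python
def aggregate_probabilities(probs):
    """Return P(exactly k positive outcomes) for k = 0..len(probs).

    Hypothetical helper, not part of patientflow: each patient contributes
    a polynomial (1 - p) + p*s, and the coefficient of s**k in the product
    is the probability of exactly k positive outcomes.
    """
    dist = [1.0]  # the polynomial "1": no patients, so zero outcomes for certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, coeff in enumerate(dist):
            nxt[k] += coeff * (1 - p)  # this patient does not have the outcome
            nxt[k + 1] += coeff * p    # this patient has the outcome
        dist = nxt
    return dist


dist = aggregate_probabilities([0.2, 0.5, 0.9])
```

The result has one entry per possible count (0 to 3 here) and its entries sum to 1.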
build_expression(syms, n)
Construct a cumulative product expression by combining individual symbolic expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
syms | iterable | Iterable containing symbols to use in the expressions. | required |
n | int | The number of terms to include in the cumulative product. | required |
Returns:
Type | Description |
---|---|
Expr | The cumulative product of the expressions. |
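The form of each term is not stated in this reference. Assuming each core expression is $(1 - r_i) + r_i s$ for a patient-level probability $r_i$ and a symbol $s$, the cumulative product over $n$ patients can be read as a probability generating function:

```latex
G(s) = \prod_{i=1}^{n} \bigl( (1 - r_i) + r_i\, s \bigr)
     = \sum_{k=0}^{n} P(K = k)\, s^{k}
```

Under that assumption, the coefficient of $s^k$ in the expanded product is the probability that exactly $k$ of the $n$ patients have the outcome.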
Source code in src/patientflow/aggregate.py
compute_core_expression(ri, s)
Compute a symbolic expression involving a basic mathematical operation with a symbol and a constant.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ri | float | The constant value to substitute into the expression. | required |
s | Symbol | The symbolic object used in the expression. | required |
Returns:
Type | Description |
---|---|
Expr | The symbolic expression after substitution. |
Source code in src/patientflow/aggregate.py
create_symbols(n)
Generate a sequence of symbolic objects intended for use in mathematical expressions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Number of symbols to create. | required |
Returns:
Type | Description |
---|---|
tuple | A tuple containing the generated symbolic objects. |
Source code in src/patientflow/aggregate.py
expression_subs(expression, n, predictions)
Substitute values into a symbolic expression based on a mapping from symbols to predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expression | Expr | The symbolic expression to perform substitution on. | required |
n | int | Number of symbols and corresponding predictions. | required |
predictions | list | List of numerical predictions to substitute. | required |
Returns:
Type | Description |
---|---|
Expr | The expression after performing the substitution. |
Source code in src/patientflow/aggregate.py
get_prob_dist(snapshots_dict, X_test, y_test, model, weights=None, verbose=False, category_filter=None, normal_approx_threshold=30)
Calculate probability distributions for each snapshot date based on given model predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
snapshots_dict | dict | A dictionary mapping snapshot dates to indices in | required |
X_test | DataFrame or array-like | Input test data to be passed to the model. | required |
y_test | array-like | Observed target values. | required |
model | object or TrainedClassifier | Either a predictive model which provides a | required |
weights | Series | A Series containing weights for the test data points, which may influence the prediction, by default None. If provided, the weights should be indexed similarly to | None |
verbose | bool, optional | If True, print progress information. | False |
category_filter | array-like | Boolean mask indicating which samples belong to the specific outcome category being analyzed. Should be the same length as y_test. | None |
normal_approx_threshold | int, optional | If the number of rows in a snapshot exceeds this threshold, use a Normal distribution approximation. Set to None or a very large number to always use the exact symbolic computation. | 30 |
Returns:
Type | Description |
---|---|
dict | A dictionary mapping snapshot dates to probability distributions. |
Raises:
Type | Description |
---|---|
ValueError | If snapshots_dict is not properly formatted or empty. If model has no predict_proba method and is not a TrainedClassifier. |
Source code in src/patientflow/aggregate.py
get_prob_dist_for_prediction_moment(X_test, model, weights=None, inference_time=False, y_test=None, category_filter=None, normal_approx_threshold=30)
Calculate both predicted distributions and observed values for a given date using test data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_test | array-like | Test features for a specific snapshot date. | required |
model | object or TrainedClassifier | Either a predictive model which provides a | required |
weights | array-like | Weights to apply to the predictions for aggregate calculation. | None |
inference_time | bool, optional | If True, do not calculate or return actual aggregate. | False |
y_test | array-like | Actual outcomes corresponding to the test features. Required if inference_time is False. | None |
category_filter | array-like | Boolean mask indicating which samples belong to the specific outcome category being analyzed. Should be the same length as y_test. | None |
normal_approx_threshold | int, optional | If the number of rows in X_test exceeds this threshold, use a Normal distribution approximation. Set to None or a very large number to always use the exact symbolic computation. | 30 |
Returns:
Type | Description |
---|---|
dict | A dictionary with keys 'agg_predicted' and, if inference_time is False, 'agg_observed'. |
Raises:
Type | Description |
---|---|
ValueError | If y_test is not provided when inference_time is False. If model has no predict_proba method and is not a TrainedClassifier. |
Source code in src/patientflow/aggregate.py
get_prob_dist_using_survival_curve(snapshot_dates, test_visits, category, prediction_time, prediction_window, start_time_col, end_time_col, model, verbose=False)
Calculate probability distributions for each snapshot date using an EmpiricalSurvivalPredictor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
snapshot_dates | array-like | Array of dates for which to calculate probability distributions. | required |
test_visits | DataFrame | DataFrame containing test visit data. Must have either: - start_time_col as a column and end_time_col as a column, or - start_time_col as the index and end_time_col as a column | required |
category | str | Category to use for predictions (e.g., 'medical', 'surgical') | required |
prediction_time | tuple | Tuple of (hour, minute) representing the time of day for predictions | required |
prediction_window | timedelta | The prediction window duration | required |
start_time_col | str | Name of the column containing start times (or index name if using index) | required |
end_time_col | str | Name of the column containing end times | required |
model | EmpiricalSurvivalPredictor | A fitted instance of EmpiricalSurvivalPredictor | required |
verbose | bool, optional | If True, print progress information | False |
Returns:
Type | Description |
---|---|
dict | A dictionary mapping snapshot dates to probability distributions. |
Raises:
Type | Description |
---|---|
ValueError | If test_visits does not have the required columns or if model is not fitted. |
Source code in src/patientflow/aggregate.py
model_input_to_pred_proba(model_input, model)
Use a predictive model to convert model input data into predicted probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_input | array-like | The input data to the model, typically as features used for predictions. | required |
model | object | A model object with a | required |
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame containing the predicted probabilities for the positive class, with one column labeled 'pred_proba'. |
Source code in src/patientflow/aggregate.py
pred_proba_to_agg_predicted(predictions_proba, weights=None, normal_approx_threshold=30)
Convert individual probability predictions into aggregate predicted probability distribution using optional weights. Uses a Normal approximation for large datasets (> normal_approx_threshold) for better performance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
predictions_proba | DataFrame | A DataFrame containing the probability predictions; must have a single column named 'pred_proba'. | required |
weights | array-like | An array of weights, of the same length as the DataFrame rows, to apply to each prediction. | None |
normal_approx_threshold | int, optional | If the number of rows in predictions_proba exceeds this threshold, use a Normal distribution approximation. Set to None or a very large number to always use the exact symbolic computation. | 30 |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame with a single column 'agg_proba' showing the aggregated probability, indexed from 0 to n, where n is the number of predictions. |
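The Normal approximation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: each patient is assumed to contribute `p * w` to the mean and `p * w * (1 - p * w)` to the variance, weights are assumed to lie in [0, 1], and a continuity correction is used to discretise the curve. The function name is hypothetical.

```python
import math


def normal_approx_distribution(probs, weights=None):
    """Approximate P(exactly k outcomes), k = 0..n, with a Normal distribution.

    Hypothetical helper. Assumes weights in [0, 1] (default 1 for everyone).
    """
    if weights is None:
        weights = [1.0] * len(probs)
    eff = [p * w for p, w in zip(probs, weights)]  # weighted probabilities
    mean = sum(eff)
    sd = math.sqrt(sum(q * (1 - q) for q in eff))
    if sd == 0:  # degenerate case: every weighted probability is 0 or 1
        return [1.0 if k == round(mean) else 0.0 for k in range(len(probs) + 1)]

    def cdf(x):
        return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

    # Continuity correction: the mass for count k is the area from k-0.5 to k+0.5
    raw = [cdf(k + 0.5) - cdf(k - 0.5) for k in range(len(probs) + 1)]
    total = sum(raw)
    return [r / total for r in raw]  # renormalise after truncating the tails


approx = normal_approx_distribution([0.3] * 40)
```

With 40 patients at probability 0.3 the approximation peaks at a count of 12, which is the mean of the underlying distribution.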
Source code in src/patientflow/aggregate.py
return_coeff(expression, i)
Extract the coefficient of a specified power from an expanded symbolic expression.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expression | Expr | The expression to expand and extract from. | required |
i | int | The power of the term whose coefficient is to be extracted. | required |
Returns:
Type | Description |
---|---|
number | The coefficient of the specified power in the expression. |
Source code in src/patientflow/aggregate.py
calculate
Calculation module for patient flow metrics.
This module provides functions for calculating various patient flow metrics such as arrival rates and admission probabilities within prediction windows.
admission_in_prediction_window
This module provides functions to model and analyze a curve consisting of an exponential growth segment followed by an exponential decay segment. It includes functions to create the curve, calculate specific points on it, and evaluate probabilities based on its shape.
Its intended use is to derive the probability of a patient being admitted to a hospital within a certain elapsed time after their arrival in the Emergency Department (ED), given the hospital's aspirations for the time it takes patients to be admitted. For this purpose, two points on the curve are required as parameters:
* (x1, y1): The target proportion of patients y1 (e.g. 76%) who have been admitted or discharged by time x1 (e.g. 4 hours).
* (x2, y2): The time x2 by which a proportion y2 of patients (all but a small remainder, e.g. 99%) have been admitted.
It is assumed that, for x < x1, values of y grow exponentially towards y1, and that at (x1, y1) the curve switches to a decay curve.
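A minimal sketch of such a two-segment curve is given below. The functional forms are assumptions made for illustration (this reference does not state them): growth as `a * exp(gamma * x)` and decay as an exponential approach from y1 towards 1. The name `aspirational_curve` is hypothetical.

```python
import math


def aspirational_curve(x, x1, y1, x2, y2, a=0.01):
    """Sketch of a growth-then-decay curve through (x1, y1) and (x2, y2).

    Assumed forms (not confirmed by this reference):
      growth: y = a * exp(gamma * x)                             for x < x1
      decay:  y = y1 + (1 - y1) * (1 - exp(-lamda * (x - x1)))   for x >= x1
    """
    gamma = math.log(y1 / a) / x1                      # growth reaches y1 at x1
    lamda = math.log((1 - y1) / (1 - y2)) / (x2 - x1)  # decay reaches y2 at x2
    if x < x1:
        return a * math.exp(gamma * x)
    return y1 + (1 - y1) * (1 - math.exp(-lamda * (x - x1)))


# Targets: 76% admitted by 4 hours, 99% by 12 hours
y_at_x1 = aspirational_curve(4, 4, 0.76, 12, 0.99)
y_at_x2 = aspirational_curve(12, 4, 0.76, 12, 0.99)
```

Solving for `gamma` and `lamda` in this way forces the curve through both target points, so `y_at_x1` and `y_at_x2` recover 0.76 and 0.99 up to floating-point error.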
Functions:
Name | Description |
---|---|
growth_curve : function | Calculate exponential growth at a point where x < x1. |
decay_curve : function | Calculate exponential decay at a point where x >= x1. |
create_curve : function | Generate a full curve with both growth and decay segments. |
get_y_from_aspirational_curve : function | Read from the curve a value for y, the probability of being admitted, for a given moment x hours after arrival. |
calculate_probability : function | Compute the probability of a patient being admitted by the end of a prediction window, given how much time has elapsed since their arrival. |
get_survival_probability : function | Calculate the probability of a patient still being in the ED after a certain time using survival curve data. |
calculate_probability(elapsed_los, prediction_window, x1, y1, x2, y2)
Calculates the probability of an admission occurring within a specified prediction window after the moment of prediction, based on the patient's elapsed time in the ED prior to the moment of prediction and the length of the window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
elapsed_los | timedelta | The elapsed time since the patient arrived at the ED. | required |
prediction_window | timedelta | The duration of the prediction window after the point of prediction, for which the probability is calculated. | required |
x1 | float | The time target for the first key point on the curve. | required |
y1 | float | The proportion target for the first key point (e.g., 76% of patients admitted by time x1). | required |
x2 | float | The time target for the second key point on the curve. | required |
y2 | float | The proportion target for the second key point (e.g., 99% of patients admitted by time x2). | required |
Returns:
Type | Description |
---|---|
float | The probability of the event occurring within the given prediction window. |
Edge Case Handling
When elapsed_los is extremely high, such as values significantly greater than x2, the admission probability prior to the current time (prob_admission_prior_to_now) can reach 1.0 despite the curve being asymptotic. This scenario can cause computational errors when calculating the conditional probability, as it involves a division by zero. In such cases, this function directly returns a probability of 1.0, reflecting certainty of admission.
Example
Calculate the probability that a patient, who has already been in the ED for 3 hours, will be admitted in the next 2 hours. The ED targets that 76% of patients are admitted or discharged within 4 hours, and 99% within 12 hours.
from datetime import timedelta
calculate_probability(timedelta(hours=3), timedelta(hours=2), 4, 0.76, 12, 0.99)
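The conditional calculation and its edge case can be sketched as below. The helper `curve` is a hypothetical stand-in for the aspirational curve with assumed functional forms, and `conditional_admission_probability` is an illustrative name, not the package function.

```python
import math
from datetime import timedelta


def curve(x, x1, y1, x2, y2, a=0.01):
    # Hypothetical stand-in for the aspirational curve (assumed functional forms)
    if x < x1:
        return a * math.exp(math.log(y1 / a) / x1 * x)
    lamda = math.log((1 - y1) / (1 - y2)) / (x2 - x1)
    return y1 + (1 - y1) * (1 - math.exp(-lamda * (x - x1)))


def conditional_admission_probability(elapsed_los, prediction_window, x1, y1, x2, y2):
    elapsed_h = elapsed_los.total_seconds() / 3600
    window_h = prediction_window.total_seconds() / 3600
    prior = curve(elapsed_h, x1, y1, x2, y2)              # admitted before now
    by_end = curve(elapsed_h + window_h, x1, y1, x2, y2)  # admitted by window end
    if prior >= 1.0:
        return 1.0  # curve has saturated numerically; avoid dividing by zero
    # P(admitted within window | not yet admitted)
    return (by_end - prior) / (1 - prior)


p = conditional_admission_probability(
    timedelta(hours=3), timedelta(hours=2), 4, 0.76, 12, 0.99
)
```

Dividing by `1 - prior` conditions on the patient not having been admitted yet, which is why a saturated `prior` needs the explicit guard.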
Source code in src/patientflow/calculate/admission_in_prediction_window.py
create_curve(x1, y1, x2, y2, a=0.01, generate_values=False)
Generates parameters for an exponential growth and decay curve. Optionally generates x-values and corresponding y-values across a default or specified range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x1 | float | The x-value where the curve transitions from growth to decay. | required |
y1 | float | The y-value at the transition point x1. | required |
x2 | float | The x-value defining the end of the decay curve for calculation purposes. | required |
y2 | float | The y-value at x2, intended to fine-tune the decay rate. | required |
a | float | The initial value coefficient for the growth curve, defaults to 0.01. | 0.01 |
generate_values | bool | Flag to determine whether to generate x-values and y-values for visualization purposes. | False |
Returns:
Type | Description |
---|---|
tuple | If generate_values is False, returns (gamma, lamda, a). If generate_values is True, returns (gamma, lamda, a, x_values, y_values). |
Source code in src/patientflow/calculate/admission_in_prediction_window.py
decay_curve(x, x1, y1, lamda)
Calculate the exponential decay value at a given x using specified parameters. The function supports both scalar and array inputs for x.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | float or ndarray | The x-value(s) at which to evaluate the curve. | required |
x1 | float | The x-value where the growth curve transitions to the decay curve. | required |
y1 | float | The y-value at the transition point, where the decay curve starts. | required |
lamda | float | The decay rate coefficient. | required |
Returns:
Type | Description |
---|---|
float or ndarray | The y-value(s) of the decay curve at x. |
Source code in src/patientflow/calculate/admission_in_prediction_window.py
get_survival_probability(survival_df, time_hours)
Calculate the probability of a patient still being in the ED after a specified time using survival curve data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
survival_df | DataFrame | DataFrame containing survival curve data with columns: - time_hours: Time points in hours - survival_probability: Probability of still being in ED at each time point | required |
time_hours | float | The time point (in hours) at which to calculate the survival probability | required |
Returns:
Type | Description |
---|---|
float | The probability of still being in the ED at the specified time |
Notes
- If the exact time_hours is not in the survival curve data, the function will interpolate between the nearest time points
- If time_hours is less than the minimum time in the data, returns 1.0
- If time_hours is greater than the maximum time in the data, returns the last known survival probability
Examples:
>>> import pandas as pd
>>> survival_df = pd.DataFrame({
...     'time_hours': [0, 2, 4, 6],
...     'survival_probability': [1.0, 0.8, 0.5, 0.2]
... })
>>> get_survival_probability(survival_df, 3.5)
0.575 # linearly interpolated between 0.8 (at 2 hours) and 0.5 (at 4 hours)
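The lookup rules in the notes can be sketched without pandas as follows. Linear interpolation is an assumption here, and `interpolate_survival` is a hypothetical helper operating on two parallel, time-sorted lists rather than a DataFrame.

```python
def interpolate_survival(times, probs, t):
    """Linearly interpolate a survival curve given as parallel sorted lists.

    Hypothetical helper; linear interpolation is assumed.
    """
    if t < times[0]:
        return 1.0        # before the first time point: still in the ED
    if t >= times[-1]:
        return probs[-1]  # beyond the data: last known survival probability
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t < t1:
            frac = (t - t0) / (t1 - t0)
            return probs[i] + frac * (probs[i + 1] - probs[i])


value = interpolate_survival([0, 2, 4, 6], [1.0, 0.8, 0.5, 0.2], 3.5)
```

At 3.5 hours the point lies three quarters of the way from (2, 0.8) to (4, 0.5), so under linear interpolation the value is 0.8 - 0.75 * 0.3 = 0.575.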
Source code in src/patientflow/calculate/admission_in_prediction_window.py
get_y_from_aspirational_curve(x, x1, y1, x2, y2)
Calculate the probability y that a patient will have been admitted by a specified time x after their arrival, by reading from the aspirational curve that has been constrained to pass through points (x1, y1) and (x2, y2), with exponential growth where x < x1 and exponential decay where x >= x1.
The function handles scalar or array inputs for x and determines y using either an exponential growth curve (for x < x1) or an exponential decay curve (for x >= x1). The curve parameters are derived to ensure the curve passes through specified points (x1, y1) and (x2, y2).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | float or ndarray | The x-coordinate(s) at which to calculate the y-value on the curve. Can be a single value or an array of values. | required |
x1 | float | The x-coordinate of the first key point on the curve, where the growth phase ends and the decay phase begins. | required |
y1 | float | The y-coordinate of the first key point (x1), representing the target proportion of patients admitted by time x1. | required |
x2 | float | The x-coordinate of the second key point on the curve, beyond which all but a few patients are expected to be admitted. | required |
y2 | float | The y-coordinate of the second key point (x2), representing the target proportion of patients admitted by time x2. | required |
Returns:
Type | Description |
---|---|
float or ndarray | The calculated y-value(s) (probability of admission) at the given x. The type of the return matches the input type for x (either scalar or array). |
Source code in src/patientflow/calculate/admission_in_prediction_window.py
growth_curve(x, a, gamma)
Calculate the exponential growth value at a given x using specified parameters. The function supports both scalar and array inputs for x.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | float or ndarray | The x-value(s) at which to evaluate the curve. | required |
a | float | The coefficient that defines the starting point of the growth curve when x is 0. | required |
gamma | float | The growth rate coefficient of the curve. | required |
Returns:
Type | Description |
---|---|
float or ndarray | The y-value(s) of the growth curve at x. |
Source code in src/patientflow/calculate/admission_in_prediction_window.py
arrival_rates
Calculate and process time-varying arrival rates and admission probabilities.
This module provides functions for calculating arrival rates, admission probabilities, and unfettered demand rates for inpatient arrivals using an aspirational approach.
Functions:
Name | Description |
---|---|
time_varying_arrival_rates : function | Calculate arrival rates for each time interval across the dataset's date range. |
time_varying_arrival_rates_lagged : function | Create lagged arrival rates based on time intervals. |
admission_probabilities : function | Compute cumulative and hourly admission probabilities using aspirational curves. |
weighted_arrival_rates : function | Aggregate weighted arrival rates for specific time intervals. |
unfettered_demand_by_hour : function | Estimate inpatient demand by hour using historical data and aspirational curves. |
count_yet_to_arrive : function | Count patients who arrived after prediction times and were admitted within prediction windows. |
Notes
- All times are handled in local timezone
- Arrival rates are normalized by the number of unique days in the dataset
- Demand calculations consider both historical patterns and admission probabilities
- Time intervals must divide evenly into 24 hours
- Aspirational curves use (x1,y1) and (x2,y2) coordinates to model admission probabilities
Examples:
>>> import numpy as np
>>> import pandas as pd
>>> # Generate random arrival times over a week
>>> np.random.seed(42) # For reproducibility
>>> n_arrivals = 1000
>>> random_times = [
... pd.Timestamp('2024-01-01') +
... pd.Timedelta(days=np.random.randint(0, 7)) +
... pd.Timedelta(hours=np.random.randint(0, 24)) +
... pd.Timedelta(minutes=np.random.randint(0, 60))
... for _ in range(n_arrivals)
... ]
>>> df = pd.DataFrame(index=sorted(random_times))
>>>
>>> # Calculate various rates and demand
>>> rates = time_varying_arrival_rates(df, yta_time_interval=60)
>>> lagged_rates = time_varying_arrival_rates_lagged(df, lagged_by=4)
>>> demand = unfettered_demand_by_hour(df, x1=4, y1=0.8, x2=8, y2=0.95)
admission_probabilities(hours_since_arrival, x1, y1, x2, y2)
Calculate probability of admission for each hour since arrival.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hours_since_arrival | ndarray | Array of hours since arrival. | required |
x1 | float | First x-coordinate of the aspirational curve. | required |
y1 | float | First y-coordinate of the aspirational curve. | required |
x2 | float | Second x-coordinate of the aspirational curve. | required |
y2 | float | Second y-coordinate of the aspirational curve. | required |
Returns:
Type | Description |
---|---|
Tuple[ndarray, ndarray] | A tuple containing: - np.ndarray: Cumulative admission probabilities - np.ndarray: Hourly admission probabilities |
Notes
The aspirational curve is defined by two points (x1,y1) and (x2,y2) and is used to model the probability of admission over time.
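The relationship between the two returned arrays can be illustrated with a small sketch. It assumes the hourly probabilities are the first differences of the cumulative ones, which this reference does not confirm. The cumulative values below are made up for the example and `hourly_from_cumulative` is a hypothetical name.

```python
def hourly_from_cumulative(cumulative):
    """First differences of a cumulative probability sequence."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative probabilities of admission by hours 0, 1, 2, 3 and 4
cumulative = [0.01, 0.03, 0.09, 0.26, 0.76]
hourly = hourly_from_cumulative(cumulative)
```

Each hourly value is the probability of being admitted during that hour, and the values sum to the difference between the last and first cumulative probabilities.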
Source code in src/patientflow/calculate/arrival_rates.py
count_yet_to_arrive(df, snapshot_dates, prediction_times, prediction_window_hours)
Count patients who arrived after prediction times and were admitted within prediction windows.
This function counts patients who arrived after specified prediction times and were admitted to a ward within the specified prediction window for each combination of snapshot date and prediction time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A DataFrame containing patient data with 'arrival_datetime', 'admitted_to_ward_datetime', and 'patient_id' columns. | required |
snapshot_dates | list | List of dates (datetime.date objects) to analyze. | required |
prediction_times | list | List of (hour, minute) tuples representing prediction times. | required |
prediction_window_hours | float | Length of prediction window in hours after the prediction time. | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame with columns: - 'snapshot_date': The date of the snapshot - 'prediction_time': Tuple of (hour, minute) for the prediction time - 'count': Number of unique patients who arrived after prediction time and were admitted within the prediction window |
Raises:
Type | Description |
---|---|
TypeError | If df is not a DataFrame or if required columns are missing. |
ValueError | If prediction_window_hours is not positive. |
Notes
This function is useful for analyzing historical patterns of patient arrivals and admissions to inform predictive models for emergency department demand. Only patients with non-null admitted_to_ward_datetime are counted.
Examples:
>>> import pandas as pd
>>> from datetime import date
>>> # df: a DataFrame with 'arrival_datetime', 'admitted_to_ward_datetime' and 'patient_id' columns
>>> prediction_times = [(12, 0), (15, 30)]
>>> snapshot_dates = [date(2023, 1, 1), date(2023, 1, 2)]
>>> results = count_yet_to_arrive(df, snapshot_dates, prediction_times, 8.0)
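The counting rule for a single snapshot date and prediction time can be sketched in plain Python. This is an illustration, not the package's code: the tuple layout of `visits`, the function name, and the boundary handling (strictly after the prediction moment, admitted at or before the window end) are all assumptions.

```python
from datetime import date, datetime, time, timedelta


def count_arrivals_admitted_in_window(visits, snapshot_date, prediction_time, window_hours):
    """Count unique patients arriving after the prediction moment and admitted in the window.

    `visits` is a hypothetical list of (patient_id, arrival, admitted) tuples,
    where `admitted` is None for patients never admitted to a ward.
    """
    start = datetime.combine(snapshot_date, time(*prediction_time))
    end = start + timedelta(hours=window_hours)
    patients = {
        pid
        for pid, arrival, admitted in visits
        if admitted is not None and arrival > start and admitted <= end
    }
    return len(patients)


visits = [
    ("a", datetime(2023, 1, 1, 13, 0), datetime(2023, 1, 1, 18, 0)),  # counted
    ("b", datetime(2023, 1, 1, 11, 0), datetime(2023, 1, 1, 14, 0)),  # arrived before 12:00
    ("c", datetime(2023, 1, 1, 14, 0), None),                         # never admitted
    ("d", datetime(2023, 1, 1, 19, 0), datetime(2023, 1, 2, 1, 0)),   # admitted after window
]
n = count_arrivals_admitted_in_window(visits, date(2023, 1, 1), (12, 0), 8.0)
```

Only patient "a" satisfies every condition for the 12:00 prediction time with an 8-hour window, so `n` is 1.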
Source code in src/patientflow/calculate/arrival_rates.py
process_arrival_rates(arrival_rates_dict)
Process arrival rates dictionary into formats needed for plotting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arrival_rates_dict | Dict[datetime.time, float] | Mapping of times to arrival rates. | required |
Returns:
Type | Description |
---|---|
Tuple[List[float], List[str], List[int]] | A tuple containing: - List[float]: Arrival rate values - List[str]: Formatted hour range strings (e.g., "09-\n10") - List[int]: Integers for x-axis positioning |
Notes
The hour labels are formatted with line breaks for better plot readability.
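A sketch of this kind of reshaping is shown below. It assumes hourly intervals and sequential integer positions; the function name `to_plot_inputs` and the exact label format are illustrative guesses based on the "09-\n10" example, not the package's implementation.

```python
from datetime import time


def to_plot_inputs(arrival_rates_dict):
    """Split a {time: rate} mapping into values, hour-range labels and x positions."""
    rates = list(arrival_rates_dict.values())
    labels = [
        f"{t.hour:02d}-\n{(t.hour + 1) % 24:02d}"  # e.g. "09-\n10", split over two lines
        for t in arrival_rates_dict
    ]
    positions = list(range(len(rates)))  # assumed: one integer position per interval
    return rates, labels, positions


rates, labels, positions = to_plot_inputs({time(9, 0): 2.5, time(10, 0): 3.0})
```

Putting the line break inside each label keeps the tick labels narrow when all 24 hours are shown on one axis.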
Source code in src/patientflow/calculate/arrival_rates.py
time_varying_arrival_rates(df, yta_time_interval, num_days=None, verbose=False)
Calculate the time-varying arrival rates for a dataset indexed by datetime.
This function computes the arrival rates for each time interval specified, across the entire date range present in the dataframe. The arrival rate is calculated as the number of entries in the dataframe for each time interval, divided by the number of days in the dataset's timespan.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A DataFrame indexed by datetime, representing the data for which arrival rates are to be calculated. The index of the DataFrame should be of datetime type. | required |
yta_time_interval | int or timedelta | The time interval for which the arrival rates are to be calculated. If int, assumed to be in minutes. If timedelta, will be converted to minutes. For example, if | required |
num_days | int | The number of days that the DataFrame spans. If not provided, the number of days is calculated from the date of the min and max arrival datetimes. | None |
verbose | bool | If True, enable info-level logging. Defaults to False. | False |
Returns:
Type | Description |
---|---|
OrderedDict[time, float] | A dictionary mapping times to arrival rates, where times are datetime.time objects and rates are float values. |
Raises:
Type | Description |
---|---|
TypeError | If 'df' is not a pandas DataFrame, 'yta_time_interval' is not an integer or timedelta, or the DataFrame index is not a DatetimeIndex. |
ValueError | If 'yta_time_interval' is less than or equal to 0 or does not divide evenly into 24 hours. |
Notes
The minimum and maximum dates in the dataset are used to determine the timespan if num_days is not provided.
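The rate calculation described above (arrivals per time-of-day interval divided by the number of days) can be sketched without pandas. The function below is a hypothetical illustration that takes a plain list of datetimes and an explicit `num_days`; it is not the package function.

```python
from collections import OrderedDict
from datetime import datetime, time


def arrival_rates_by_interval(arrivals, interval_minutes, num_days):
    """Average number of arrivals per day in each time-of-day interval."""
    if interval_minutes <= 0 or (24 * 60) % interval_minutes != 0:
        raise ValueError("interval must be positive and divide evenly into 24 hours")
    rates = OrderedDict()
    for start in range(0, 24 * 60, interval_minutes):
        count = sum(
            1
            for a in arrivals
            if start <= a.hour * 60 + a.minute < start + interval_minutes
        )
        rates[time(start // 60, start % 60)] = count / num_days
    return rates


arrivals = [
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 2, 9, 5),
    datetime(2024, 1, 2, 14, 30),
]
rates = arrival_rates_by_interval(arrivals, interval_minutes=60, num_days=2)
```

Three arrivals fall in the 09:00 interval across two days, giving a rate of 1.5 per day for that hour.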
Source code in src/patientflow/calculate/arrival_rates.py
time_varying_arrival_rates_lagged(df, lagged_by, num_days=None, yta_time_interval=60)
Calculate lagged time-varying arrival rates for a dataset indexed by datetime.
This function first calculates the basic arrival rates and then adjusts them by a specified lag time, returning the rates sorted by the lagged times.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A DataFrame indexed by datetime, representing the data for which arrival rates are to be calculated. The index must be a DatetimeIndex. | required |
lagged_by | int | Number of hours to lag the arrival times. | required |
num_days | int | The number of days that the DataFrame spans. If not provided, the number of days is calculated from the date of the min and max arrival datetimes. | None |
yta_time_interval | int or timedelta | The time interval for which the arrival rates are to be calculated. If int, assumed to be in minutes. If timedelta, will be converted to minutes. Defaults to 60. | 60 |
Returns:
Type | Description |
---|---|
OrderedDict[time, float] | A dictionary mapping lagged times (datetime.time objects) to their corresponding arrival rates. |
Raises:
Type | Description |
---|---|
TypeError | If df is not a DataFrame, lagged_by is not an integer, yta_time_interval is not an integer or timedelta, or DataFrame index is not DatetimeIndex. |
ValueError | If lagged_by is negative or yta_time_interval is not positive. |
Notes
The lagged times are calculated by adding the specified number of hours to each time in the original arrival rates dictionary.
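A minimal sketch of the lagging step, assuming the input is a dictionary keyed by `datetime.time` as described above. The helper name `sketch_lag_rates` is hypothetical, and wrapping past midnight is an assumption about how late-evening times are handled.

```python
from collections import OrderedDict
from datetime import date, datetime, time, timedelta


def sketch_lag_rates(rates, lagged_by):
    """Illustrative only: shift each time key forward by `lagged_by` hours."""
    shifted = {}
    for t, rate in rates.items():
        # Attach an arbitrary date so the addition can roll over past midnight.
        moved = datetime.combine(date(2000, 1, 1), t) + timedelta(hours=lagged_by)
        shifted[moved.time()] = rate
    # Return the rates sorted by the lagged time of day.
    return OrderedDict(sorted(shifted.items()))


base = OrderedDict([(time(8, 0), 2.0), (time(22, 0), 0.5)])
lagged = sketch_lag_rates(base, lagged_by=4)
print(list(lagged.items()))
```

Here the 22:00 rate moves to 02:00 and therefore sorts ahead of the 08:00 rate, which moves to 12:00.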
Source code in src/patientflow/calculate/arrival_rates.py
unfettered_demand_by_hour(df, x1, y1, x2, y2, yta_time_interval=60, max_hours_since_arrival=10, num_days=None)
Calculate true inpatient demand by hour based on historical arrival data.
This function estimates demand rates using historical arrival data and an aspirational curve for admission probabilities. It takes a DataFrame of historical arrivals and parameters defining an aspirational curve to calculate hourly demand rates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | A DataFrame indexed by datetime, representing historical arrival data. The index must be a DatetimeIndex. | required |
x1 | float | First x-coordinate of the aspirational curve. | required |
y1 | float | First y-coordinate of the aspirational curve (0-1). | required |
x2 | float | Second x-coordinate of the aspirational curve. | required |
y2 | float | Second y-coordinate of the aspirational curve (0-1). | required |
yta_time_interval | int or timedelta | Time interval for which the arrival rates are to be calculated. If int, assumed to be in minutes. If timedelta, will be converted to minutes. Defaults to 60. | 60 |
max_hours_since_arrival | int | Maximum hours since arrival to consider. Defaults to 10. | 10 |
num_days | int | The number of days that the DataFrame spans. If not provided, the number of days is calculated from the date of the min and max arrival datetimes. | None |
Returns:
Type | Description |
---|---|
OrderedDict[time, float] | A dictionary mapping times (datetime.time objects) to their corresponding demand rates. |
Raises:
Type | Description |
---|---|
TypeError | If df is not a DataFrame, coordinates are not floats, or DataFrame index is not DatetimeIndex. |
ValueError | If coordinates are outside valid ranges, yta_time_interval is not positive, or doesn't divide evenly into 24 hours. |
Notes
The function combines historical arrival patterns with admission probabilities to estimate true inpatient demand. The aspirational curve is used to model how admission probabilities change over time.
Source code in src/patientflow/calculate/arrival_rates.py
weighted_arrival_rates(weighted_rates, elapsed_hours, hour_idx, num_intervals)
Calculate sum of weighted arrival rates for a specific time interval.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
weighted_rates | ndarray | Array of weighted arrival rates. | required |
elapsed_hours | range | Range of elapsed hours to consider. | required |
hour_idx | int | Current interval index. | required |
num_intervals | int | Total number of intervals in a day. | required |
Returns:
Type | Description |
---|---|
float | Sum of weighted arrival rates. |
Notes
The function calculates the sum of weighted arrival rates by iterating through the elapsed hours and considering the appropriate interval index for each hour.
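One plausible reading of this loop is sketched below. It assumes `weighted_rates` is a 2-D array indexed as `[elapsed hour, interval]` and that the interval index wraps around midnight. Both are assumptions for illustration, as is the name `sketch_weighted_sum`. The actual indexing in the package may differ.

```python
import numpy as np


def sketch_weighted_sum(weighted_rates, elapsed_hours, hour_idx, num_intervals):
    """Illustrative only: sum the weighted rates contributing to one interval."""
    total = 0.0
    for elapsed in elapsed_hours:
        # Assumption: patients who arrived `elapsed` hours ago arrived in an
        # earlier interval; the modulo wraps the index past midnight.
        interval_idx = (hour_idx - elapsed) % num_intervals
        total += weighted_rates[elapsed, interval_idx]
    return float(total)


# Toy array: 2 elapsed hours x 3 intervals, holding the values 0..5.
weights = np.arange(6, dtype=float).reshape(2, 3)
total = sketch_weighted_sum(weights, range(2), hour_idx=0, num_intervals=3)
print(total)
```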
Source code in src/patientflow/calculate/arrival_rates.py
survival_curve
calculate_survival_curve(df, start_time_col, end_time_col)
Calculate survival curve data from patient visit data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame containing patient visit data. | required |
start_time_col | str | Name of the column containing the start time (e.g., arrival_datetime). | required |
end_time_col | str | Name of the column containing the end time (e.g., departure_datetime). | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame with columns: time_hours (time points in hours), survival_probability (survival probability at each time point), and event_probability (event probability, 1 - survival_probability). |
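To make the returned columns concrete, here is a minimal sketch of an empirical survival curve built from start and end times. It assumes every visit has an observed end time (no censoring) and uses a hypothetical name, `sketch_survival_curve`. The package's own calculation may differ in detail.

```python
import numpy as np
import pandas as pd


def sketch_survival_curve(df, start_time_col, end_time_col):
    """Illustrative only: empirical survival curve, assuming no censoring."""
    hours = (df[end_time_col] - df[start_time_col]).dt.total_seconds() / 3600
    hours = np.sort(hours.dropna().to_numpy())
    n = len(hours)
    # Fraction of visits still in progress just after each observed duration.
    survival = 1 - np.arange(1, n + 1) / n
    curve = pd.DataFrame({"time_hours": hours, "survival_probability": survival})
    # Start the curve at time zero, when every visit is still in progress.
    origin = pd.DataFrame({"time_hours": [0.0], "survival_probability": [1.0]})
    curve = pd.concat([origin, curve], ignore_index=True)
    curve["event_probability"] = 1 - curve["survival_probability"]
    return curve


visits = pd.DataFrame(
    {
        "arrival_datetime": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 09:00"]),
        "departure_datetime": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 13:00"]),
    }
)
curve = sketch_survival_curve(visits, "arrival_datetime", "departure_datetime")
print(curve)
```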
Source code in src/patientflow/calculate/survival_curve.py
errors
Custom exception classes for model loading and validation.
This module defines specialized exceptions used during model loading and validation.
Classes:
Name | Description |
---|---|
ModelLoadError | Raised when a model fails to load due to an unspecified error. |
MissingKeysError | Raised when expected keys are missing from a dictionary of special parameters. |
MissingKeysError
Bases: ValueError
Exception raised when required keys are missing from special_params.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
missing_keys | list or set | The keys that are required but missing from the input dictionary. | required |
Attributes:
Name | Type | Description |
---|---|---|
missing_keys | list or set | Stores the missing keys that caused the exception. |
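The pattern described here, a ValueError subclass that carries the missing keys, can be sketched as below. The class name `SketchMissingKeysError`, the helper `check_special_params`, the message wording, and the key names are all hypothetical and stand in for the package's own definitions.

```python
class SketchMissingKeysError(ValueError):
    """Illustrative stand-in for an exception that records missing keys."""

    def __init__(self, missing_keys):
        super().__init__(f"Missing required keys: {sorted(missing_keys)}")
        self.missing_keys = missing_keys


def check_special_params(special_params, required):
    """Raise if any required key is absent from the dictionary."""
    missing = set(required) - set(special_params)
    if missing:
        raise SketchMissingKeysError(missing)


caught_keys = None
try:
    check_special_params({"alpha": 1}, required=["alpha", "beta"])
except SketchMissingKeysError as err:
    # The attribute lets callers react to exactly which keys were absent.
    caught_keys = err.missing_keys
print(caught_keys)
```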
Source code in src/patientflow/errors.py
ModelLoadError
Bases: Exception
Exception raised when a model fails to load.
This generic exception can be used to signal a failure during the model loading process due to unexpected issues such as file corruption, invalid formats, or unsupported configurations.
Source code in src/patientflow/errors.py
evaluate
Patient Flow Evaluation Module
This module provides functions for evaluating and comparing different prediction models for non-clinical outcomes in a healthcare setting. It includes utilities for calculating metrics such as Mean Absolute Error (MAE) and Mean Percentage Error (MPE), as well as functions for predicting admissions based on historical data and combining different prediction models.
Functions:
Name | Description |
---|---|
calculate_results : function | Calculate evaluation metrics based on expected and observed values. |
calc_mae_mpe : function | Calculate MAE and MPE for probability distribution predictions. |
calculate_admission_probs_relative_to_prediction : function | Calculate admission probabilities for arrivals relative to a prediction time window. |
get_arrivals_with_admission_probs : function | Get arrivals before and after prediction time with their admission probabilities. |
calculate_weighted_observed : function | Calculate actual admissions assuming ED targets are met. |
create_time_mask : function | Create a mask for times before/after a specific hour:minute. |
predict_using_previous_weeks : function | Predict admissions using average from previous weeks. |
evaluate_six_week_average : function | Evaluate the six-week average prediction model. |
combine_distributions : function | Combine two probability distributions using convolution. |
evaluate_combined_model : function | Evaluate a combined prediction model. |
calc_mae_mpe(prob_dist_dict_all, use_most_probable=False)
Calculate MAE and MPE for all prediction times in the given probability distribution dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_dist_dict_all | Dict[Any, Dict[Any, Dict[str, Any]]] | Nested dictionary containing probability distributions. | required |
use_most_probable | bool | Whether to use the most probable value or mathematical expectation of the distribution. Default is False. | False |
Returns:
Type | Description |
---|---|
Dict[Any, Dict[str, Union[List[Union[int, float]], float]]] | Dictionary of results sorted by prediction time. Each entry contains: expected (List[Union[int, float]]), the expected values for each prediction; observed (List[float]), the observed values for each prediction; mae (float), the Mean Absolute Error; mpe (float), the Mean Percentage Error. |
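The difference between the two settings of use_most_probable can be shown with a small sketch. It assumes a distribution is given as a list where position k holds the probability of exactly k patients. The helper name `sketch_point_estimate` is hypothetical.

```python
def sketch_point_estimate(probabilities, use_most_probable=False):
    """Illustrative only: reduce a discrete distribution to a single number.

    probabilities[k] is assumed to be the probability of exactly k patients.
    """
    if use_most_probable:
        # Mode: the count with the highest probability.
        return max(range(len(probabilities)), key=lambda k: probabilities[k])
    # Mathematical expectation: probability-weighted mean count.
    return sum(k * p for k, p in enumerate(probabilities))


dist = [0.125, 0.5, 0.125, 0.25]
mode_estimate = sketch_point_estimate(dist, use_most_probable=True)
mean_estimate = sketch_point_estimate(dist, use_most_probable=False)
print(mode_estimate, mean_estimate)
```

The mode is always an integer, while the expectation is generally fractional, which is consistent with the `Union[int, float]` type of the expected values.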
Source code in src/patientflow/evaluate.py
calculate_admission_probs_relative_to_prediction(df, prediction_datetime, prediction_window, x1, y1, x2, y2, is_before=True)
Calculate admission probabilities for arrivals relative to a prediction time window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame containing arrival_datetime column. | required |
prediction_datetime | datetime | Datetime for prediction window start. | required |
prediction_window | int | Window length in minutes. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
is_before | bool | Boolean indicating if arrivals are before prediction time. Default is True. | True |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame with added probability columns: hours_before_pred_window (float), hours before the prediction window (if is_before=True); hours_after_pred_window (float), hours after the prediction window (if is_before=False); prob_admission_before_pred_window (float), probability of admission before the prediction window; prob_admission_in_pred_window (float), probability of admission within the prediction window. |
Source code in src/patientflow/evaluate.py
calculate_results(expected_values, observed_values)
Calculate evaluation metrics based on expected and observed values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
expected_values | List[Union[int, float]] | List of expected values. | required |
observed_values | List[float] | List of observed values. | required |
Returns:
Type | Description |
---|---|
Dict[str, Union[List[Union[int, float]], float]] | Dictionary containing: expected (List[Union[int, float]]), the original expected values; observed (List[float]), the original observed values; mae (float), the Mean Absolute Error; mpe (float), the Mean Percentage Error. |
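As a rough illustration of the two metrics, the sketch below computes MAE as the mean absolute difference and MPE as the mean absolute difference expressed as a percentage of the observed value. The MPE formula, the skipping of zero observations, and the name `sketch_results` are assumptions. The package's exact definition may differ.

```python
def sketch_results(expected_values, observed_values):
    """Illustrative only: MAE and an assumed form of MPE."""
    errors = [abs(o - e) for e, o in zip(expected_values, observed_values)]
    mae = sum(errors) / len(errors)
    # Assumption: percentage error is undefined when the observed value is
    # zero, so those pairs are left out of the MPE average.
    pct = [err / o * 100 for err, o in zip(errors, observed_values) if o != 0]
    mpe = sum(pct) / len(pct) if pct else float("nan")
    return {
        "expected": expected_values,
        "observed": observed_values,
        "mae": mae,
        "mpe": mpe,
    }


results = sketch_results([10, 20], [8.0, 16.0])
print(results["mae"], results["mpe"])
```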
Source code in src/patientflow/evaluate.py
calculate_weighted_observed(df, dt, prediction_window, x1, y1, x2, y2, prediction_time)
Calculate weighted observed admissions for a specific date and prediction window.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with arrival_datetime column. | required |
dt | date | Target date for calculation. | required |
prediction_window | int | Window length in minutes. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
prediction_time | tuple | Tuple of (hour, minute) for prediction time. | required |
Returns:
Type | Description |
---|---|
float | Weighted sum of observed admissions for the specified time period. |
Source code in src/patientflow/evaluate.py
combine_distributions(dist1, dist2)
Combine two probability distributions using convolution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dist1 | DataFrame | First probability distribution. | required |
dist2 | DataFrame | Second probability distribution. | required |
Returns:
Type | Description |
---|---|
DataFrame | Combined probability distribution with column agg_predicted (float), the combined probability values. |
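Convolution gives the distribution of the sum of two independent counts. A minimal sketch with NumPy follows, assuming each input has an `agg_predicted` column whose rows are ordered by count starting at zero. The name `sketch_combine` is hypothetical.

```python
import numpy as np
import pandas as pd


def sketch_combine(dist1, dist2):
    """Illustrative only: distribution of the sum of two independent counts."""
    # Assumption: row k of each input holds the probability of exactly k.
    combined = np.convolve(
        dist1["agg_predicted"].to_numpy(), dist2["agg_predicted"].to_numpy()
    )
    # The default integer index of the result is the combined count.
    return pd.DataFrame({"agg_predicted": combined})


coin = pd.DataFrame({"agg_predicted": [0.5, 0.5]})
combined = sketch_combine(coin, coin)
print(combined)
```

With two inputs that each put probability 0.5 on zero and on one, the result puts 0.25, 0.5, and 0.25 on totals of zero, one, and two.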
Source code in src/patientflow/evaluate.py
create_time_mask(df, hour, minute)
Create a mask for times before/after a specific hour:minute.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame containing arrival_datetime column. | required |
hour | int | Target hour (0-23). | required |
minute | int | Target minute (0-59). | required |
Returns:
Type | Description |
---|---|
Series | Boolean mask indicating times after the specified hour:minute. |
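A mask of this kind can be built with the pandas `.dt` accessor, as in the sketch below. Treating an arrival exactly at hour:minute as "after" is an assumption, and `sketch_time_mask` is a hypothetical name.

```python
import pandas as pd


def sketch_time_mask(df, hour, minute):
    """Illustrative only: True where the time of day is at or after hour:minute."""
    times = df["arrival_datetime"].dt
    # Assumption: the boundary itself counts as "after".
    return (times.hour > hour) | ((times.hour == hour) & (times.minute >= minute))


arrivals = pd.DataFrame(
    {
        "arrival_datetime": pd.to_datetime(
            ["2024-01-01 08:59", "2024-01-01 09:30", "2024-01-02 12:00"]
        )
    }
)
mask = sketch_time_mask(arrivals, hour=9, minute=30)
print(mask.tolist())
```

Only the time of day is compared, so rows from different dates are treated alike.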
Source code in src/patientflow/evaluate.py
evaluate_combined_model(prob_dist_dict_all, df, yta_preds, prediction_window, x1, y1, x2, y2, prediction_time, num_weeks, model_name, use_most_probable=True)
Evaluate the combined prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_dist_dict_all | Dict[Any, Dict[Any, Dict[str, Any]]] | Nested dictionary containing probability distributions. | required |
df | DataFrame | DataFrame containing patient data. | required |
yta_preds | DataFrame | Yet-to-arrive predictions. | required |
prediction_window | int | Window length in minutes. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
prediction_time | Tuple[int, int] | Hour and minute of prediction. | required |
num_weeks | int | Number of previous weeks to consider. | required |
model_name | str | Name of the model. | required |
use_most_probable | bool | Whether to use the most probable value or expected value. Default is True. | True |
Returns:
Type | Description |
---|---|
Dict[Any, Dict[str, Union[List[Union[int, float]], float]]] | Dictionary containing evaluation results: expected (List[Union[int, float]]), the expected values for each prediction; observed (List[float]), the observed values for each prediction; mae (float), the Mean Absolute Error; mpe (float), the Mean Percentage Error. |
Source code in src/patientflow/evaluate.py
evaluate_six_week_average(prob_dist_dict_all, df, prediction_window, x1, y1, x2, y2, prediction_time, num_weeks, model_name)
Evaluate the six-week average prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_dist_dict_all | Dict[Any, Dict[Any, Dict[str, Any]]] | Nested dictionary containing probability distributions. | required |
df | DataFrame | DataFrame containing patient data. | required |
prediction_window | int | Prediction window in minutes. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
prediction_time | Tuple[int, int] | Hour and minute of prediction. | required |
num_weeks | int | Number of previous weeks to consider. | required |
model_name | str | Name of the model. | required |
Returns:
Type | Description |
---|---|
Dict[Any, Dict[str, Union[List[Union[int, float]], float]]] | Dictionary containing evaluation results: expected (List[Union[int, float]]), the expected values for each prediction; observed (List[float]), the observed values for each prediction. |
Source code in src/patientflow/evaluate.py
get_arrivals_with_admission_probs(df, prediction_datetime, prediction_window, prediction_time, x1, y1, x2, y2, date_range=None, target_date=None, target_weekday=None)
Get arrivals before and after prediction time with their admission probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with arrival_datetime column. | required |
prediction_datetime | datetime | Datetime for prediction window start. | required |
prediction_window | int | Window length in minutes. | required |
prediction_time | tuple | Tuple of (hour, minute) for prediction time. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
date_range | tuple | Optional tuple of (start_date, end_date) to filter data. | None |
target_date | date | Optional specific date to analyze. | None |
target_weekday | int | Optional specific weekday to filter for (0-6, where 0 is Monday). | None |
Returns:
Type | Description |
---|---|
tuple | Tuple of (arrived_before, arrived_after) DataFrames: arrived_before (pandas.DataFrame) holds arrivals before the prediction time; arrived_after (pandas.DataFrame) holds arrivals after the prediction time. |
Source code in src/patientflow/evaluate.py
predict_using_previous_weeks(df, dt, prediction_window, x1, y1, x2, y2, prediction_time, num_weeks, weighted=True)
Calculate predicted admissions remaining until midnight.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame containing patient data. | required |
dt | datetime | Date for prediction. | required |
prediction_window | int | Window length in minutes. | required |
x1 | float | First x-coordinate for aspirational curve. | required |
y1 | float | First y-coordinate for aspirational curve. | required |
x2 | float | Second x-coordinate for aspirational curve. | required |
y2 | float | Second y-coordinate for aspirational curve. | required |
prediction_time | Tuple[int, int] | Hour and minute of prediction. | required |
num_weeks | int | Number of previous weeks to consider. | required |
weighted | bool | Whether to weight the numbers according to aspirational ED targets. Default is True. | True |
Returns:
Type | Description |
---|---|
float | Predicted number of admissions remaining until midnight. |
Source code in src/patientflow/evaluate.py
generate
Generate fake Emergency Department visit data.
This module provides functions to generate fake datasets for patient visits to an emergency department (ED). It generates arrival and departure times, triage scores, lab orders, and patient admissions. The functions are used for illustrative purposes in some of the notebooks.
Functions:
Name | Description |
---|---|
create_fake_finished_visits | Generate synthetic patient visits, triage observations, and lab orders. |
create_fake_snapshots | Create patient-level snapshots at specific times with visit, triage, and lab features. |
create_fake_finished_visits(start_date, end_date, mean_patients_per_day, admitted_only=False)
Generate synthetic patient visit data for an emergency department.
This function simulates a realistic distribution of patient arrivals, triage scores, lengths of stay, admissions, and lab orders over a specified date range. Some patients may have multiple visits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_date | str or datetime | The starting date for the simulation (inclusive). Can be a datetime object or a string in 'YYYY-MM-DD' format. | required |
end_date | str or datetime | The ending date for the simulation (exclusive). Can be a datetime object or a string in 'YYYY-MM-DD' format. | required |
mean_patients_per_day | float | The average number of patient visits to generate per day. | required |
admitted_only | bool | If True, only return admitted patients. The mean_patients_per_day will be adjusted to maintain the same total number of admitted patients as would be expected in the full dataset. | False |
Returns:
Name | Type | Description |
---|---|---|
visits_df | DataFrame | DataFrame containing visit records with the following columns: 'visit_number', 'patient_id', 'arrival_datetime', 'departure_datetime', 'is_admitted', 'specialty', 'age'. |
observations_df | DataFrame | DataFrame containing triage score observations with columns: 'visit_number', 'observation_datetime', 'triage_score'. |
lab_orders_df | DataFrame | DataFrame containing lab test orders with columns: 'visit_number', 'order_datetime', 'lab_name'. |
Notes
- Patients are more likely to arrive during daytime hours.
- 20% of patients will have more than one visit during the simulation period.
- Lab test ordering likelihood depends on the severity of the triage score.
- When admitted_only=True, the mean_patients_per_day is adjusted to maintain the same number of admitted patients as would be expected in the full dataset.
Source code in src/patientflow/generate.py
create_fake_snapshots(prediction_times, start_date, end_date, df=None, observations_df=None, lab_orders_df=None, mean_patients_per_day=50)
Generate patient-level snapshots at specific times for prediction modeling.
For each specified time on each date in the range, this function returns a snapshot of patients who are currently in the emergency department, along with their visit features, latest triage score, and number of lab tests ordered prior to that time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_times | list of tuple of int | A list of (hour, minute) tuples indicating times of day to create snapshots. | required |
start_date | str or datetime | The starting date for generating snapshots (inclusive). | required |
end_date | str or datetime | The ending date for generating snapshots (exclusive). | required |
df | DataFrame | Patient visit data from | None |
observations_df | DataFrame | Triage score data from | None |
lab_orders_df | DataFrame | Lab order data from | None |
mean_patients_per_day | float | Average number of patients per day (used only if synthetic data is generated). | 50 |
Returns:
Name | Type | Description |
---|---|---|
final_df | DataFrame | A DataFrame with one row per patient visit present at the snapshot time. Columns include: 'snapshot_date', 'prediction_time', 'patient_id', 'visit_number', 'is_admitted', 'age', 'latest_triage_score', and one column per lab test: 'num_ |
Notes
- Only patients present in the ED at the snapshot time are included.
- Lab order columns reflect counts of tests ordered before the snapshot time.
- If no patients are present at a snapshot time, that snapshot is omitted.
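The first note can be illustrated with a small filter: a visit is in a snapshot when the patient has arrived but not yet departed. The sketch below assumes `arrival_datetime` and `departure_datetime` columns as listed for the visits data, and treats the departure boundary as exclusive, which is an assumption. `sketch_snapshot` is a hypothetical name.

```python
import pandas as pd


def sketch_snapshot(visits, snapshot_datetime):
    """Illustrative only: visits in progress at the snapshot time."""
    # Assumption: a patient departing exactly at the snapshot time is excluded.
    present = (visits["arrival_datetime"] <= snapshot_datetime) & (
        visits["departure_datetime"] > snapshot_datetime
    )
    return visits.loc[present, ["patient_id", "visit_number"]]


visits = pd.DataFrame(
    {
        "patient_id": [101, 102],
        "visit_number": [1, 2],
        "arrival_datetime": pd.to_datetime(["2024-01-01 08:00", "2024-01-01 10:00"]),
        "departure_datetime": pd.to_datetime(["2024-01-01 10:00", "2024-01-01 12:00"]),
    }
)
snapshot = sketch_snapshot(visits, pd.Timestamp("2024-01-01 09:30"))
print(snapshot)
```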
Source code in src/patientflow/generate.py
load
This module provides functionality for loading configuration files, data from CSV files, and trained machine learning models.
It includes the following features:
- Loading Configurations: Parse YAML configuration files and extract necessary parameters for data processing and modeling.
- Data Handling: Load and preprocess data from CSV files, including optional operations like setting an index, sorting, and applying literal evaluation on columns.
- Model Management: Load saved machine learning models, customize model filenames based on time, and categorize DataFrame columns into predefined groups for analysis.
The module handles common file and parsing errors, returning appropriate error messages or exceptions.
Functions:
Name | Description |
---|---|
parse_args: | Parses command-line arguments for training models. |
set_project_root: | Validates project root path from specified environment variable. |
load_config_file: | Load a YAML configuration file and extract key parameters. |
set_file_paths: | Sets up the file paths based on UCLH-specific or default parameters. |
set_data_file_names: | Set file locations based on UCLH-specific or default data sources. |
safe_literal_eval: | Safely evaluate string literals into Python objects when loading from csv. |
load_data: | Load and preprocess data from a CSV or pickle file. |
get_model_key: | Generate a model name based on the time of day. |
load_saved_model: | Load a machine learning model saved in a joblib file. |
get_dict_cols: | Categorize columns from a DataFrame into predefined groups for analysis. |
data_from_csv(csv_path, index_column=None, sort_columns=None, eval_columns=None)
Load data from a CSV file, with optional transformations. This is a legacy function.
This function loads a CSV file into a pandas DataFrame and provides the following optional features:
- Setting a specified column as the index.
- Sorting the DataFrame by one or more specified columns.
- Applying safe literal evaluation to specified columns to handle string representations of Python objects.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_path | str | The relative or absolute path to the CSV file. | required |
index_column | str | The column to set as the index of the DataFrame. If not provided, no index column is set. | None |
sort_columns | list of str | A list of columns by which to sort the DataFrame. If not provided, the DataFrame is not sorted. | None |
eval_columns | list of str | A list of columns to which | None |
Returns:
Type | Description |
---|---|
DataFrame | A pandas DataFrame containing the loaded data with any specified transformations applied. |
Raises:
Type | Description |
---|---|
SystemExit | If the file cannot be found or another error occurs during loading or processing. |
Notes
The function will terminate the program with a message if the file is not found or if any errors occur while loading the data. If sorting columns or applying safe_literal_eval fails, a warning message is printed, but execution continues.
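The three optional transformations can be sketched with pandas and the standard library as below. This is an illustration only: `sketch_load_csv` is a hypothetical name, it calls `ast.literal_eval` directly instead of the package's safe_literal_eval, and it omits the error handling described in the notes.

```python
import ast
import io

import pandas as pd


def sketch_load_csv(source, index_column=None, sort_columns=None, eval_columns=None):
    """Illustrative only: load a CSV and apply the optional transformations."""
    df = pd.read_csv(source)
    if index_column:
        df = df.set_index(index_column)
    if sort_columns:
        df = df.sort_values(sort_columns)
    for col in eval_columns or []:
        # ast.literal_eval parses Python literals only; it does not run code.
        df[col] = df[col].apply(ast.literal_eval)
    return df


# An in-memory CSV stands in for a file path; tuples are stored as strings.
csv_text = 'visit_number,prediction_time\n2,"(9, 30)"\n1,"(6, 0)"\n'
loaded = sketch_load_csv(
    io.StringIO(csv_text),
    sort_columns=["visit_number"],
    eval_columns=["prediction_time"],
)
print(loaded["prediction_time"].tolist())
```

After loading, the prediction_time column holds real tuples, so values such as (hour, minute) pairs can be compared or unpacked directly.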
Source code in src/patientflow/load.py
get_dict_cols(df)
Categorize DataFrame columns into predefined groups.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
The DataFrame to categorize. |
required |
Returns:
Type | Description |
---|---|
dict
|
A dictionary where keys are column group names and values are lists of column names in each group. |
Source code in src/patientflow/load.py
get_model_key(model_name, prediction_time)
Create a model name based on the time of day.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The base name of the model. |
required |
prediction_time
|
tuple of int
|
A tuple representing the time of day (hour, minute). |
required |
Returns:
Type | Description |
---|---|
str
|
A string representing the model name based on the time of day. |
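As a rough illustration of how a base name and a time of day could be combined into a key, the sketch below appends a zero-padded HHMM suffix. The helper name `make_model_key` and the exact suffix format are assumptions for illustration only; the format used by `get_model_key` itself is not stated here.

```python
from typing import Tuple


def make_model_key(model_name: str, prediction_time: Tuple[int, int]) -> str:
    """Hypothetical helper: build '<model_name>_<HHMM>' from (hour, minute)."""
    hour, minute = prediction_time
    # Zero-pad both parts so (9, 30) and (22, 0) give suffixes of equal width.
    return f"{model_name}_{hour:02d}{minute:02d}"


print(make_model_key("admissions", (9, 30)))
print(make_model_key("admissions", (22, 0)))
```

Keeping the suffix at a fixed width means keys sort in time order, which is convenient when several models are trained for different prediction times.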
Source code in src/patientflow/load.py
load_config_file(config_file_path, return_start_end_dates=False)
Load configuration from a YAML file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_file_path
|
str
|
The path to the configuration file. |
required |
return_start_end_dates
|
bool
|
If True, return only the start and end dates from the file (default is False). |
False
|
Returns:
Type | Description |
---|---|
dict or tuple or None
|
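If return_start_end_dates is True, the start and end dates from the file; otherwise the loaded configuration. |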
Source code in src/patientflow/load.py
load_data(data_file_path, file_name, index_column=None, sort_columns=None, eval_columns=None, home_path=None, encoding=None)
Loads data from CSV or pickle file with optional transformations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_file_path
|
str
|
Directory path containing the data file |
required |
file_name
|
str
|
Name of the CSV or pickle file to load |
required |
index_column
|
str
|
Column to set as DataFrame index |
None
|
sort_columns
|
list of str
|
Columns to sort DataFrame by |
None
|
eval_columns
|
list of str
|
Columns to apply safe_literal_eval to |
None
|
home_path
|
str or Path
|
Base path to use instead of user's home directory |
None
|
encoding
|
str
|
The encoding to use when reading CSV files (e.g., 'utf-8', 'latin1') |
None
|
Returns:
Type | Description |
---|---|
DataFrame
|
Loaded and transformed DataFrame |
Raises:
Type | Description |
---|---|
FileNotFoundError
|
If the specified file does not exist |
ValueError
|
If the file format is not supported or other processing errors occur |
Source code in src/patientflow/load.py
parse_args()
Parse command-line arguments for the training script.
Returns: argparse.Namespace: The parsed arguments, with 'data_folder_name' and 'uclh' attributes.
Source code in src/patientflow/load.py
safe_literal_eval(s)
Safely evaluate a string literal into a Python object. Handles list-like strings by converting them to lists.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s
|
str
|
The string to evaluate. |
required |
Returns:
Type | Description |
---|---|
Any, list, or None
|
The evaluated Python object if successful, a list if the input is list-like, or None for empty/null values. |
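The behaviour described above can be approximated with the standard library's `ast.literal_eval`. The sketch below is an illustrative stand-in, not the library's code: the name `literal_or_list`, the set of strings treated as null, and the choice to return None for unparseable non-list input are all assumptions.

```python
import ast


def literal_or_list(s):
    """Illustrative stand-in for a safe literal evaluator (not the library's code)."""
    if s is None:
        return None
    text = str(s).strip()
    # Assumed set of empty/null markers.
    if text == "" or text.lower() in {"nan", "none", "null"}:
        return None
    try:
        # literal_eval only accepts Python literals, so it never executes code.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Fallback for list-like strings with unquoted items, e.g. "[a, b]".
        if text.startswith("[") and text.endswith("]"):
            inner = text[1:-1].strip()
            return [item.strip() for item in inner.split(",")] if inner else []
        return None  # assumption: unparseable input is treated as missing


print(literal_or_list("['medical', 'surgical']"))
print(literal_or_list("[medical, surgical]"))
```

Both calls yield a two-item list of specialty names; the second relies on the list-like fallback because bare names are not valid literals.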
Source code in src/patientflow/load.py
set_data_file_names(uclh, data_file_path, config_file_path=None)
Set file locations based on UCLH or default data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
uclh
|
bool
|
If True, use UCLH-specific file locations. If False, use default file locations. |
required |
data_file_path
|
Path
|
The base path to the data directory. |
required |
config_file_path
|
str
|
The path to the configuration file, required if |
None
|
Returns:
Type | Description |
---|---|
tuple
|
Paths to the required files (visits, arrivals) based on the configuration. |
Source code in src/patientflow/load.py
set_file_paths(project_root, data_folder_name, train_dttm=None, inference_time=False, config_file='config.yaml', prefix=None, verbose=True)
Sets up the file paths.
Args:
- project_root (Path): Root path of the project
- data_folder_name (str): Name of the folder where data files are located
- train_dttm (Optional[str], optional): A string representation of the datetime at which training commenced. Defaults to None
- inference_time (bool, optional): A flag indicating whether it is inference time or not. Defaults to False
- config_file (str, optional): Name of config file. Defaults to "config.yaml"
- prefix (Optional[str], optional): String to prefix model folder names. Defaults to None
- verbose (bool, optional): Whether to print path information. Defaults to True
Returns:
- tuple: Contains (data_file_path, media_file_path, model_file_path, config_path)
Source code in src/patientflow/load.py
set_project_root(env_var=None)
Sets project root path from environment variable or infers it from current path.
First checks specified environment variable for project root path. If not found, searches current path hierarchy for highest-level 'patientflow' directory.
Args:
- env_var (Optional[str]): Name of environment variable containing project root path
Returns:
- Path: Validated project root path
Raises:
- ValueError: If environment variable not set and 'patientflow' not found in path
- NotADirectoryError: If path doesn't exist
- TypeError: If env_var is not None and not a string
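The lookup order described above (environment variable first, then the highest-level 'patientflow' directory in the current path) could look roughly like the following sketch. The function name `resolve_project_root` and its `start` parameter are hypothetical, added so the search can be exercised from any directory; this is not the library's implementation.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_project_root(env_var: Optional[str] = None,
                         start: Optional[Path] = None) -> Path:
    """Hypothetical sketch of the documented lookup order."""
    if env_var is not None and not isinstance(env_var, str):
        raise TypeError("env_var must be a string or None")

    root = None
    if env_var is not None and os.environ.get(env_var):
        root = Path(os.environ[env_var])
    else:
        current = (start or Path.cwd()).resolve()
        # parents runs from nearest to furthest ancestor; reversing the list
        # visits the highest-level directory first.
        for candidate in reversed([current, *current.parents]):
            if candidate.name == "patientflow":
                root = candidate
                break

    if root is None:
        raise ValueError("environment variable not set and 'patientflow' not found in path")
    if not root.is_dir():
        raise NotADirectoryError(f"{root} does not exist or is not a directory")
    return root
```

Searching from the top of the hierarchy matters when the repository folder and the package folder share the same name, as in `patientflow/src/patientflow`.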
Source code in src/patientflow/load.py
model_artifacts
Model training results containers.
This module defines a set of data classes to organise results from model training, including hyperparameter tuning, cross-validation fold metrics, and final trained classifier artifacts. These classes serve as structured containers for various types of model evaluation outputs and metadata.
Classes:
Name | Description |
---|---|
HyperParameterTrial |
Container for storing hyperparameter tuning trial results. |
FoldResults |
Stores evaluation metrics from a single cross-validation fold. |
TrainingResults |
Encapsulates comprehensive evaluation metrics and metadata from model training. |
TrainedClassifier |
Container for a trained model and associated training results. |
FoldResults
dataclass
Store evaluation metrics for a single fold.
Attributes:
Name | Type | Description |
---|---|---|
auc |
float
|
Area Under the ROC Curve (AUC) for this fold. |
logloss |
float
|
Logarithmic loss (cross-entropy loss) for this fold. |
auprc |
float
|
Area Under the Precision-Recall Curve (AUPRC) for this fold. |
Source code in src/patientflow/model_artifacts.py
HyperParameterTrial
dataclass
Container for a single hyperparameter tuning trial.
Attributes:
Name | Type | Description |
---|---|---|
parameters |
dict of str to Any
|
Dictionary of hyperparameters used in the trial. |
cv_results |
dict of str to float
|
Cross-validation metrics obtained using the specified parameters. |
Source code in src/patientflow/model_artifacts.py
TrainedClassifier
dataclass
Container for trained model artifacts and their associated information.
Attributes:
Name | Type | Description |
---|---|---|
training_results |
TrainingResults
|
Evaluation metrics and training metadata for the classifier. |
pipeline |
(Pipeline or None, optional)
|
The scikit-learn pipeline representing the trained classifier. |
calibrated_pipeline |
(Pipeline or None, optional)
|
The calibrated version of the pipeline, if model calibration was performed. |
Source code in src/patientflow/model_artifacts.py
TrainingResults
dataclass
Store comprehensive evaluation metrics and metadata from model training.
Attributes:
Name | Type | Description |
---|---|---|
prediction_time |
tuple of int
|
The time of day (hour, minute) at which predictions are made. |
training_info |
dict of str to Any, optional
|
Metadata or logs collected during training. |
calibration_info |
dict of str to Any, optional
|
Information about model calibration, if applicable. |
test_results |
dict of str to float, optional
|
Evaluation metrics computed on the test dataset. None if test evaluation was not performed. |
balance_info |
dict of str to bool or int or float, optional
|
Information related to class balance (e.g., whether data was balanced, class ratios). |
Source code in src/patientflow/model_artifacts.py
predict
Prediction module for patient flow forecasting.
This module provides functions for making predictions about future patient flow, including emergency demand forecasting and other predictive analytics.
emergency_demand
Emergency demand prediction module.
This module provides functionality for predicting emergency department demand, including specialty-specific predictions for both current patients and yet-to-arrive patients. It handles probability calculations, model predictions, and threshold-based resource estimation.
The module integrates multiple prediction models:
- Admission prediction classifier
- Specialty sequence predictor
- Yet-to-arrive weighted Poisson predictor
Functions:
Name | Description |
---|---|
add_missing_columns : function |
Add missing columns required by the prediction pipeline |
find_probability_threshold_index : function |
Find index where cumulative probability exceeds 1 - threshold |
get_specialty_probs : function |
Calculate specialty probability distributions |
create_predictions : function |
Create predictions for emergency demand |
add_missing_columns(pipeline, df)
Add missing columns required by the prediction pipeline from the training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline
|
Pipeline
|
The trained pipeline containing the feature transformer |
required |
df
|
DataFrame
|
Input dataframe that may be missing required columns |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame with missing columns added and filled with appropriate default values |
Notes
Adds columns with default values based on column name patterns:
- lab_orders_, visited_, has_ : False
- num_, total_ : 0
- latest_ : pd.NA
- arrival_method : "None"
- others : pd.NA
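The defaults listed in the note can be expressed as a small lookup. This is a sketch under two assumptions: the patterns are treated as name prefixes, and `None` stands in for `pd.NA` so the example needs only the standard library. The helper name `default_for_column` is hypothetical.

```python
MISSING = None  # stand-in for pd.NA in this stdlib-only sketch


def default_for_column(name: str):
    """Hypothetical helper: pick a fill value from the column name."""
    if name.startswith(("lab_orders_", "visited_", "has_")):
        return False
    if name.startswith(("num_", "total_")):
        return 0
    if name.startswith("latest_"):
        return MISSING
    if name == "arrival_method":
        return "None"  # the string "None", as in the note above
    return MISSING


for column in ["has_consultation", "num_obs", "latest_obs_news", "arrival_method", "age"]:
    print(column, repr(default_for_column(column)))
```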
Source code in src/patientflow/predict/emergency_demand.py
create_predictions(models, prediction_time, prediction_snapshots, specialties, prediction_window, x1, y1, x2, y2, cdf_cut_points, use_admission_in_window_prob=True)
Create predictions for emergency demand for a single prediction moment.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
models
|
Tuple[TrainedClassifier, Union[SequenceToOutcomePredictor, ValueToOutcomePredictor], ParametricIncomingAdmissionPredictor]
|
Tuple containing: - classifier: TrainedClassifier containing admission predictions - spec_model: SequenceToOutcomePredictor or ValueToOutcomePredictor for specialty predictions - yet_to_arrive_model: ParametricIncomingAdmissionPredictor for yet-to-arrive predictions |
required |
prediction_time
|
Tuple
|
Hour and minute of time for model inference |
required |
prediction_snapshots
|
DataFrame
|
DataFrame containing prediction snapshots. Must have an 'elapsed_los' column of type timedelta. |
required |
specialties
|
List[str]
|
List of specialty names for predictions (e.g., ['surgical', 'medical']) |
required |
prediction_window
|
timedelta
|
Prediction window as a timedelta object |
required |
x1
|
float
|
X-coordinate of first point for probability curve |
required |
y1
|
float
|
Y-coordinate of first point for probability curve |
required |
x2
|
float
|
X-coordinate of second point for probability curve |
required |
y2
|
float
|
Y-coordinate of second point for probability curve |
required |
cdf_cut_points
|
List[float]
|
List of cumulative distribution function cut points (e.g., [0.9, 0.7]) |
required |
use_admission_in_window_prob
|
bool
|
Whether to use probability calculation for admission within prediction window for patients already in the ED. If False, probability is set to 1.0 for all current ED patients. This parameter does not affect the yet-to-arrive predictions. By default True |
True
|
Returns:
Type | Description |
---|---|
Dict[str, Dict[str, List[int]]]
|
Nested dictionary containing predictions for each specialty: { 'specialty_name': { 'in_ed': [pred1, pred2, ...], 'yet_to_arrive': [pred1, pred2, ...] } } |
Raises:
Type | Description |
---|---|
TypeError
|
If any of the models are not of the expected type or if prediction_window is not a timedelta |
ValueError
|
If models have not been fit, if prediction parameters don't match training parameters, or if the 'elapsed_los' column is missing or not of type timedelta |
Notes
The classifier in the models tuple must be a TrainedClassifier object with either a 'pipeline' or a 'calibrated_pipeline' attribute set. The pipeline is used for making predictions, with calibrated_pipeline taking precedence if both exist.
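The precedence rule in this note can be sketched as follows. `ClassifierStandIn` is a hypothetical stand-in carrying only the two attributes mentioned above, and raising ValueError when neither is set is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ClassifierStandIn:
    """Hypothetical stand-in with the two pipeline attributes named in the note."""
    pipeline: Optional[Any] = None
    calibrated_pipeline: Optional[Any] = None


def select_pipeline(classifier: ClassifierStandIn) -> Any:
    # Prefer the calibrated pipeline whenever it is available.
    if classifier.calibrated_pipeline is not None:
        return classifier.calibrated_pipeline
    if classifier.pipeline is not None:
        return classifier.pipeline
    raise ValueError("classifier has neither a pipeline nor a calibrated_pipeline")


both = ClassifierStandIn(pipeline="raw", calibrated_pipeline="calibrated")
print(select_pipeline(both))
```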
Source code in src/patientflow/predict/emergency_demand.py
find_probability_threshold_index(sequence, threshold)
Find index where cumulative probability exceeds 1 - threshold.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sequence
|
List[float]
|
The probability mass function (PMF) of resource needs |
required |
threshold
|
float
|
The probability threshold (e.g., 0.9 for 90%) |
required |
Returns:
Type | Description |
---|---|
int
|
The index where the cumulative probability exceeds 1 - threshold, indicating the number of resources needed with the specified probability |
Examples:
>>> pmf = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]
>>> find_probability_threshold_index(pmf, 0.9)
1
# This means there is at least a 90% probability of needing at least 1 bed
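The rule stated in the Returns description (the first index at which the cumulative probability exceeds 1 - threshold) can be sketched as below. The function name `threshold_index` and the fallback of returning the last index are assumptions; this is not presented as the library's code.

```python
from typing import List


def threshold_index(pmf: List[float], threshold: float) -> int:
    """Sketch: first index where the running sum of pmf exceeds 1 - threshold."""
    cumulative = 0.0
    for index, prob in enumerate(pmf):
        cumulative += prob
        if cumulative > 1 - threshold:
            return index
    return len(pmf) - 1  # assumed fallback if the sum never exceeds the cut-off


pmf = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]
print(threshold_index(pmf, 0.9))  # 1
print(threshold_index(pmf, 0.7))  # 2
```

If index k is returned, the probability of needing fewer than k beds is at most 1 - threshold, so at least k beds are needed with probability of at least the threshold. For this PMF, P(X >= 1) = 0.95 and P(X >= 2) = 0.85, which gives 1 at the 90% level and 2 at the 70% level.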
Source code in src/patientflow/predict/emergency_demand.py
get_specialty_probs(specialties, specialty_model, snapshots_df, special_category_func=None, special_category_dict=None)
Calculate specialty probability distributions for patient visits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
specialties
|
list of str
|
List of specialty names for which predictions are required |
required |
specialty_model
|
object
|
Trained model for making specialty predictions |
required |
snapshots_df
|
DataFrame
|
DataFrame containing the data on which predictions are to be made. Must include the input_var column if no special_category_func is applied |
required |
special_category_func
|
callable
|
A function that takes a DataFrame row (Series) as input and returns True if the row belongs to a special category that requires a fixed probability distribution |
None
|
special_category_dict
|
dict
|
A dictionary containing the fixed probability distribution for special category cases. Required if special_category_func is provided |
None
|
Returns:
Type | Description |
---|---|
Series
|
A Series containing dictionaries as values. Each dictionary represents the probability distribution of specialties for each patient visit |
Raises:
Type | Description |
---|---|
ValueError
|
If special_category_func is provided but special_category_dict is None |
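To show how the two special-category arguments interact, here is a minimal sketch that uses plain dictionaries in place of DataFrame rows. The names `specialty_probs_sketch`, `predict_row`, `is_paediatric`, the `age_group` field and all probability values are hypothetical; only the ValueError condition comes from the description above.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Probs = Dict[str, float]


def specialty_probs_sketch(
    rows: List[Row],
    predict_row: Callable[[Row], Probs],
    special_category_func: Optional[Callable[[Row], bool]] = None,
    special_category_dict: Optional[Probs] = None,
) -> List[Probs]:
    if special_category_func is not None and special_category_dict is None:
        raise ValueError("special_category_dict is required when special_category_func is provided")
    results = []
    for row in rows:
        if special_category_func is not None and special_category_func(row):
            # Special-category rows bypass the model and get the fixed distribution.
            results.append(dict(special_category_dict))
        else:
            results.append(predict_row(row))
    return results


def is_paediatric(row: Row) -> bool:  # hypothetical rule
    return row["age_group"] == "0-17"


fixed = {"paediatric": 1.0, "medical": 0.0, "surgical": 0.0}
model = lambda row: {"paediatric": 0.0, "medical": 0.7, "surgical": 0.3}  # stand-in for a trained model
rows = [{"age_group": "0-17"}, {"age_group": "45-54"}]
print(specialty_probs_sketch(rows, model, is_paediatric, fixed))
```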
Source code in src/patientflow/predict/emergency_demand.py
predictors
Predictor models for patient flow analysis.
This module contains various predictor model implementations, including sequence-based predictors and weighted Poisson predictors for modeling patient flow patterns.
incoming_admission_predictors
Hospital Admissions Forecasting Predictors.
This module implements custom predictors to estimate the number of hospital admissions within a specified prediction window using historical admission data. It provides two approaches: parametric curves with Poisson-binomial distributions and empirical survival curves with convolution of Poisson distributions. Both predictors accommodate different data filters for tailored predictions across various hospital settings.
Classes:
Name | Description |
---|---|
IncomingAdmissionPredictor : BaseEstimator, TransformerMixin |
Base class for admission predictors that handles filtering and arrival rate calculation. |
ParametricIncomingAdmissionPredictor : IncomingAdmissionPredictor |
Predicts the number of admissions within a given prediction window based on historical data and Poisson-binomial distribution using parametric aspirational curves. |
EmpiricalIncomingAdmissionPredictor : IncomingAdmissionPredictor |
Predicts the number of admissions using empirical survival curves and convolution of Poisson distributions instead of parametric curves. |
Notes
The ParametricIncomingAdmissionPredictor uses a combination of Poisson and binomial distributions to model the probability of admissions within a prediction window using parametric curves defined by transition points (x1, y1, x2, y2).
The EmpiricalIncomingAdmissionPredictor inherits the arrival rate calculation and filtering logic but replaces the parametric approach with empirical survival probabilities and convolution of individual Poisson distributions for each time interval.
Both predictors take into account historical data patterns and can be filtered for specific hospital settings or specialties.
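The idea of convolving one Poisson distribution per time interval can be illustrated with a few lines of standard-library Python. This sketch assumes that arrivals in interval i follow a Poisson distribution with rate lambda_i and that each arrival is admitted within the window with probability theta_i, so admissions from that interval are Poisson with rate lambda_i * theta_i. The rates and probabilities below are made-up illustrative values, and none of the function names come from the package.

```python
import math
from typing import List


def poisson_pmf(rate: float, max_value: int) -> List[float]:
    """Poisson probabilities for 0..max_value."""
    return [math.exp(-rate) * rate**k / math.factorial(k) for k in range(max_value + 1)]


def convolve(a: List[float], b: List[float]) -> List[float]:
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


arrival_rates = [1.2, 0.8, 0.5]  # hypothetical arrivals per interval
admit_probs = [0.9, 0.6, 0.3]    # hypothetical P(admitted within window) per interval
max_value = 20

total = [1.0]  # distribution of a sum of zero terms: P(0) = 1
for lam, theta in zip(arrival_rates, admit_probs):
    total = convolve(total, poisson_pmf(lam * theta, max_value))
total = total[: max_value + 1]

# Sanity check: a sum of independent Poisson counts is Poisson with the summed rate.
combined_rate = sum(lam * theta for lam, theta in zip(arrival_rates, admit_probs))
reference = poisson_pmf(combined_rate, max_value)
print(max(abs(x - y) for x, y in zip(total, reference)))
```

The printed difference is at floating-point noise level, which confirms that the interval-by-interval convolution and the single combined Poisson agree on the support 0..max_value.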
EmpiricalIncomingAdmissionPredictor
Bases: IncomingAdmissionPredictor
A predictor that uses empirical survival curves instead of parameterised curves.
This predictor inherits all the arrival rate calculation and filtering logic from IncomingAdmissionPredictor but uses empirical survival probabilities and convolution of Poisson distributions for prediction instead of the Poisson-binomial approach.
The survival curve is automatically calculated from the training data during the fit process by analysing time-to-admission patterns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filters
|
dict
|
Optional filters for data categorization. If None, no filtering is applied. |
None
|
verbose
|
bool
|
Whether to enable verbose logging. |
False
|
Attributes:
Name | Type | Description |
---|---|---|
survival_df |
DataFrame
|
The survival data calculated from training data, containing time-to-event information for empirical probability calculations. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
__init__(filters=None, verbose=False)
Initialize the EmpiricalIncomingAdmissionPredictor.
Source code in src/patientflow/predictors/incoming_admission_predictors.py
fit(train_df, prediction_window, yta_time_interval, prediction_times, num_days, epsilon=10 ** -7, y=None, start_time_col='arrival_datetime', end_time_col='departure_datetime')
Fit the model to the training data and calculate empirical survival curve.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_df
|
DataFrame
|
The training dataset with historical admission data. Expected to have start_time_col as the index and end_time_col as a column. Alternatively, both can be regular columns. |
required |
prediction_window
|
int or timedelta
|
The prediction window in minutes. If timedelta, will be converted to minutes. If int, assumed to be in minutes. |
required |
yta_time_interval
|
int or timedelta
|
The interval in minutes for splitting the prediction window. If timedelta, will be converted to minutes. If int, assumed to be in minutes. |
required |
prediction_times
|
list
|
Times of day at which predictions are made, in hours. |
required |
num_days
|
int
|
The number of days that the train_df spans. |
required |
epsilon
|
float
|
A small value representing acceptable error rate to enable calculation of the maximum value of the random variable representing number of beds. |
1e-7
|
y
|
None
|
Ignored, present for compatibility with scikit-learn's fit method. |
None
|
start_time_col
|
str
|
Name of the column containing the start time (e.g., arrival time). Expected to be the DataFrame index, but can also be a regular column. |
'arrival_datetime'
|
end_time_col
|
str
|
Name of the column containing the end time (e.g., departure time). |
'departure_datetime'
|
Returns:
Type | Description |
---|---|
EmpiricalIncomingAdmissionPredictor
|
The instance itself, fitted with the training data. |
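The parameter table above says that prediction_window and yta_time_interval accept either an int (minutes) or a timedelta. A conversion along these lines would normalise both forms; the helper name `to_minutes` and the rejection of other types are assumptions for illustration.

```python
from datetime import timedelta
from typing import Union


def to_minutes(value: Union[int, timedelta]) -> int:
    """Hypothetical helper: normalise an int-or-timedelta argument to whole minutes."""
    if isinstance(value, timedelta):
        return int(value.total_seconds() // 60)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected an int (minutes) or a timedelta")
    return value


print(to_minutes(timedelta(hours=8)))  # 480
print(to_minutes(15))                  # 15
```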
Source code in src/patientflow/predictors/incoming_admission_predictors.py
get_survival_curve()
Get the survival curve calculated during fitting.
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame containing the survival curve with columns: - time_hours: Time points in hours - survival_probability: Survival probabilities at each time point - event_probability: Event probabilities (1 - survival_probability) |
Raises:
Type | Description |
---|---|
RuntimeError
|
If the model has not been fitted yet. |
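For intuition about the three columns, the sketch below builds an empirical survival curve from a list of time-to-admission values in hours. It is a simplified illustration using plain dictionaries rather than a DataFrame, with made-up durations; the function name `empirical_survival` is hypothetical and the package's own calculation may differ.

```python
from typing import Dict, List


def empirical_survival(durations_hours: List[float]) -> List[Dict[str, float]]:
    """Sketch: share of patients still waiting for admission after each observed time."""
    n = len(durations_hours)
    if n == 0:
        raise ValueError("at least one duration is required")
    rows = []
    for t in sorted(set(durations_hours)):
        survival = sum(1 for d in durations_hours if d > t) / n
        rows.append(
            {
                "time_hours": t,
                "survival_probability": survival,
                "event_probability": 1 - survival,
            }
        )
    return rows


curve = empirical_survival([1.0, 2.0, 2.0, 4.0])  # illustrative durations
for row in curve:
    print(row)
```

Here "survival" means the patient has not yet been admitted by the given time, so event_probability is the share admitted by then.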
Source code in src/patientflow/predictors/incoming_admission_predictors.py
predict(prediction_context, **kwargs)
Predict the number of admissions using empirical survival curves.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_context
|
dict
|
A dictionary defining the context for which predictions are to be made. It should specify either a general context or one based on the applied filters. |
required |
**kwargs
|
Additional keyword arguments for prediction configuration: max_value : int, default=20 Maximum value for the discrete distribution support. |
{}
|
Returns:
Type | Description |
---|---|
dict
|
A dictionary with predictions for each specified context. |
Raises:
Type | Description |
---|---|
ValueError
|
If filter key is not recognized or prediction_time is not provided. |
KeyError
|
If required keys are missing from the prediction context. |
RuntimeError
|
If the survival curve is not available because the model has not been fitted. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
IncomingAdmissionPredictor
Bases: BaseEstimator, TransformerMixin, ABC
Base class for admission predictors that handles filtering and arrival rate calculation.
This abstract base class provides the common functionality for predicting hospital admissions, including data filtering, arrival rate calculation, and basic prediction infrastructure. Subclasses implement specific prediction strategies.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filters
|
dict
|
Optional filters for data categorization. If None, no filtering is applied. |
None
|
verbose
|
bool
|
Whether to enable verbose logging. |
False
|
Attributes:
Name | Type | Description |
---|---|---|
filters |
dict
|
Filters for data categorization. |
verbose |
bool
|
Verbose logging flag. |
metrics |
dict
|
Stores metadata about the model and training data. |
weights |
dict
|
Model parameters computed during fitting. |
Notes
The predictor implements scikit-learn's BaseEstimator and TransformerMixin interfaces for compatibility with scikit-learn pipelines.
Source code in src/patientflow/predictors/incoming_admission_predictors.py
__init__(filters=None, verbose=False)
Initialize the IncomingAdmissionPredictor with optional filters.
Args:
- filters (dict, optional): A dictionary defining filters for different categories or specialties. If None or empty, no filtering will be applied.
- verbose (bool, optional): If True, enable info-level logging. Defaults to False.
Source code in src/patientflow/predictors/incoming_admission_predictors.py
filter_dataframe(df, filters)
Apply a set of filters to a dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
The DataFrame to filter. |
required |
filters
|
dict
|
A dictionary where keys are column names and values are the criteria or function to filter by. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
A filtered DataFrame. |
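The filter semantics (a value to match, or a function to apply, per column) can be sketched without pandas by using a list of dictionaries as a stand-in for a DataFrame. The function name `filter_rows` and the sample columns are hypothetical, and requiring every filter to pass is an assumption.

```python
from typing import Callable, Dict, List, Union

Row = Dict[str, object]
Criterion = Union[object, Callable[[object], bool]]


def filter_rows(rows: List[Row], filters: Dict[str, Criterion]) -> List[Row]:
    """Sketch: keep rows that satisfy every filter (equality or predicate)."""

    def matches(row: Row) -> bool:
        for column, criterion in filters.items():
            value = row[column]
            if callable(criterion):
                if not criterion(value):
                    return False
            elif value != criterion:
                return False
        return True

    return [row for row in rows if matches(row)]


visits = [
    {"specialty": "medical", "age": 70},
    {"specialty": "surgical", "age": 35},
    {"specialty": "medical", "age": 12},
]
print(filter_rows(visits, {"specialty": "medical", "age": lambda a: a >= 18}))
```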
Source code in src/patientflow/predictors/incoming_admission_predictors.py
fit(train_df, prediction_window, yta_time_interval, prediction_times, num_days, epsilon=10 ** -7, y=None)
Fit the model to the training data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_df | DataFrame | The training dataset with historical admission data. | required |
prediction_window | timedelta | The prediction window as a timedelta object. | required |
yta_time_interval | timedelta | The interval for splitting the prediction window as a timedelta object. | required |
prediction_times | list | Times of day at which predictions are made, in hours. | required |
num_days | int | The number of days that the train_df spans. | required |
epsilon | float | A small value representing the acceptable error rate, used to calculate the maximum value of the random variable representing the number of beds. | 1e-7 |
y | None | Ignored, present for compatibility with scikit-learn's fit method. | None |
Returns:
Type | Description |
---|---|
IncomingAdmissionPredictor | The instance itself, fitted with the training data. |
Raises:
Type | Description |
---|---|
TypeError | If prediction_window or yta_time_interval is not a timedelta object. |
ValueError | If the ratio prediction_window / yta_time_interval is not greater than 1. |
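The two documented error conditions can be pictured with a small stand-alone check. This is a sketch only: `check_window_args` is a hypothetical helper written for this example, and returning the whole number of intervals is an assumption about how the ratio might be used.

```python
from datetime import timedelta


def check_window_args(prediction_window, yta_time_interval):
    """Mirror the documented TypeError / ValueError checks (hypothetical helper)."""
    for name, value in (
        ("prediction_window", prediction_window),
        ("yta_time_interval", yta_time_interval),
    ):
        if not isinstance(value, timedelta):
            raise TypeError(f"{name} must be a timedelta, got {type(value).__name__}")

    # Dividing one timedelta by another yields a plain float.
    ratio = prediction_window / yta_time_interval
    if ratio <= 1:
        raise ValueError("prediction_window / yta_time_interval must be greater than 1")
    return int(ratio)  # assumption: number of whole intervals in the window


n_intervals = check_window_args(timedelta(hours=8), timedelta(minutes=15))
```

With an 8-hour window split into 15-minute intervals, the ratio is 32, so the window is divided into 32 intervals.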
Source code in src/patientflow/predictors/incoming_admission_predictors.py
get_weights()
Get the weights computed by the fit method.
Returns:
Type | Description |
---|---|
dict | The weights computed during model fitting. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
predict(prediction_context, **kwargs)
abstractmethod
Predict the number of admissions for the given context.
This is an abstract method that must be implemented by subclasses.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_context | dict | A dictionary defining the context for which predictions are to be made. It should specify either a general context or one based on the applied filters. | required |
**kwargs | | Additional keyword arguments specific to the prediction method. | {} |
Returns:
Type | Description |
---|---|
dict | A dictionary with predictions for each specified context. |
Raises:
Type | Description |
---|---|
ValueError | If filter key is not recognized or prediction_time is not provided. |
KeyError | If required keys are missing from the prediction context. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
ParametricIncomingAdmissionPredictor
Bases: IncomingAdmissionPredictor
A predictor for estimating hospital admissions using parametric curves.
This predictor uses a combination of Poisson and binomial distributions to forecast future admissions, excluding patients who have already arrived. The prediction is based on historical data and can be filtered for specific hospital settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filters | dict | Optional filters for data categorization. If None, no filtering is applied. | None |
verbose | bool | Whether to enable verbose logging. | False |
Attributes:
Name | Type | Description |
---|---|---|
filters | dict | Filters for data categorization. |
verbose | bool | Verbose logging flag. |
metrics | dict | Stores metadata about the model and training data. |
weights | dict | Model parameters computed during fitting. |
Notes
The predictor implements scikit-learn's BaseEstimator and TransformerMixin interfaces for compatibility with scikit-learn pipelines.
Source code in src/patientflow/predictors/incoming_admission_predictors.py
predict(prediction_context, **kwargs)
Predict the number of admissions for the given context using parametric curves.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_context | dict | A dictionary defining the context for which predictions are to be made. It should specify either a general context or one based on the applied filters. | required |
**kwargs | | Additional keyword arguments for parametric curve configuration: x1 (float), the x-coordinate of the first transition point on the aspirational curve, where the growth phase ends and the decay phase begins; y1 (float), the y-coordinate of the first transition point (x1), representing the target proportion of patients admitted by time x1; x2 (float), the x-coordinate of the second transition point on the curve, beyond which all but a few patients are expected to be admitted; y2 (float), the y-coordinate of the second transition point (x2), representing the target proportion of patients admitted by time x2. | {} |
Returns:
Type | Description |
---|---|
dict | A dictionary with predictions for each specified context. |
Raises:
Type | Description |
---|---|
ValueError | If filter key is not recognized or prediction_time is not provided. |
KeyError | If required keys are missing from the prediction context. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
aggregate_probabilities(lam, kmax, theta, time_index)
Aggregate probabilities for a range of values using the weighted Poisson-Binomial distribution.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lam | ndarray | An array of lambda values for each time interval. | required |
kmax | int | The maximum number of events to consider. | required |
theta | ndarray | An array of theta values for each time interval. | required |
time_index | int | The current time index for which to calculate probabilities. | required |
Returns:
Type | Description |
---|---|
ndarray | Aggregated probabilities for the given time index. |
Raises:
Type | Description |
---|---|
ValueError | If kmax < 0, time_index < 0, or array lengths are invalid. |
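One way to read this aggregation, assuming it sums Poisson-weighted binomial terms over the possible arrival counts i = 0, ..., kmax for the interval at time_index (an interpretation of the description, not confirmed against the source), is as a truncated form of the Poisson thinning identity. Here λ_t and θ_t stand for lam[time_index] and theta[time_index], and K is the number of arrivals in that interval that go on to be admitted within the window.

```latex
P(K = k) \;\approx\; \sum_{i=k}^{k_{\max}}
  \frac{e^{-\lambda_t}\,\lambda_t^{\,i}}{i!}\,
  \binom{i}{k}\,\theta_t^{\,k}\,(1-\theta_t)^{\,i-k},
\qquad
\sum_{i=k}^{\infty}
  \frac{e^{-\lambda_t}\,\lambda_t^{\,i}}{i!}\,
  \binom{i}{k}\,\theta_t^{\,k}\,(1-\theta_t)^{\,i-k}
  \;=\; \frac{e^{-\lambda_t\theta_t}\,(\lambda_t\theta_t)^{k}}{k!}
```

The right-hand identity is the standard result that thinning a Poisson(λ) count with success probability θ gives a Poisson(λθ) count, so under this reading kmax only controls how much of the tail is discarded.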
Source code in src/patientflow/predictors/incoming_admission_predictors.py
convolute_distributions(dist_a, dist_b)
Convolves two probability distributions represented as DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dist_a | DataFrame | The first distribution with columns ['sum', 'prob']. | required |
dist_b | DataFrame | The second distribution with columns ['sum', 'prob']. | required |
Returns:
Type | Description |
---|---|
DataFrame | The convolved distribution. |
Raises:
Type | Description |
---|---|
ValueError | If DataFrames do not contain required 'sum' and 'prob' columns. |
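Convolving two discrete distributions gives the distribution of the sum of two independent counts. The sketch below shows the operation on plain dictionaries that map each 'sum' value to its 'prob'; it is a stdlib analogue written for this example (the name `convolve_sketch` is hypothetical), not the DataFrame-based implementation.

```python
from collections import defaultdict


def convolve_sketch(dist_a, dist_b):
    """Distribution of A + B for independent A and B given as {sum: prob} dicts."""
    out = defaultdict(float)
    for sum_a, prob_a in dist_a.items():
        for sum_b, prob_b in dist_b.items():
            # Each pair of outcomes contributes its joint probability to the total.
            out[sum_a + sum_b] += prob_a * prob_b
    return dict(sorted(out.items()))


coin = {0: 0.5, 1: 0.5}
two_coins = convolve_sketch(coin, coin)
```

Here `two_coins` is the distribution of the number of heads in two fair coin tosses, and its probabilities still sum to 1.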
Source code in src/patientflow/predictors/incoming_admission_predictors.py
find_nearest_previous_prediction_time(requested_time, prediction_times)
Find the nearest previous time of day in prediction_times relative to the requested time.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
requested_time | tuple | The requested time as (hour, minute). | required |
prediction_times | list | List of available prediction times. | required |
Returns:
Type | Description |
---|---|
tuple | The closest previous time of day from prediction_times. |
Notes
If the requested time is earlier than all times in prediction_times, returns the latest time in prediction_times.
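The lookup, including the wrap-around described in the note, can be sketched with tuple comparison. The function name is hypothetical, and treating an exact match as a valid "previous" time is an assumption of this example.

```python
def nearest_previous_time_sketch(requested_time, prediction_times):
    """Return the latest (hour, minute) at or before requested_time, wrapping to the last one."""
    candidates = sorted(prediction_times)  # tuples sort by hour, then minute
    earlier = [t for t in candidates if t <= requested_time]
    # No earlier time today: fall back to the latest time of day in the list.
    return earlier[-1] if earlier else candidates[-1]


times = [(6, 0), (9, 30), (12, 0), (15, 30), (22, 0)]
mid_morning = nearest_previous_time_sketch((10, 15), times)
before_first = nearest_previous_time_sketch((5, 0), times)
```

A request at 10:15 resolves to 09:30, while a request at 05:00 precedes every listed time and so resolves to 22:00.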
Source code in src/patientflow/predictors/incoming_admission_predictors.py
poisson_binom_generating_function(NTimes, arrival_rates, theta, epsilon)
Generate a distribution based on the aggregate of Poisson and Binomial distributions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
NTimes | int | The number of time intervals. | required |
arrival_rates | ndarray | An array of lambda values for each time interval. | required |
theta | ndarray | An array of theta values for each time interval. | required |
epsilon | float | The desired error threshold. | required |
Returns:
Type | Description |
---|---|
DataFrame | The generated distribution. |
Raises:
Type | Description |
---|---|
ValueError | If NTimes <= 0 or epsilon is not between 0 and 1. |
Source code in src/patientflow/predictors/incoming_admission_predictors.py
weighted_poisson_binomial(i, lam, theta)
Calculate weighted probabilities using Poisson and Binomial distributions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
i | int | The upper bound of the range for the binomial distribution. | required |
lam | float | The lambda parameter for the Poisson distribution. | required |
theta | float | The probability of success for the binomial distribution. | required |
Returns:
Type | Description |
---|---|
ndarray | An array of weighted probabilities. |
Raises:
Type | Description |
---|---|
ValueError | If i < 0, lam < 0, or theta is not between 0 and 1. |
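A minimal sketch of one plausible reading of this function: the probability that exactly i patients arrive (Poisson with rate lam) multiplied by the probability that k of those i are admitted (binomial with success probability theta), for k = 0, ..., i. The exact formula used by the package is an assumption here, and the stdlib version below returns a list rather than an ndarray.

```python
import math


def weighted_poisson_binomial_sketch(i, lam, theta):
    """Poisson(i; lam) times Binomial(k; i, theta) for k = 0..i (assumed formula)."""
    if i < 0 or lam < 0 or not 0 <= theta <= 1:
        raise ValueError("require i >= 0, lam >= 0 and 0 <= theta <= 1")
    # Probability of exactly i arrivals under a Poisson distribution.
    poisson_weight = math.exp(-lam) * lam**i / math.factorial(i)
    # Scale each binomial probability of k admissions out of i by that weight.
    return [
        poisson_weight * math.comb(i, k) * theta**k * (1 - theta) ** (i - k)
        for k in range(i + 1)
    ]


probs = weighted_poisson_binomial_sketch(3, lam=2.0, theta=0.4)
```

Because the binomial probabilities sum to 1, the returned values sum to the Poisson probability of i arrivals rather than to 1.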
Source code in src/patientflow/predictors/incoming_admission_predictors.py
sequence_to_outcome_predictor
This module implements a SequenceToOutcomePredictor class that models and predicts the probability distribution of sequences in categorical data. The class builds a model based on training data, where input sequences are mapped to specific outcome categories. It provides methods to fit the model, compute sequence-based probabilities, and make predictions on an unseen dataset of input sequences.
Classes:
Name | Description |
---|---|
SequenceToOutcomePredictor : sklearn.base.BaseEstimator, sklearn.base.TransformerMixin | A model that predicts the probability of ending in different outcome categories based on input sequences. Note: All sequence inputs are expected to be tuples. Lists will be automatically converted to tuples, and None values will be converted to empty tuples. |
SequenceToOutcomePredictor
Bases: BaseEstimator, TransformerMixin
A class to model sequence-based predictions for categorical data using input and grouping sequences.
This class implements both the fit and predict methods, following the scikit-learn estimator conventions of its parent classes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_var | str | Name of the column representing the input sequence in the DataFrame. | required |
grouping_var | str | Name of the column representing the grouping sequence in the DataFrame. | required |
outcome_var | str | Name of the column representing the outcome category in the DataFrame. | required |
apply_special_category_filtering | bool | Whether to filter out special categories of patients before fitting the model. | True |
admit_col | str | Name of the column indicating whether a patient was admitted. | 'is_admitted' |
Attributes:
Name | Type | Description |
---|---|---|
weights | dict | A dictionary storing the probabilities of different input sequences leading to specific outcome categories. |
input_to_grouping_probs | DataFrame | A DataFrame that stores the computed probabilities of input sequences being associated with different grouping sequences. |
special_params | (dict, optional) | The special category parameters used for filtering, only populated if apply_special_category_filtering=True. |
metrics | dict | A dictionary to store metrics related to the training process. |
Source code in src/patientflow/predictors/sequence_to_outcome_predictor.py
__repr__()
Return a string representation of the estimator.
Source code in src/patientflow/predictors/sequence_to_outcome_predictor.py
fit(X)
Fits the predictor based on training data by computing the proportion of each input variable sequence ending in specific outcome variable categories.
Automatically preprocesses the data before fitting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | A pandas DataFrame containing at least the columns specified by input_var, grouping_var, and outcome_var. | required |
Returns:
Name | Type | Description |
---|---|---|
self | SequenceToOutcomePredictor | The fitted SequenceToOutcomePredictor model with calculated probabilities for each sequence. |
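The core idea of fitting, computing the proportion of each input sequence that ends in each outcome category, can be sketched with counters. This is a simplified stand-in: `fit_sequence_weights` is a hypothetical function, it works on (sequence, outcome) pairs rather than a DataFrame, and it leaves out the intermediate grouping sequences and the special-category filtering that the class describes.

```python
from collections import Counter, defaultdict


def fit_sequence_weights(rows):
    """Map each input sequence (as a tuple) to {outcome: proportion}."""
    counts = defaultdict(Counter)
    for sequence, outcome in rows:
        # Lists become tuples and None becomes an empty tuple, as the class note describes.
        key = tuple(sequence) if sequence is not None else ()
        counts[key][outcome] += 1
    return {
        key: {outcome: n / sum(c.values()) for outcome, n in c.items()}
        for key, c in counts.items()
    }


rows = [
    (["medical"], "medical"),
    (("medical",), "surgical"),
    (None, "medical"),
    (("medical", "surgical"), "surgical"),
]
weights = fit_sequence_weights(rows)
```

In this toy data the sequence `("medical",)` ends in each specialty half the time, and the empty tuple stands for patients with no observed categories.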
Source code in src/patientflow/predictors/sequence_to_outcome_predictor.py
predict(input_sequence)
Predicts the probabilities of ending in various outcome categories for a given input sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_sequence | tuple[str, ...] | A tuple containing the categories that have been observed for an entity in the order they have been encountered. An empty tuple represents an entity with no observed categories. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary of categories and the probabilities that the input sequence will end in them. |
Source code in src/patientflow/predictors/sequence_to_outcome_predictor.py
value_to_outcome_predictor
This module implements a ValueToOutcomePredictor class that models and predicts the probability distribution of outcomes based on a single categorical input. The class builds a model based on training data, where input values are mapped to specific outcome categories through an intermediate grouping variable. It provides methods to fit the model, compute probabilities, and make predictions on unseen data.
Classes:
Name | Description |
---|---|
ValueToOutcomePredictor : sklearn.base.BaseEstimator, sklearn.base.TransformerMixin | A model that predicts the probability of ending in different outcome categories based on a single input value. Note: All inputs are expected to be strings. None values will be converted to empty strings during preprocessing. |
ValueToOutcomePredictor
Bases: BaseEstimator, TransformerMixin
A class to model predictions for categorical data using a single input value and grouping variable.
This class implements both the fit and predict methods, following the scikit-learn estimator conventions of its parent classes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_var | str | Name of the column representing the input value in the DataFrame. | required |
grouping_var | str | Name of the column representing the grouping value in the DataFrame. | required |
outcome_var | str | Name of the column representing the outcome category in the DataFrame. | required |
apply_special_category_filtering | bool | Whether to filter out special categories of patients before fitting the model. | True |
admit_col | str | Name of the column indicating whether a patient was admitted. | 'is_admitted' |
Attributes:
Name | Type | Description |
---|---|---|
weights | dict | A dictionary storing the probabilities of different input values leading to specific outcome categories. |
input_to_grouping_probs | DataFrame | A DataFrame that stores the computed probabilities of input values being associated with different grouping values. |
special_params | (dict, optional) | The special category parameters used for filtering, only populated if apply_special_category_filtering=True. |
metrics | dict | A dictionary to store metrics related to the training process. |
Source code in src/patientflow/predictors/value_to_outcome_predictor.py
__repr__()
Return a string representation of the estimator.
Source code in src/patientflow/predictors/value_to_outcome_predictor.py
fit(X)
Fits the predictor based on training data by computing the proportion of each input value ending in specific outcome variable categories.
Automatically preprocesses the data before fitting. During preprocessing, any null values in the input and grouping variables are converted to empty strings. These empty strings are then used as keys in the model's weights dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | DataFrame | A pandas DataFrame containing at least the columns specified by input_var, grouping_var, and outcome_var. | required |
Returns:
Name | Type | Description |
---|---|---|
self | ValueToOutcomePredictor | The fitted ValueToOutcomePredictor model with calculated probabilities for each input value. The weights dictionary will contain an empty string key ('') for any null values from the input data. |
Source code in src/patientflow/predictors/value_to_outcome_predictor.py
predict(input_value)
Predicts the probabilities of ending in various outcome categories for a given input value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_value | str | The input value to predict outcomes for. None values will be handled appropriately. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary of categories and the probabilities that the input value will end in them. |
Source code in src/patientflow/predictors/value_to_outcome_predictor.py
prepare
Module for preparing data, loading models, and organizing snapshots for inference.
This module provides functionality to load a trained model, prepare data for making predictions, calculate arrival rates, and organize snapshot data. It allows for selecting one snapshot per visit, filtering snapshots by prediction time, and mapping snapshot dates to corresponding indices.
Functions:
Name | Description |
---|---|
select_one_snapshot_per_visit | Selects one snapshot per visit based on a random number and returns the filtered DataFrame. |
prepare_patient_snapshots | Filters the DataFrame by prediction time and optionally selects one snapshot per visit. |
prepare_group_snapshot_dict | Prepares a dictionary mapping snapshot dates to their corresponding snapshot indices. |
calculate_time_varying_arrival_rates | Calculates the time-varying arrival rates for a dataset indexed by datetime. |
SpecialCategoryParams
A picklable implementation of special category parameters for patient classification.
This class identifies pediatric patients based on available age-related columns in the dataset and provides functions to categorise patients accordingly. It is designed to be serializable with pickle by implementing the __reduce__ method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | list or Index | Column names from the dataset used to determine the appropriate age identification method | required |
Attributes:
Name | Type | Description |
---|---|---|
columns | list | List of column names from the dataset |
method_type | str | The method used for age detection ('age_on_arrival' or 'age_group') |
special_category_dict | dict | Default category values mapping |
Raises:
Type | Description |
---|---|
ValueError | If neither 'age_on_arrival' nor 'age_group' columns are found |
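The pattern described here, picking an age-detection method from the available columns and supporting pickle through `__reduce__`, can be sketched as follows. `AgeParamsSketch` is a hypothetical class for illustration; preferring 'age_on_arrival' when both columns are present is an assumption, and the real class carries more state than this.

```python
class AgeParamsSketch:
    """Simplified stand-in for a picklable special-category configuration."""

    def __init__(self, columns):
        self.columns = list(columns)
        # Assumption: 'age_on_arrival' takes precedence if both columns exist.
        if "age_on_arrival" in self.columns:
            self.method_type = "age_on_arrival"
        elif "age_group" in self.columns:
            self.method_type = "age_group"
        else:
            raise ValueError("need an 'age_on_arrival' or 'age_group' column")

    def special_category_func(self, row):
        """True if the row describes a pediatric patient."""
        if self.method_type == "age_on_arrival":
            return row["age_on_arrival"] < 18
        return row["age_group"] == "0-17"

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling the class with the columns.
        return (self.__class__, (self.columns,))


params = AgeParamsSketch(["age_group", "specialty"])
cls, args = params.__reduce__()
clone = cls(*args)  # the same call pickle makes when loading
```

Because every attribute is derived from `columns` in the constructor, rebuilding from the constructor arguments restores the full state, and no bound methods or closures need to be serialized.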
Source code in src/patientflow/prepare.py
__init__(columns)
Initialize the SpecialCategoryParams object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | list or Index | Column names from the dataset used to determine the appropriate age identification method | required |
Raises:
Type | Description |
---|---|
ValueError | If neither 'age_on_arrival' nor 'age_group' columns are found |
Source code in src/patientflow/prepare.py
__reduce__()
Support for pickle serialization.
Returns:
Type | Description |
---|---|
Tuple[Type[SpecialCategoryParams], Tuple[list]] | A tuple containing the class itself (to be called as a function) and a tuple of arguments to pass to the class constructor. |
Source code in src/patientflow/prepare.py
get_params_dict()
Get the special parameter dictionary in the format expected by the SequencePredictor.
Returns:
Type | Description |
---|---|
Dict[str, Union[Callable, Dict[str, float], Dict[str, Callable]]] | A dictionary containing: 'special_category_func', the function to identify pediatric patients; 'special_category_dict', the default category values (float); 'special_func_map', the mapping of category names to detection functions. |
Source code in src/patientflow/prepare.py
opposite_special_category_func(row)
Identify if a patient is NOT pediatric.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | Union[dict, Series] | A row of patient data | required |
Returns:
Type | Description |
---|---|
bool | True if the patient is NOT pediatric, False if they are pediatric |
Source code in src/patientflow/prepare.py
special_category_func(row)
Identify if a patient is pediatric based on age data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | Union[dict, Series] | A row of patient data containing either 'age_on_arrival' or 'age_group' | required |
Returns:
Type | Description |
---|---|
bool | True if the patient is pediatric (age < 18 or age_group is '0-17'), False otherwise |
Source code in src/patientflow/prepare.py
additional_details(column, col_name)
Generate additional statistical details about a column's contents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | Series | The column to analyze | required |
col_name | str | Name of the column (used for context) | required |
Returns:
Type | Description |
---|---|
str | A string containing statistical details about the column's contents, including: for dates, the date range; for categorical data, the frequency of values; for numeric data, the range, mean, standard deviation, and NA count; for datetimes, the date range with time. |
Source code in src/patientflow/prepare.py
apply_set(row)
Randomly assign a set label based on weighted probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | Series | Series containing 'training_set', 'validation_set', and 'test_set' weights | required |
Returns:
Type | Description |
---|---|
str | One of 'train', 'valid', or 'test' based on weighted random choice |
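A weighted random choice of this kind can be sketched with the standard library. The function below is a hypothetical analogue that reads the three documented weights from a dict-like row; passing an explicit random generator is an addition made here for reproducibility, not part of the documented signature.

```python
import random


def apply_set_sketch(row, rng=random):
    """Pick 'train', 'valid', or 'test' with probability proportional to the row's weights."""
    labels = ["train", "valid", "test"]
    weights = [row["training_set"], row["validation_set"], row["test_set"]]
    return rng.choices(labels, weights=weights, k=1)[0]


rng = random.Random(42)
# A patient with 7 encounters in the training period and 3 in the validation period.
row = {"training_set": 7, "validation_set": 3, "test_set": 0}
draws = [apply_set_sketch(row, rng) for _ in range(1000)]
```

Over many draws roughly 70% of the labels come out as 'train' and none as 'test', since a zero weight is never selected.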
Source code in src/patientflow/prepare.py
assign_patient_ids(df, start_training_set, start_validation_set, start_test_set, end_test_set, date_col='arrival_datetime', patient_id='mrn', visit_col='encounter', seed=42)
Probabilistically assign patient IDs to train/validation/test sets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with patient_id, encounter, and temporal columns | required |
start_training_set | date | Start date for training period | required |
start_validation_set | date | Start date for validation period | required |
start_test_set | date | Start date for test period | required |
end_test_set | date | End date for test period | required |
date_col | str | Column name for temporal splitting, by default "arrival_datetime" | 'arrival_datetime' |
patient_id | str | Column name for patient identifier, by default "mrn" | 'mrn' |
visit_col | str | Column name for visit identifier, by default "encounter" | 'encounter' |
seed | int | Random seed for reproducible results, by default 42 | 42 |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame with patient ID assignments based on weighted random sampling |
Notes
- Counts encounters in each time period per patient ID
- Randomly assigns each patient ID to one set, weighted by their temporal distribution
- A patient with 70% of their encounters in the training period and 30% in the validation period has a 70% chance of being assigned to the training set
Source code in src/patientflow/prepare.py
convert_dict_to_values(df, column, prefix)
Convert a column containing dictionaries into separate columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame containing the dictionary column | required |
column | str | Name of the column containing dictionaries to convert | required |
prefix | str | Prefix to use for the new column names | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing separate columns for each dictionary key, with values extracted from 'value_as_real' or 'value_as_text' if present |
Source code in src/patientflow/prepare.py
convert_set_to_dummies(df, column, prefix)
Convert a column containing sets into dummy variables.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input DataFrame containing the set column | required |
column | str | Name of the column containing sets to convert | required |
prefix | str | Prefix to use for the dummy variable column names | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame containing dummy variables for each unique item in the sets |
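To show what turning a column of sets into dummy variables means, the sketch below does it for a plain list of sets and returns one dictionary per row. It is a stdlib analogue, not the DataFrame implementation: the function name is hypothetical, and the `prefix_item` naming and 0/1 encoding are assumptions of this example.

```python
def set_column_to_dummies(sets, prefix):
    """One dict per row, with a 0/1 indicator for every item seen in any set."""
    items = sorted(set().union(*sets))  # every unique item across all rows
    return [
        {f"{prefix}_{item}": int(item in row_set) for item in items}
        for row_set in sets
    ]


orders = [{"xray", "bloods"}, set(), {"bloods"}]
dummies = set_column_to_dummies(orders, prefix="ordered")
```

Every row gets the same set of indicator columns, so a row with an empty set is represented by zeros rather than by missing keys.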
Source code in src/patientflow/prepare.py
create_special_category_objects(columns)
Create a configuration for categorising patients with special handling for pediatric cases.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | list or Index | The column names available in the dataset. Used to determine which age format is present. | required |
Returns:
Type | Description |
---|---|
dict | A dictionary containing special category configuration with: 'special_category_func', the function to identify pediatric patients; 'special_category_dict', the default category values; 'special_func_map', the mapping of category names to detection functions. |
Source code in src/patientflow/prepare.py
create_temporal_splits(df, start_train, start_valid, start_test, end_test, col_name='arrival_datetime', patient_id='mrn', visit_col='encounter', seed=42)
Split dataset into temporal train/validation/test sets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Input dataframe | required |
start_train | date | Training start (inclusive) | required |
start_valid | date | Validation start (inclusive) | required |
start_test | date | Test start (inclusive) | required |
end_test | date | Test end (exclusive) | required |
col_name | str | Primary datetime column for splitting, by default "arrival_datetime" | 'arrival_datetime' |
patient_id | str | Column name for patient identifier, by default "mrn" | 'mrn' |
visit_col | str | Column name for visit identifier, by default "encounter" | 'encounter' |
seed | int | Random seed for reproducible results, by default 42 | 42 |
Returns:
Type | Description |
---|---|
Tuple[DataFrame, DataFrame, DataFrame] | Tuple containing (train_df, valid_df, test_df) split dataframes |
Notes
Creates temporal data splits using primary datetime column and optional snapshot dates. Handles patient ID grouping if present to prevent data leakage.
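Example (illustrative sketch only, not the package's implementation): the interval logic described above can be pictured with plain Python dates. The sketch assumes each split ends where the next one starts, which follows from the inclusive starts and the exclusive test end. The helper name `temporal_split` and the sample rows are hypothetical, and the patient-level grouping that prevents leakage is not shown.

```python
from datetime import date


def temporal_split(rows, start_train, start_valid, start_test, end_test,
                   col_name="arrival_date"):
    """Assign each row to train/valid/test using half-open date intervals."""
    train, valid, test = [], [], []
    for row in rows:
        d = row[col_name]
        if start_train <= d < start_valid:
            train.append(row)
        elif start_valid <= d < start_test:
            valid.append(row)
        elif start_test <= d < end_test:
            test.append(row)
        # Rows outside [start_train, end_test) are dropped.
    return train, valid, test


# Hypothetical sample data: one row per visit.
rows = [
    {"mrn": "A", "arrival_date": date(2031, 3, 1)},
    {"mrn": "B", "arrival_date": date(2031, 9, 1)},
    {"mrn": "C", "arrival_date": date(2031, 10, 1)},
    {"mrn": "D", "arrival_date": date(2031, 11, 15)},
    {"mrn": "E", "arrival_date": date(2032, 1, 1)},
]

train, valid, test = temporal_split(
    rows,
    start_train=date(2031, 3, 1),
    start_valid=date(2031, 9, 1),
    start_test=date(2031, 10, 1),
    end_test=date(2032, 1, 1),
)
```

Row E falls exactly on `end_test` and is therefore excluded, because the test end is exclusive.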
Source code in src/patientflow/prepare.py
create_yta_filters(df)
Create specialty filters for categorizing patients by specialty and age group.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing patient data with columns that include either 'age_on_arrival' or 'age_group' for pediatric classification |
required |
Returns:
Type | Description |
---|---|
dict
|
A dictionary mapping specialty names to filter configurations. Each configuration contains: - For pediatric specialty: {"is_child": True} - For other specialties: {"specialty": specialty_name, "is_child": False} |
Examples:
>>> df = pd.DataFrame({'patient_id': [1, 2], 'age_on_arrival': [10, 40]})
>>> filters = create_yta_filters(df)
>>> print(filters['paediatric'])
{'is_child': True}
>>> print(filters['medical'])
{'specialty': 'medical', 'is_child': False}
Source code in src/patientflow/prepare.py
find_group_for_colname(column, dict_col_groups)
Find the group name that a column belongs to in the column groups dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column
|
str
|
Name of the column to find the group for |
required |
dict_col_groups
|
dict
|
Dictionary mapping group names to lists of column names |
required |
Returns:
Type | Description |
---|---|
str or None
|
The name of the group the column belongs to, or None if not found |
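Example (a minimal sketch of the lookup described above, assuming a straightforward membership test): the helper name `find_group` and the column and group names are hypothetical.

```python
def find_group(column, dict_col_groups):
    """Return the first group whose column list contains `column`, else None."""
    for group_name, columns in dict_col_groups.items():
        if column in columns:
            return group_name
    return None


# Hypothetical column groups.
col_groups = {
    "demographics": ["age_group", "sex"],
    "arrival": ["arrival_method"],
}

found = find_group("sex", col_groups)
not_found = find_group("unknown_column", col_groups)
```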
Source code in src/patientflow/prepare.py
generate_description(col_name)
Generate a description for a column based on its name and manual descriptions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name
|
str
|
Name of the column to generate a description for |
required |
Returns:
Type | Description |
---|---|
str
|
A descriptive string explaining the column's purpose and content |
Source code in src/patientflow/prepare.py
prepare_group_snapshot_dict(df, start_dt=None, end_dt=None)
Prepare a dictionary mapping snapshot dates to their corresponding snapshot indices.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing at least a 'snapshot_date' column |
required |
start_dt
|
date
|
Start date for filtering snapshots, by default None |
None
|
end_dt
|
date
|
End date for filtering snapshots, by default None |
None
|
Returns:
Type | Description |
---|---|
dict
|
A dictionary where: - Keys are dates - Values are arrays of indices corresponding to each date's snapshots - Empty arrays for dates with no snapshots (if start_dt and end_dt are provided) |
Raises:
Type | Description |
---|---|
ValueError
|
If 'snapshot_date' column is not present in the DataFrame |
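Example (illustrative sketch, not the package's code): the shape of the returned dictionary can be reproduced with plain Python. The sketch assumes that `start_dt` is inclusive and `end_dt` is exclusive, which the description above does not state. The helper name `group_snapshots` is hypothetical.

```python
from datetime import date, timedelta


def group_snapshots(snapshot_dates, start_dt=None, end_dt=None):
    """Map each snapshot date to the positions of its snapshots."""
    groups = {}
    for idx, d in enumerate(snapshot_dates):
        if start_dt is not None and d < start_dt:
            continue
        if end_dt is not None and d >= end_dt:  # assumption: end is exclusive
            continue
        groups.setdefault(d, []).append(idx)

    # When both bounds are given, add empty entries for dates with no snapshots.
    if start_dt is not None and end_dt is not None:
        day = start_dt
        while day < end_dt:
            groups.setdefault(day, [])
            day += timedelta(days=1)
    return groups


snapshot_dates = [date(2031, 4, 1), date(2031, 4, 1), date(2031, 4, 3)]
groups = group_snapshots(snapshot_dates, date(2031, 4, 1), date(2031, 4, 4))
```

Here 2 April has no snapshots, so it maps to an empty list.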
Source code in src/patientflow/prepare.py
prepare_patient_snapshots(df, prediction_time, exclude_columns=[], single_snapshot_per_visit=True, visit_col=None, label_col='is_admitted')
Prepare patient snapshots for model training or prediction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input DataFrame containing patient visit data |
required |
prediction_time
|
str or datetime
|
The specific prediction time to filter for |
required |
exclude_columns
|
list
|
List of columns to exclude from the final DataFrame, by default [] |
[]
|
single_snapshot_per_visit
|
bool
|
Whether to select only one snapshot per visit, by default True |
True
|
visit_col
|
str
|
Name of the column containing visit identifiers, required if single_snapshot_per_visit is True |
None
|
label_col
|
str
|
Name of the column containing the target labels, by default "is_admitted" |
'is_admitted'
|
Returns:
Type | Description |
---|---|
Tuple[DataFrame, Series]
|
A tuple containing: - DataFrame: Processed DataFrame with features - Series: Corresponding labels |
Raises:
Type | Description |
---|---|
ValueError
|
If single_snapshot_per_visit is True but visit_col is not provided |
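Example (a rough sketch of the filtering and feature/label separation, under stated assumptions): rows are plain dictionaries, the prediction time is stored in a `prediction_time` field as an `(hour, minute)` tuple, and the label column is removed from the features. None of these details are confirmed by the description above. The helper name `prepare_snapshots` is hypothetical, and the one-snapshot-per-visit step is omitted.

```python
def prepare_snapshots(rows, prediction_time, exclude_columns=(),
                      label_col="is_admitted"):
    """Keep rows for one prediction time and separate features from labels."""
    features, labels = [], []
    for row in rows:
        if row["prediction_time"] != prediction_time:
            continue
        labels.append(row[label_col])
        features.append({
            key: value
            for key, value in row.items()
            if key != label_col and key not in exclude_columns
        })
    return features, labels


# Hypothetical snapshot rows.
rows = [
    {"visit": 1, "prediction_time": (9, 30), "age_group": "18-24", "is_admitted": True},
    {"visit": 2, "prediction_time": (12, 0), "age_group": "65-74", "is_admitted": False},
    {"visit": 3, "prediction_time": (9, 30), "age_group": "45-54", "is_admitted": False},
]

X, y = prepare_snapshots(rows, (9, 30), exclude_columns=["visit"])
```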
Source code in src/patientflow/prepare.py
select_one_snapshot_per_visit(df, visit_col, seed=42)
Select one random snapshot per visit from a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input DataFrame containing visit snapshots |
required |
visit_col
|
str
|
Name of the column containing visit identifiers |
required |
seed
|
int
|
Random seed for reproducibility, by default 42 |
42
|
Returns:
Type | Description |
---|---|
DataFrame
|
DataFrame containing one randomly selected snapshot per visit |
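Example (illustrative sketch using only the standard library): one way to pick a single random snapshot per visit with a reproducible seed. The package operates on DataFrames, so this list-based version, including the helper name `select_one_per_visit`, is an assumption made for illustration.

```python
import random


def select_one_per_visit(rows, visit_col, seed=42):
    """Choose one row at random for each distinct value of `visit_col`."""
    rng = random.Random(seed)  # local generator keeps the choice reproducible
    by_visit = {}
    for row in rows:
        by_visit.setdefault(row[visit_col], []).append(row)
    return [rng.choice(group) for group in by_visit.values()]


# Hypothetical snapshots: visit "v1" has three, visit "v2" has one.
rows = [
    {"encounter": "v1", "snapshot": 1},
    {"encounter": "v1", "snapshot": 2},
    {"encounter": "v1", "snapshot": 3},
    {"encounter": "v2", "snapshot": 1},
]

picked = select_one_per_visit(rows, "encounter", seed=42)
picked_again = select_one_per_visit(rows, "encounter", seed=42)
```

Calling the helper twice with the same seed yields the same selection.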
Source code in src/patientflow/prepare.py
validate_special_category_objects(special_params)
Validate that a special category parameters dictionary contains all required keys.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
special_params
|
Dict[str, Any]
|
Dictionary of special category parameters to validate |
required |
Raises:
Type | Description |
---|---|
MissingKeysError
|
If any required keys are missing from the dictionary |
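Example (a minimal sketch of the key check): the required keys are taken from the return value documented for create_special_category_objects. The `MissingKeysError` class defined here is a local stand-in for the package's exception, and the helper name `validate_special_params` is hypothetical.

```python
class MissingKeysError(ValueError):
    """Local stand-in for the package's MissingKeysError."""


REQUIRED_KEYS = {
    "special_category_func",
    "special_category_dict",
    "special_func_map",
}


def validate_special_params(special_params):
    """Raise MissingKeysError if any required key is absent."""
    missing = REQUIRED_KEYS - set(special_params)
    if missing:
        raise MissingKeysError(f"Missing required keys: {sorted(missing)}")


# A complete dictionary passes silently.
validate_special_params({
    "special_category_func": lambda row: False,
    "special_category_dict": {},
    "special_func_map": {},
})

# An incomplete dictionary raises.
try:
    validate_special_params({"special_category_dict": {}})
    error_message = None
except MissingKeysError as exc:
    error_message = str(exc)
```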
Source code in src/patientflow/prepare.py
write_data_dict(df, dict_name, dict_path)
Write a data dictionary for a DataFrame to both Markdown and CSV formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input DataFrame to create a data dictionary for |
required |
dict_name
|
str
|
Base name for the output files (without extension) |
required |
dict_path
|
str or Path
|
Directory path where the data dictionary files will be written |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
The created data dictionary as a DataFrame |
Notes
Creates two files: - {dict_name}.md: Markdown format data dictionary - {dict_name}.csv: CSV format data dictionary
For visit data, includes separate statistics for admitted and non-admitted patients.
Source code in src/patientflow/prepare.py
survival_curve
Core survival curve calculation functions for patient flow analysis.
This module provides the mathematical computation functions for survival analysis without visualization dependencies.
Functions:
Name | Description |
---|---|
calculate_survival_curve : function |
Calculate survival curve data from patient visit data |
calculate_survival_curve(df, start_time_col, end_time_col)
Calculate survival curve data from patient visit data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing patient visit data |
required |
start_time_col
|
str
|
Name of the column containing the start time (e.g., arrival time) |
required |
end_time_col
|
str
|
Name of the column containing the end time (e.g., admission time) |
required |
Returns:
Type | Description |
---|---|
tuple of (numpy.ndarray, numpy.ndarray, pandas.DataFrame)
|
|
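Example (illustrative sketch only): the return value is not described above, so the following shows one common way to compute an empirical survival curve, namely the proportion of patients still waiting at each observed elapsed time. That this matches the function's output is an assumption, and the helper name `empirical_survival` is hypothetical.

```python
from datetime import datetime, timedelta


def empirical_survival(start_times, end_times):
    """Return elapsed hours and the fraction of patients still waiting."""
    waits = sorted(
        (end - start).total_seconds() / 3600
        for start, end in zip(start_times, end_times)
    )
    n = len(waits)
    times = [0.0] + waits
    # After the i-th shortest wait, (i + 1) of n patients have left.
    survival = [1.0] + [1 - (i + 1) / n for i in range(n)]
    return times, survival


arrival = datetime(2031, 4, 1, 8, 0)
start_times = [arrival, arrival, arrival]
end_times = [
    arrival + timedelta(hours=1),
    arrival + timedelta(hours=2),
    arrival + timedelta(hours=4),
]

times, survival = empirical_survival(start_times, end_times)
```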
Source code in src/patientflow/survival_curve.py
train
Training module for patient flow models.
This module provides functionality for training various predictive models used in patient flow analysis, including classifiers and demand forecasting models.
classifiers
Machine learning classifiers for patient flow prediction.
This module provides functions for training and evaluating machine learning classifiers for patient admission prediction. It includes utilities for data preparation, model training, hyperparameter tuning, and evaluation using time series cross-validation.
Functions:
Name | Description |
---|---|
evaluate_predictions |
Calculate multiple metrics (AUC, log loss, AUPRC) for given predictions |
chronological_cross_validation |
Perform time series cross-validation with multiple metrics |
initialise_model |
Initialize a model with given hyperparameters |
create_column_transformer |
Create a column transformer for a dataframe with dynamic column handling |
calculate_class_balance |
Calculate class balance ratios for target labels |
get_feature_metadata |
Extract feature names and importances from pipeline |
get_dataset_metadata |
Get dataset sizes and class balances |
create_balance_info |
Create a dictionary with balance information |
evaluate_model |
Evaluate model on test set |
train_classifier |
Train a single model including data preparation and balancing |
train_multiple_classifiers |
Train admission prediction models for multiple prediction times |
calculate_class_balance(y)
Calculate class balance ratios for target labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y
|
Series
|
Target labels |
required |
Returns:
Type | Description |
---|---|
Dict[Any, float]
|
Dictionary mapping each class to its proportion |
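Example (a minimal sketch of the proportion calculation): it uses a plain list in place of a Series, and the helper name `class_balance` is hypothetical.

```python
from collections import Counter


def class_balance(y):
    """Map each class label to its share of the samples."""
    counts = Counter(y)
    total = len(y)
    return {label: count / total for label, count in counts.items()}


balance = class_balance([0, 0, 0, 1])
```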
Source code in src/patientflow/train/classifiers.py
chronological_cross_validation(pipeline, X, y, n_splits=5)
Perform time series cross-validation with multiple metrics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline
|
Pipeline
|
Sklearn pipeline to evaluate |
required |
X
|
DataFrame
|
Feature matrix |
required |
y
|
Series
|
Target labels |
required |
n_splits
|
int
|
Number of time series splits, by default 5 |
5
|
Returns:
Type | Description |
---|---|
Dict[str, float]
|
Dictionary containing training and validation metrics |
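Example (illustrative sketch): chronological cross-validation trains on an expanding window of earlier samples and validates on the block that follows. The index arithmetic below follows the scheme used by scikit-learn's TimeSeriesSplit with default settings. Whether this function relies on that class internally is an assumption, and the helper name `time_series_splits` is hypothetical.

```python
def time_series_splits(n_samples, n_splits=5):
    """Yield expanding-window (train_indices, test_indices) pairs in time order."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Too few samples for the requested number of splits")
    folds = []
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(test_start))  # everything before the test block
        test_idx = list(range(test_start, test_start + test_size))
        folds.append((train_idx, test_idx))
    return folds


folds = time_series_splits(12, n_splits=5)
```

Every training window ends before its validation block begins, so no future information leaks into training.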
Source code in src/patientflow/train/classifiers.py
create_balance_info(is_balanced, original_size, balanced_size, original_positive_rate, balanced_positive_rate, majority_to_minority_ratio)
Create a dictionary with balance information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
is_balanced
|
bool
|
Whether the dataset was balanced |
required |
original_size
|
int
|
Original dataset size |
required |
balanced_size
|
int
|
Size after balancing |
required |
original_positive_rate
|
float
|
Positive class rate before balancing |
required |
balanced_positive_rate
|
float
|
Positive class rate after balancing |
required |
majority_to_minority_ratio
|
float
|
Ratio of majority to minority class samples |
required |
Returns:
Type | Description |
---|---|
Dict[str, Union[bool, int, float]]
|
Dictionary containing balance information |
Source code in src/patientflow/train/classifiers.py
create_column_transformer(df, ordinal_mappings=None)
Create a column transformer for a dataframe with dynamic column handling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input dataframe |
required |
ordinal_mappings
|
Dict[str, List[Any]]
|
Mappings for ordinal categorical features, by default None |
None
|
Returns:
Type | Description |
---|---|
ColumnTransformer
|
Configured column transformer |
Source code in src/patientflow/train/classifiers.py
evaluate_model(pipeline, X_test, y_test)
Evaluate model on test set.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline
|
Pipeline
|
Trained sklearn pipeline |
required |
X_test
|
DataFrame
|
Test features |
required |
y_test
|
Series
|
Test labels |
required |
Returns:
Type | Description |
---|---|
Dict[str, float]
|
Dictionary containing test metrics |
Source code in src/patientflow/train/classifiers.py
evaluate_predictions(y_true, y_pred)
Calculate multiple metrics for given predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y_true
|
NDArray[int_]
|
True binary labels |
required |
y_pred
|
NDArray[float64]
|
Predicted probabilities |
required |
Returns:
Type | Description |
---|---|
FoldResults
|
Object containing AUC, log loss, and AUPRC metrics |
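Example (illustrative sketch in plain Python): two of the three metrics, log loss and AUC, written out from their definitions. The package presumably uses library routines, so these helper functions are assumptions made for illustration. AUPRC is omitted for brevity.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def roc_auc(y_true, y_pred):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_pred) if y == 1]
    neg = [p for y, p in zip(y_true, y_pred) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_prob)
loss = log_loss(y_true, y_prob)
```

In this sample, three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75.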
Source code in src/patientflow/train/classifiers.py
get_dataset_metadata(X_train, X_valid, y_train, y_valid, X_test=None, y_test=None)
Get dataset sizes and class balances.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X_train
|
DataFrame
|
Training features |
required |
X_valid
|
DataFrame
|
Validation features |
required |
y_train
|
Series
|
Training labels |
required |
y_valid
|
Series
|
Validation labels |
required |
X_test
|
DataFrame
|
Test features. If None, test set information will be set to None. |
None
|
y_test
|
Series
|
Test labels. If None, test set information will be set to None. |
None
|
Returns:
Type | Description |
---|---|
DatasetMetadata
|
Dictionary containing dataset sizes and class balances |
Source code in src/patientflow/train/classifiers.py
get_feature_metadata(pipeline)
Extract feature names and importances from pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pipeline
|
Pipeline
|
Sklearn pipeline containing feature transformer and classifier |
required |
Returns:
Type | Description |
---|---|
FeatureMetadata
|
Dictionary containing feature names and their importance scores (if available) |
Raises:
Type | Description |
---|---|
AttributeError
|
If the classifier doesn't support feature importance |
Source code in src/patientflow/train/classifiers.py
initialise_model(model_class, params, xgb_specific_params={'n_jobs': -1, 'eval_metric': 'logloss', 'enable_categorical': True})
Initialize a model with given hyperparameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_class
|
Type
|
The classifier class to instantiate |
required |
params
|
Dict[str, Any]
|
Model-specific parameters to set |
required |
xgb_specific_params
|
Dict[str, Any]
|
XGBoost-specific default parameters |
{'n_jobs': -1, 'eval_metric': 'logloss', 'enable_categorical': True}
|
Returns:
Type | Description |
---|---|
Any
|
Initialized model instance |
Source code in src/patientflow/train/classifiers.py
train_classifier(train_visits, valid_visits, prediction_time, exclude_from_training_data, grid, ordinal_mappings, test_visits=None, visit_col=None, model_class=XGBClassifier, use_balanced_training=True, majority_to_minority_ratio=1.0, calibrate_probabilities=True, calibration_method='sigmoid', single_snapshot_per_visit=True, label_col='is_admitted', evaluate_on_test=False)
Train a single model including data preparation and balancing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_visits
|
DataFrame
|
Training visits dataset |
required |
valid_visits
|
DataFrame
|
Validation visits dataset |
required |
prediction_time
|
Tuple[int, int]
|
The prediction time point to use |
required |
exclude_from_training_data
|
List[str]
|
Columns to exclude from training |
required |
grid
|
Dict[str, List[Any]]
|
Parameter grid for hyperparameter tuning |
required |
ordinal_mappings
|
Dict[str, List[Any]]
|
Mappings for ordinal categorical features |
required |
test_visits
|
DataFrame
|
Test visits dataset. Required only when evaluate_on_test=True. |
None
|
visit_col
|
str
|
Name of the visit column. Required if single_snapshot_per_visit is True. |
None
|
model_class
|
Type
|
The classifier class to use. Must be sklearn-compatible with fit() and predict_proba(). Defaults to XGBClassifier. |
XGBClassifier
|
use_balanced_training
|
bool
|
Whether to use balanced training data |
True
|
majority_to_minority_ratio
|
float
|
Ratio of majority to minority class samples |
1.0
|
calibrate_probabilities
|
bool
|
Whether to apply probability calibration to the best model |
True
|
calibration_method
|
str
|
Method for probability calibration ('isotonic' or 'sigmoid') |
'sigmoid'
|
single_snapshot_per_visit
|
bool
|
Whether to select only one snapshot per visit. If True, visit_col must be provided. |
True
|
label_col
|
str
|
Name of the column containing the target labels |
'is_admitted'
|
evaluate_on_test
|
bool
|
Whether to evaluate the final model on the test set. Set to True only when satisfied with validation performance to avoid test set contamination. |
False
|
Returns:
Type | Description |
---|---|
TrainedClassifier
|
Trained model, including metrics and feature information |
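Example (illustrative sketch): one plausible reading of `use_balanced_training` and `majority_to_minority_ratio` is random undersampling of the majority class until it is at most `ratio` times the size of the minority class. That the package balances by undersampling is an assumption, and the helper name `undersample_majority` is hypothetical.

```python
import random


def undersample_majority(labels, majority_to_minority_ratio=1.0, seed=42):
    """Return sorted row positions kept after undersampling the majority class."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    n_keep = min(len(majority), int(len(minority) * majority_to_minority_ratio))
    kept_majority = rng.sample(majority, n_keep)
    return sorted(minority + kept_majority)


labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # 2 admitted, 8 not admitted

balanced_1_to_1 = undersample_majority(labels, majority_to_minority_ratio=1.0)
balanced_2_to_1 = undersample_majority(labels, majority_to_minority_ratio=2.0)
```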
Source code in src/patientflow/train/classifiers.py
train_multiple_classifiers(train_visits, valid_visits, grid, exclude_from_training_data, ordinal_mappings, prediction_times, test_visits=None, model_name='admissions', visit_col='visit_number', calibrate_probabilities=True, calibration_method='isotonic', use_balanced_training=True, majority_to_minority_ratio=1.0, label_col='is_admitted', evaluate_on_test=False)
Train admission prediction models for multiple prediction times.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_visits
|
DataFrame
|
Training visits dataset |
required |
valid_visits
|
DataFrame
|
Validation visits dataset |
required |
grid
|
Dict[str, List[Any]]
|
Parameter grid for hyperparameter tuning |
required |
exclude_from_training_data
|
List[str]
|
Columns to exclude from training |
required |
ordinal_mappings
|
Dict[str, List[Any]]
|
Mappings for ordinal categorical features |
required |
prediction_times
|
List[Tuple[int, int]]
|
List of prediction time points |
required |
test_visits
|
DataFrame
|
Test visits dataset, by default None |
None
|
model_name
|
str
|
Name prefix for models, by default "admissions" |
'admissions'
|
visit_col
|
str
|
Name of the visit column, by default "visit_number" |
'visit_number'
|
calibrate_probabilities
|
bool
|
Whether to calibrate probabilities, by default True |
True
|
calibration_method
|
str
|
Calibration method, by default "isotonic" |
'isotonic'
|
use_balanced_training
|
bool
|
Whether to use balanced training, by default True |
True
|
majority_to_minority_ratio
|
float
|
Ratio for class balancing, by default 1.0 |
1.0
|
label_col
|
str
|
Name of the label column, by default "is_admitted" |
'is_admitted'
|
evaluate_on_test
|
bool
|
Whether to evaluate on test set, by default False |
False
|
Returns:
Type | Description |
---|---|
Dict[str, TrainedClassifier]
|
Dictionary mapping model keys to trained classifiers |
Source code in src/patientflow/train/classifiers.py
emergency_demand
Emergency demand prediction training module.
This module provides functionality that is specific to the implementation of the patientflow package at University College London Hospital (UCLH). It trains models to predict emergency bed demand.
The module trains three model types:
1. Admission prediction models (multiple classifiers, one for each prediction time)
2. Specialty prediction models (sequence-based)
3. Yet-to-arrive prediction models (aspirational)
Functions:
Name | Description |
---|---|
test_real_time_predictions : Test real-time prediction functionality |
Selects random test cases and validates that the trained models can generate predictions as if they were making real-time predictions. |
train_all_models : Complete training pipeline |
Trains all three model types (admissions, specialty, yet-to-arrive) with proper validation and optional model saving. |
main : Entry point for training pipeline |
Loads configuration, data, and runs the complete training process. |
main(data_folder_name=None)
Main entry point for training patient flow models.
This function orchestrates the complete training pipeline for emergency demand prediction models. It loads configuration, data, and trains all three model types: admission prediction models, specialty prediction models, and yet-to-arrive prediction models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_folder_name
|
str
|
Name of the data folder containing the training datasets. If None, will be extracted from command line arguments. |
None
|
Returns:
Type | Description |
---|---|
None
|
The function trains and optionally saves models but does not return any values. |
Notes
The function performs the following steps:
1. Loads configuration from config.yaml
2. Loads ED visits and inpatient arrivals data
3. Sets up model parameters and hyperparameters
4. Trains admission prediction classifiers
5. Trains specialty prediction sequence model
6. Trains yet-to-arrive prediction model
7. Optionally saves trained models
8. Optionally tests real-time prediction functionality
Source code in src/patientflow/train/emergency_demand.py
test_real_time_predictions(visits, models, prediction_window, specialties, cdf_cut_points, curve_params, random_seed)
Test real-time predictions by selecting a random sample from the visits dataset and generating predictions using the trained models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
visits
|
DataFrame
|
DataFrame containing visit data with columns including 'prediction_time', 'snapshot_date', and other required features for predictions. |
required |
models
|
Tuple[Dict[str, TrainedClassifier], SequenceToOutcomePredictor, ParametricIncomingAdmissionPredictor]
|
Tuple containing: - trained_classifiers: Dictionary of TrainedClassifier objects for admission prediction - spec_model: SequenceToOutcomePredictor for specialty predictions - yet_to_arrive_model: ParametricIncomingAdmissionPredictor for yet-to-arrive predictions |
required |
prediction_window
|
int
|
Size of the prediction window in minutes for which to generate forecasts. |
required |
specialties
|
list[str]
|
List of specialty names to generate predictions for (e.g., ['surgical', 'medical', 'paediatric']). |
required |
cdf_cut_points
|
list[float]
|
List of probability thresholds for cumulative distribution function cut points (e.g., [0.9, 0.7]). |
required |
curve_params
|
tuple[float, float, float, float]
|
Parameters (x1, y1, x2, y2) defining the curve used for predictions. |
required |
random_seed
|
int
|
Random seed for reproducible sampling of test cases. |
required |
Returns:
Type | Description |
---|---|
dict
|
Dictionary containing: - 'prediction_time': str, The time point for which predictions were made - 'prediction_date': str, The date for which predictions were made - 'realtime_preds': dict, The generated predictions for the sample |
Raises:
Type | Description |
---|---|
Exception
|
If real-time inference fails. A detailed error message is printed before the system exits. |
Notes
The function selects a single random row from the visits DataFrame and generates predictions for that specific time point using all provided models. The predictions are made using the create_predictions() function with the specified parameters.
Source code in src/patientflow/train/emergency_demand.py
train_all_models(visits, start_training_set, start_validation_set, start_test_set, end_test_set, yta, prediction_times, prediction_window, yta_time_interval, epsilon, grid_params, exclude_columns, ordinal_mappings, random_seed, visit_col='visit_number', specialties=None, cdf_cut_points=None, curve_params=None, model_file_path=None, save_models=True, test_realtime=True)
Train and evaluate patient flow models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
visits
|
DataFrame
|
DataFrame containing visit data. |
required |
yta
|
DataFrame
|
DataFrame containing yet-to-arrive data. |
required |
prediction_times
|
list
|
List of times for making predictions. |
required |
prediction_window
|
int
|
Prediction window size in minutes. |
required |
yta_time_interval
|
int
|
Interval size for yet-to-arrive predictions in minutes. |
required |
epsilon
|
float
|
Epsilon parameter for model training. |
required |
grid_params
|
dict
|
Hyperparameter grid for model training. |
required |
exclude_columns
|
list
|
Columns to exclude during training. |
required |
ordinal_mappings
|
dict
|
Ordinal variable mappings for categorical features. |
required |
random_seed
|
int
|
Random seed for reproducibility. |
required |
visit_col
|
str
|
Name of the column in the dataset that identifies a hospital visit (e.g. visit_number, csn). |
'visit_number'
|
specialties
|
list
|
List of specialties to consider. Required if test_realtime is True. |
None
|
cdf_cut_points
|
list
|
CDF cut points for predictions. Required if test_realtime is True. |
None
|
curve_params
|
tuple
|
Curve parameters (x1, y1, x2, y2). Required if test_realtime is True. |
None
|
model_file_path
|
Path
|
Path to save trained models. Required if save_models is True. |
None
|
save_models
|
bool
|
Whether to save the trained models to disk. Defaults to True. |
True
|
test_realtime
|
bool
|
Whether to run real-time prediction tests. Defaults to True. |
True
|
Returns:
Type | Description |
---|---|
None
|
|
Raises:
Type | Description |
---|---|
ValueError
|
If save_models is True but model_file_path is not provided, or if test_realtime is True but any of specialties, cdf_cut_points, or curve_params are not provided. |
Notes
The function generates model names internally:
- "admissions": "admissions"
- "specialty": "ed_specialty"
- "yet_to_arrive": f"yet_to_arrive_{int(prediction_window.total_seconds()/3600)}_hours"
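Example (a small sketch of the naming expression in the Notes): the expression calls `total_seconds()`, which is a method of `datetime.timedelta`, so the sketch assumes a timedelta is passed even though the parameter table lists `prediction_window` as an int. The helper name `yet_to_arrive_model_name` is hypothetical.

```python
from datetime import timedelta


def yet_to_arrive_model_name(prediction_window):
    """Build the yet-to-arrive model name from a timedelta prediction window."""
    hours = int(prediction_window.total_seconds() / 3600)
    return f"yet_to_arrive_{hours}_hours"


model_names = {
    "admissions": "admissions",
    "specialty": "ed_specialty",
    "yet_to_arrive": yet_to_arrive_model_name(timedelta(hours=8)),
}
```

With an eight-hour window, the yet-to-arrive model is named `yet_to_arrive_8_hours`.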
Source code in src/patientflow/train/emergency_demand.py
incoming_admission_predictor
Training utility for parametric admission prediction models.
This module provides functions for training parametric admission prediction models, specifically for predicting yet-to-arrive (YTA) patient volumes using parametric curves. It includes utilities for creating specialty filters and training parametric admission predictors.
The logic in this module is specific to the implementation at UCLH.
create_yta_filters(df)
Create specialty filters for categorizing patients by specialty and age group.
This function generates a dictionary of filters based on specialty categories, with special handling for pediatric patients. It uses the SpecialCategoryParams class to determine which specialties correspond to pediatric care.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing patient data with columns that include either 'age_on_arrival' or 'age_group' for pediatric classification. |
required |
Returns:
Type | Description |
---|---|
dict
|
A dictionary mapping specialty names to filter configurations. Each configuration contains: - For pediatric specialty: {"is_child": True} - For other specialties: {"specialty": specialty_name, "is_child": False} |
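Example (illustrative sketch of the returned structure): the real function derives the pediatric specialty from SpecialCategoryParams, whereas this sketch hard-codes a name for it. The helper name `build_filters`, the `child_specialty` argument, and the specialty list are hypothetical.

```python
def build_filters(specialties, child_specialty="paediatric"):
    """Build one filter configuration per specialty."""
    filters = {}
    for name in specialties:
        if name == child_specialty:
            # Children are selected by age alone, regardless of specialty.
            filters[name] = {"is_child": True}
        else:
            filters[name] = {"specialty": name, "is_child": False}
    return filters


filters = build_filters(["medical", "surgical", "paediatric"])
```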
Source code in src/patientflow/train/incoming_admission_predictor.py
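The documented return shape can be illustrated with a minimal sketch. It does not use SpecialCategoryParams; the helper name `sketch_yta_filters`, the explicit `pediatric_specialty` argument, and the specialty labels are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def sketch_yta_filters(
    specialties: Iterable[str], pediatric_specialty: str
) -> Dict[str, dict]:
    """Build a specialty -> filter mapping in the documented shape (hypothetical helper)."""
    filters: Dict[str, dict] = {}
    for specialty in specialties:
        if specialty == pediatric_specialty:
            # The pediatric entry selects patients by the child flag alone.
            filters[specialty] = {"is_child": True}
        else:
            # Other entries select by specialty and exclude children.
            filters[specialty] = {"specialty": specialty, "is_child": False}
    return filters


# Specialty labels below are illustrative, not taken from the package.
filters = sketch_yta_filters(["medical", "surgical", "paediatric"], "paediatric")
```

Each key of the resulting dictionary names one filter, so downstream code can iterate over it to train or evaluate one specialty at a time.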
train_parametric_admission_predictor(train_visits, train_yta, prediction_window, yta_time_interval, prediction_times, num_days, epsilon=1e-06)
Train a parametric yet-to-arrive prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_visits
|
DataFrame
|
Visits dataset (used for identifying special categories). |
required |
train_yta
|
DataFrame
|
Training data for yet-to-arrive predictions. |
required |
prediction_window
|
timedelta
|
Time window for predictions as a timedelta. |
required |
yta_time_interval
|
timedelta
|
Time interval for predictions as a timedelta. |
required |
prediction_times
|
List[float]
|
List of prediction times. |
required |
num_days
|
int
|
Number of days to consider. |
required |
epsilon
|
float
|
Epsilon parameter for model, by default 1e-6. |
1e-06
|
Returns:
Type | Description |
---|---|
ParametricIncomingAdmissionPredictor
|
Trained ParametricIncomingAdmissionPredictor model. |
Raises:
Type | Description |
---|---|
TypeError
|
If prediction_window or yta_time_interval are not timedelta objects. |
Source code in src/patientflow/train/incoming_admission_predictor.py
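The TypeError documented above implies a type check on the two timedelta arguments. A minimal sketch of such a check follows; `check_timedelta_args` is a hypothetical helper, and the error message wording is an assumption rather than the package's own.

```python
from datetime import timedelta


def check_timedelta_args(prediction_window, yta_time_interval):
    """Raise TypeError unless both arguments are timedelta objects (hypothetical helper)."""
    for name, value in (
        ("prediction_window", prediction_window),
        ("yta_time_interval", yta_time_interval),
    ):
        if not isinstance(value, timedelta):
            raise TypeError(
                f"{name} must be a timedelta, got {type(value).__name__}"
            )


# Valid call: an 8-hour window split into 15-minute intervals.
check_timedelta_args(timedelta(hours=8), timedelta(minutes=15))

# Passing minutes as a bare integer is rejected.
try:
    check_timedelta_args(480, timedelta(minutes=15))
    raised = False
except TypeError:
    raised = True
```

Callers holding window lengths as plain numbers of minutes would therefore wrap them, for example `timedelta(minutes=480)`, before calling the training function.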
sequence_predictor
Training utility for sequence prediction models.
This module provides functions for training sequence-based prediction models, specifically for predicting patient outcomes based on visit sequences. It includes utilities for filtering patient data and training specialized sequence predictors.
The logic in this module is specific to the implementation at UCLH.
get_default_visits(admitted)
Filter a dataframe of patient visits to include only non-pediatric patients.
This function identifies and removes pediatric patients from the dataset based on both age criteria and specialty assignment. It automatically detects the appropriate age column format from the provided dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
admitted
|
DataFrame
|
A pandas DataFrame containing patient visit information. Must include either 'age_on_arrival' or 'age_group' columns, and a 'specialty' column. |
required |
Returns:
Type | Description |
---|---|
DataFrame
|
A filtered DataFrame containing only non-pediatric patients (adults). |
Notes
The function automatically detects which age-related columns are present in the dataframe and configures the appropriate filtering logic. It removes patients who are either:
1. Identified as pediatric based on age criteria, or
2. Assigned to a pediatric specialty
Source code in src/patientflow/train/sequence_predictor.py
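The two removal rules can be sketched on plain Python records instead of a DataFrame, to keep the example dependency-free. The age threshold of 18, the "0-17" age-group label, and the "paediatric" specialty label are all assumptions; the package takes the real values from its own configuration.

```python
from typing import Dict, List

ADULT_AGE = 18                      # assumed threshold, not from the package
CHILD_AGE_GROUP = "0-17"            # assumed age_group label
PEDIATRIC_SPECIALTY = "paediatric"  # assumed specialty label


def is_pediatric(visit: Dict) -> bool:
    """Apply whichever age column is present, then the specialty rule."""
    if "age_on_arrival" in visit:
        young = visit["age_on_arrival"] < ADULT_AGE
    elif "age_group" in visit:
        young = visit["age_group"] == CHILD_AGE_GROUP
    else:
        raise ValueError("visit needs 'age_on_arrival' or 'age_group'")
    # A visit is dropped if EITHER rule marks it as pediatric.
    return young or visit["specialty"] == PEDIATRIC_SPECIALTY


def sketch_default_visits(visits: List[Dict]) -> List[Dict]:
    """Keep only visits that neither rule marks as pediatric."""
    return [v for v in visits if not is_pediatric(v)]


visits = [
    {"age_on_arrival": 45, "specialty": "medical"},
    {"age_on_arrival": 9, "specialty": "medical"},
    {"age_on_arrival": 30, "specialty": "paediatric"},
]
adults = sketch_default_visits(visits)
```

Only the first record survives: the second fails the age rule and the third fails the specialty rule.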
train_sequence_predictor(train_visits, model_name, visit_col, input_var, grouping_var, outcome_var)
Train a specialty prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
train_visits
|
DataFrame
|
Training data containing visit information. |
required |
model_name
|
str
|
Name identifier for the model. |
required |
visit_col
|
str
|
Column name containing visit identifiers. |
required |
input_var
|
str
|
Column name for input sequence. |
required |
grouping_var
|
str
|
Column name for grouping sequence. |
required |
outcome_var
|
str
|
Column name for target variable. |
required |
Returns:
Type | Description |
---|---|
SequencePredictor
|
Trained SequencePredictor model. |
Source code in src/patientflow/train/sequence_predictor.py
utils
save_model(model, model_name, model_file_path)
Save trained model(s) to disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model
|
object or dict
|
A single model instance or a dictionary of models to save. |
required |
model_name
|
str
|
Base name to use for saving the model(s). |
required |
model_file_path
|
Path
|
Directory path where the model(s) will be saved. |
required |
Returns:
Type | Description |
---|---|
None
|
|
Source code in src/patientflow/train/utils.py
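The single-model versus dictionary behaviour can be sketched as below. This is not the package's implementation: pickle is used as a stand-in serializer, and the `.pkl` extension and the `<model_name>_<key>` naming scheme are assumptions chosen for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def sketch_save_model(model, model_name: str, model_file_path: Path) -> None:
    """Write one file per model into model_file_path (hypothetical helper)."""
    model_file_path = Path(model_file_path)
    model_file_path.mkdir(parents=True, exist_ok=True)
    if isinstance(model, dict):
        # A dictionary of models yields one file per entry (assumed naming).
        to_save = {f"{model_name}_{key}": m for key, m in model.items()}
    else:
        # A single model is saved under the base name.
        to_save = {model_name: model}
    for name, m in to_save.items():
        with open(model_file_path / f"{name}.pkl", "wb") as fh:
            pickle.dump(m, fh)


# Lists stand in for trained models here.
with tempfile.TemporaryDirectory() as tmp:
    sketch_save_model({"0930": [1, 2], "1530": [3]}, "admissions", Path(tmp))
    saved = sorted(p.name for p in Path(tmp).iterdir())
```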
viz
Visualization module for patient flow analysis.
This module provides various plotting and visualization functions for analyzing patient flow data, model results, and evaluation metrics.
arrival_rates
Visualization functions for inpatient arrival rates and cumulative statistics.
This module provides functions to visualize time-varying arrival rates and cumulative arrivals over the course of a day.
Functions:
Name | Description |
---|---|
annotate_hour_line : function |
Annotate hour lines on a matplotlib plot |
plot_arrival_rates : function |
Plot arrival rates for one or two datasets |
plot_cumulative_arrival_rates : function |
Plot cumulative arrival rates with statistical distributions |
annotate_hour_line(hour_line, y_value, hour_values, start_plot_index, line_styles, x_margin, annotation_prefix, text_y_offset=1, text_x_position=None, slope=None, x1=None, y1=None)
Annotate hour lines on a matplotlib plot with consistent formatting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
hour_line
|
int
|
The hour to annotate on the plot. |
required |
y_value
|
float
|
The y-coordinate for annotation positioning. |
required |
hour_values
|
list of int
|
Hour values corresponding to the x-axis positions. |
required |
start_plot_index
|
int
|
Starting index for the plot's data. |
required |
line_styles
|
dict
|
Line styles for annotations keyed by hour. |
required |
x_margin
|
float
|
Margin added to x-axis for annotation positioning. |
required |
annotation_prefix
|
str
|
Prefix for the annotation text (e.g., "On average"). |
required |
text_y_offset
|
float
|
Vertical offset for the annotation text from the line, by default 1. |
1
|
text_x_position
|
float
|
Horizontal position for annotation text, by default None. |
None
|
slope
|
float
|
Slope of a line for extended annotations, by default None. |
None
|
x1
|
float
|
Reference x-coordinate for slope-based annotation, by default None. |
None
|
y1
|
float
|
Reference y-coordinate for slope-based annotation, by default None. |
None
|
Source code in src/patientflow/viz/arrival_rates.py
draw_window_visualization(ax, hour_values, window_params, annotation_prefix, start_window, end_window)
Draw the window visualization with annotations.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ax
|
Axes
|
The axes to draw on |
required |
hour_values
|
array-like
|
Hour labels for x-axis |
required |
window_params
|
tuple
|
Window parameters as returned by get_window_parameters, documented there as (slope, x1, y1, x2, y2) |
required |
annotation_prefix
|
str
|
Prefix for annotations |
required |
start_window
|
int
|
Start hour for window |
required |
end_window
|
int
|
End hour for window |
required |
Source code in src/patientflow/viz/arrival_rates.py
get_window_parameters(data, start_window, end_window, hour_values)
Calculate window parameters for visualization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data
|
array-like
|
Reindexed cumulative data |
required |
start_window
|
int
|
Start position in reindexed space |
required |
end_window
|
int
|
End position in reindexed space |
required |
hour_values
|
array-like
|
Original hour values for display |
required |
Returns:
Type | Description |
---|---|
tuple
|
(slope, x1, y1, x2, y2), where slope (float) is the calculated slope of the line, x1 (float) is the start hour value, y1 (float) is the start y-value, x2 (float) is the end hour value, and y2 (float) is the end y-value. |
Source code in src/patientflow/viz/arrival_rates.py
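A minimal sketch of how the five documented return values might be derived is given below. Measuring the run of the slope in reindexed positions (so that a window crossing midnight keeps a positive width) is an assumption; the package may compute it differently.

```python
def sketch_window_parameters(data, start_window, end_window, hour_values):
    """Return (slope, x1, y1, x2, y2) for a window (hypothetical helper)."""
    # Display coordinates come from the original hour labels.
    x1 = hour_values[start_window]
    x2 = hour_values[end_window]
    # Heights come from the reindexed cumulative data.
    y1 = data[start_window]
    y2 = data[end_window]
    # Rise over run, with the run counted in reindexed positions (assumption).
    slope = (y2 - y1) / (end_window - start_window)
    return slope, x1, y1, x2, y2


# A plot that starts at 22:00, so hour labels wrap past midnight.
hour_values = [22, 23, 0, 1, 2]
data = [0.0, 1.5, 3.0, 4.0, 6.0]
params = sketch_window_parameters(data, 1, 4, hour_values)
```

In this example the window runs from position 1 (hour 23) to position 4 (hour 2), three intervals apart, while the hour labels themselves differ by -21.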
plot_arrival_rates(inpatient_arrivals, title, inpatient_arrivals_2=None, labels=None, lagged_by=None, curve_params=None, time_interval=60, start_plot_index=0, x_margin=0.5, file_prefix='', media_file_path=None, file_name=None, num_days=None, num_days_2=None, return_figure=False)
Plot arrival rates for one or two datasets with optional lagged and spread rates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inpatient_arrivals
|
array-like
|
Primary dataset of inpatient arrivals. |
required |
title
|
str
|
Title of the plot. |
required |
inpatient_arrivals_2
|
array-like
|
Optional second dataset for comparison, by default None. |
None
|
labels
|
tuple of str
|
Labels for the datasets when comparing two datasets, by default None. |
None
|
lagged_by
|
int
|
Time lag in hours to apply to the arrival rates, by default None. |
None
|
curve_params
|
tuple of float
|
Parameters for spread arrival rates as (x1, y1, x2, y2), by default None. |
None
|
time_interval
|
int
|
Time interval in minutes for arrival rate calculations, by default 60. |
60
|
start_plot_index
|
int
|
Starting hour index for plotting, by default 0. |
0
|
x_margin
|
float
|
Margin on the x-axis, by default 0.5. |
0.5
|
file_prefix
|
str
|
Prefix for the saved file name, by default "". |
''
|
media_file_path
|
str or Path
|
Directory path to save the plot, by default None. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, uses file_prefix + cleaned title. |
None
|
num_days
|
int
|
Number of days in the first dataset, by default None. |
None
|
num_days_2
|
int
|
Number of days in the second dataset, by default None. |
None
|
return_figure
|
bool
|
If True, returns the matplotlib figure instead of displaying it, by default False. |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
Returns the figure if return_figure is True, otherwise displays the plot. |
Source code in src/patientflow/viz/arrival_rates.py
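The quantity being plotted, an average arrival rate per interval of the day, can be sketched as follows. The input here is a plain list of datetime objects and the helper name is hypothetical; the package's own data layout and rate calculation are not reproduced.

```python
from collections import Counter
from datetime import datetime


def sketch_arrival_rates(arrival_times, num_days, time_interval=60):
    """Average number of arrivals in each interval of the day (hypothetical helper)."""
    if 1440 % time_interval != 0:
        raise ValueError("time_interval must divide 1440 minutes evenly")
    # Map each arrival to the index of the interval it falls in.
    counts = Counter(
        (t.hour * 60 + t.minute) // time_interval for t in arrival_times
    )
    n_intervals = 1440 // time_interval
    # Divide by the number of days observed to get a per-day average.
    return [counts.get(i, 0) / num_days for i in range(n_intervals)]


arrivals = [
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 2, 9, 5),
    datetime(2024, 1, 2, 14, 30),
]
rates = sketch_arrival_rates(arrivals, num_days=2)
```

With the default 60-minute interval the result has 24 entries, one per hour, which is the shape a per-hour rate plot would need.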
plot_cumulative_arrival_rates(inpatient_arrivals, title, curve_params=None, lagged_by=None, time_interval=60, start_plot_index=0, draw_window=None, x_margin=0.5, file_prefix='', set_y_lim=None, hour_lines=[12, 17], line_styles={12: '--', 17: ':', 20: '--'}, annotation_prefix='On average', line_colour='red', media_file_path=None, file_name=None, plot_centiles=False, highlight_centile=0.9, centiles=[0.3, 0.5, 0.7, 0.9, 0.99], markers=['D', 's', '^', 'o', 'v'], line_styles_centiles=['-.', '--', ':', '-', '-'], bed_type_spec='', text_y_offset=1, num_days=None, return_figure=False)
Plot cumulative arrival rates with optional statistical distributions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
inpatient_arrivals
|
array-like
|
Dataset of inpatient arrivals. |
required |
title
|
str
|
Title of the plot. |
required |
curve_params
|
tuple of float
|
Parameters for spread rates as (x1, y1, x2, y2), by default None. |
None
|
lagged_by
|
int
|
Time lag in hours for cumulative rates, by default None. |
None
|
time_interval
|
int
|
Time interval in minutes for rate calculations, by default 60. |
60
|
start_plot_index
|
int
|
Starting hour index for plotting, by default 0. |
0
|
draw_window
|
tuple of int
|
Time window for detailed annotation, by default None. |
None
|
x_margin
|
float
|
Margin on the x-axis, by default 0.5. |
0.5
|
file_prefix
|
str
|
Prefix for the saved file name, by default "". |
''
|
set_y_lim
|
float
|
Upper limit for the y-axis, by default None. |
None
|
hour_lines
|
list of int
|
Specific hours to annotate, by default [12, 17]. |
[12, 17]
|
line_styles
|
dict
|
Line styles for hour annotations keyed by hour, by default {12: "--", 17: ":", 20: "--"}. |
{12: '--', 17: ':', 20: '--'}
|
annotation_prefix
|
str
|
Prefix for annotations, by default "On average". |
'On average'
|
line_colour
|
str
|
Color for the main line plot, by default "red". |
'red'
|
media_file_path
|
str or Path
|
Directory path to save the plot, by default None. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, uses file_prefix + cleaned title. |
None
|
plot_centiles
|
bool
|
Whether to include percentile visualization, by default False. |
False
|
highlight_centile
|
float
|
Percentile to emphasize, by default 0.9. If 1.0 is provided, will use 0.9999 instead. |
0.9
|
centiles
|
list of float
|
List of percentiles to calculate, by default [0.3, 0.5, 0.7, 0.9, 0.99]. |
[0.3, 0.5, 0.7, 0.9, 0.99]
|
markers
|
list of str
|
Marker styles for percentile lines, by default ["D", "s", "^", "o", "v"]. |
['D', 's', '^', 'o', 'v']
|
line_styles_centiles
|
list of str
|
Line styles for percentile visualization, by default ["-.", "--", ":", "-", "-"]. |
['-.', '--', ':', '-', '-']
|
bed_type_spec
|
str
|
Specification for bed type in annotations, by default "". |
''
|
text_y_offset
|
float
|
Vertical offset for text annotations, by default 1. |
1
|
num_days
|
int
|
Number of days in the dataset, by default None. |
None
|
return_figure
|
bool
|
If True, returns the matplotlib figure instead of displaying it, by default False. |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
Returns the figure if return_figure is True, otherwise displays the plot. |
Source code in src/patientflow/viz/arrival_rates.py
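The effect of start_plot_index on a cumulative curve can be sketched as a rotation followed by a running sum. The helper below is hypothetical and assumes one rate per hour (time_interval=60), so that list indices coincide with hours.

```python
from itertools import accumulate


def sketch_cumulative_rates(rates, start_plot_index=0):
    """Rotate the rates to start at start_plot_index, then accumulate them."""
    # Reindex so the plot begins at the chosen hour and wraps around.
    rotated = rates[start_plot_index:] + rates[:start_plot_index]
    hours = list(range(start_plot_index, len(rates))) + list(range(start_plot_index))
    # Running total of arrivals since the start of the plot.
    return hours, list(accumulate(rotated))


# Four intervals only, to keep the example short.
hours, cumulative = sketch_cumulative_rates([1.0, 2.0, 0.5, 1.5], start_plot_index=2)
```

The final cumulative value equals the total of all rates regardless of where the plot starts; only the shape of the curve before that point changes.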
aspirational_curve
Visualization module for plotting aspirational curves in patient flow analysis.
This module provides functionality for creating and customizing plots of aspirational curves, which represent the probability of admission over time. These curves are useful for setting aspirational targets in healthcare settings.
Functions:
Name | Description |
---|---|
plot_curve : function |
Plot an aspirational curve with specified points and optional annotations |
Examples:
>>> from patientflow.viz.aspirational_curve import plot_curve
>>> plot_curve(
... title="Admission Probability Curve",
... x1=4,
... y1=0.2,
... x2=24,
... y2=0.8,
... include_titles=True
... )
plot_curve(title, x1, y1, x2, y2, figsize=(10, 5), include_titles=False, text_size=14, media_file_path=None, file_name=None, return_figure=False, annotate_points=False)
Plot an aspirational curve with specified points and optional annotations.
This function creates a plot of an aspirational curve between two points, with options for customization of the visualization including titles, annotations, and saving to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
title
|
str
|
The title of the plot. |
required |
x1
|
float
|
x-coordinate of the first point. |
required |
y1
|
float
|
y-coordinate of the first point (probability value). |
required |
x2
|
float
|
x-coordinate of the second point. |
required |
y2
|
float
|
y-coordinate of the second point (probability value). |
required |
figsize
|
tuple of int
|
Figure size in inches (width, height), by default (10, 5). |
(10, 5)
|
include_titles
|
bool
|
Whether to include axis labels and title, by default False. |
False
|
text_size
|
int
|
Font size for text elements, by default 14. |
14
|
media_file_path
|
str or Path
|
Path to save the plot image, by default None. |
None
|
file_name
|
str
|
Custom filename for saving the plot. If not provided, uses a cleaned version of the title. |
None
|
return_figure
|
bool
|
Whether to return the figure object instead of displaying it, by default False. |
False
|
annotate_points
|
bool
|
Whether to add coordinate annotations to the points, by default False. |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
The figure object if return_figure is True, otherwise None. |
Notes
The function creates a curve between two points using the create_curve function and adds various visualization elements including grid lines, annotations, and optional titles.
Source code in src/patientflow/viz/aspirational_curve.py
calibration
Calibration plot visualization module.
This module creates calibration plots for trained models, showing how well the predicted probabilities align with actual outcomes.
Functions:
Name | Description |
---|---|
plot_calibration : function |
Plot calibration curves for multiple models |
plot_calibration(trained_models, test_visits, exclude_from_training_data, strategy='uniform', media_file_path=None, file_name=None, suptitle=None, return_figure=False, label_col='is_admitted')
Plot calibration curves for multiple models.
A calibration plot shows how well the predicted probabilities from a model align with the actual outcomes. The plot compares the mean predicted probability with the fraction of positive outcomes for different probability bins.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of TrainedClassifier objects or dictionary with TrainedClassifier values. |
required |
test_visits
|
DataFrame
|
DataFrame containing test visit data. |
required |
exclude_from_training_data
|
list
|
Columns to exclude from the test data. |
required |
strategy
|
{'uniform', 'quantile'}
|
Strategy for calibration curve binning: 'uniform' gives bins of equal width; 'quantile' gives bins with an equal number of samples. |
'uniform'
|
media_file_path
|
Path
|
Path where the plot should be saved. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "calibration_plot.png". |
None
|
suptitle
|
str
|
Optional super title for the entire figure. |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it. |
False
|
label_col
|
str
|
Name of the column containing the target labels. |
'is_admitted'
|
Returns:
Type | Description |
---|---|
Figure or None
|
If return_figure is True, returns the figure object. Otherwise, displays the plot and returns None. |
Notes
The function creates a subplot for each trained model, sorted by prediction time. Each subplot shows the calibration curve and a reference line for perfect calibration.
Source code in src/patientflow/viz/calibration.py
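The difference between the two binning strategies can be sketched in pure Python. This is an illustrative reimplementation of the general technique, not the package's code; the helper name and the way ties and empty bins are handled are assumptions.

```python
def sketch_calibration_curve(y_true, y_prob, n_bins=5, strategy="uniform"):
    """Return (mean predicted probability, fraction of positives) per non-empty bin."""
    pairs = sorted(zip(y_prob, y_true))
    bins = [[] for _ in range(n_bins)]
    if strategy == "uniform":
        # Equal-width bins over [0, 1]; a probability of 1.0 goes in the last bin.
        for prob, label in pairs:
            bins[min(int(prob * n_bins), n_bins - 1)].append((prob, label))
    elif strategy == "quantile":
        # Equal-count bins: split the sorted predictions into contiguous chunks.
        for rank, pair in enumerate(pairs):
            bins[rank * n_bins // len(pairs)].append(pair)
    else:
        raise ValueError("strategy must be 'uniform' or 'quantile'")
    mean_predicted, fraction_positive = [], []
    for members in bins:
        if members:  # skip empty bins
            mean_predicted.append(sum(p for p, _ in members) / len(members))
            fraction_positive.append(sum(y for _, y in members) / len(members))
    return mean_predicted, fraction_positive


y_prob = [0.1, 0.2, 0.3, 0.8, 0.9, 0.7]
y_true = [0, 0, 1, 1, 1, 0]
mean_u, frac_u = sketch_calibration_curve(y_true, y_prob, n_bins=2, strategy="uniform")
mean_q, frac_q = sketch_calibration_curve(y_true, y_prob, n_bins=3, strategy="quantile")
```

For a perfectly calibrated model the two returned lists would be equal bin by bin, which is the diagonal reference line on a calibration plot.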
data_distribution
Visualisation module for plotting data distributions.
This module provides functions for creating distribution plots of data variables grouped by categories.
Functions:
Name | Description |
---|---|
plot_data_distribution : function |
Plot distributions of data variables grouped by categories |
plot_data_distribution(df, col_name, grouping_var, grouping_var_name, plot_type='both', title=None, rotate_x_labels=False, is_discrete=False, ordinal_order=None, media_file_path=None, file_name=None, return_figure=False, truncate_outliers=True, outlier_method='zscore', outlier_threshold=2.0)
Plot distributions of data variables grouped by categories.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
Input DataFrame containing the data to plot |
required |
col_name
|
str
|
Name of the column to plot distributions for |
required |
grouping_var
|
str
|
Name of the column to group the data by |
required |
grouping_var_name
|
str
|
Display name for the grouping variable |
required |
plot_type
|
{'both', 'hist', 'kde'}
|
Type of plot to create. 'both' shows histogram with KDE, 'hist' shows only histogram, 'kde' shows only KDE plot |
'both'
|
title
|
str
|
Title for the plot |
None
|
rotate_x_labels
|
bool
|
Whether to rotate x-axis labels by 90 degrees |
False
|
is_discrete
|
bool
|
Whether the data is discrete |
False
|
ordinal_order
|
list
|
Order of categories for ordinal data |
None
|
media_file_path
|
Path
|
Path where the plot should be saved |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "data_distributions.png". |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it |
False
|
truncate_outliers
|
bool
|
Whether to truncate the x-axis to exclude extreme outliers |
True
|
outlier_method
|
{'iqr', 'zscore'}
|
Method to detect outliers. 'iqr' uses interquartile range, 'zscore' uses z-score |
'zscore'
|
outlier_threshold
|
float
|
Threshold for outlier detection. For IQR method, this is the multiplier. For z-score method, this is the number of standard deviations. |
2.0
|
Returns:
Type | Description |
---|---|
FacetGrid or None
|
If return_figure is True, returns the FacetGrid object. Otherwise, displays the plot and returns None. |
Raises:
Type | Description |
---|---|
ValueError
|
If plot_type is not one of 'both', 'hist', or 'kde', or if outlier_method is not one of 'iqr' or 'zscore'. |
Source code in src/patientflow/viz/data_distribution.py
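The two outlier methods and the meaning of outlier_threshold in each can be sketched with the standard library. The helper is hypothetical; its defaults follow the function signature shown above (outlier_method='zscore', outlier_threshold=2.0), and the exact quantile convention used by the package is not known here.

```python
import statistics


def sketch_outlier_bounds(values, method="zscore", threshold=2.0):
    """Return (lower, upper) limits outside which values count as outliers."""
    if method == "iqr":
        # Threshold multiplies the interquartile range.
        q1, _, q3 = statistics.quantiles(values, n=4)
        spread = q3 - q1
        return q1 - threshold * spread, q3 + threshold * spread
    if method == "zscore":
        # Threshold is a number of standard deviations from the mean.
        mean = statistics.mean(values)
        sd = statistics.stdev(values)
        return mean - threshold * sd, mean + threshold * sd
    raise ValueError("method must be 'iqr' or 'zscore'")


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
z_lower, z_upper = sketch_outlier_bounds(values, method="zscore", threshold=2.0)
iqr_lower, iqr_upper = sketch_outlier_bounds(values, method="iqr", threshold=1.5)
```

In both cases the value 100 lies above the upper limit while the rest of the data lies below it, so truncating the x-axis at that limit would keep the bulk of the distribution readable.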
epudd
Generate plots comparing observed values with model predictions for discrete distributions.
An Evaluating Predictions for Unique, Discrete, Distributions (EPUDD) plot displays the model's predicted CDF values alongside the actual observed values' positions within their predicted CDF intervals. For discrete distributions, each predicted value has an associated probability, and the CDF is calculated by sorting the values and computing cumulative probabilities.
The plot can show three possible positions for each observation within its predicted interval:
* lower bound of the interval
* midpoint of the interval
* upper bound of the interval
By default, the plot only shows the midpoint of the interval.
For a well-calibrated model, the observed values should fall within their predicted intervals, with the distribution of positions showing appropriate uncertainty.
The visualisation helps assess model calibration by comparing:
1. The predicted cumulative distribution function (CDF) values
2. The actual positions of observations within their predicted intervals
3. The spread and distribution of these positions
Functions:
Name | Description |
---|---|
plot_epudd : function |
Generates and plots the comparison of model predictions with observed values. |
plot_epudd(prediction_times, prob_dist_dict_all, model_name='admissions', return_figure=False, return_dataframe=False, figsize=None, suptitle=None, media_file_path=None, file_name=None, plot_all_bounds=False)
Generates plots comparing model predictions with observed values for discrete distributions.
For discrete distributions, each predicted value has an associated probability. The CDF is calculated by sorting the values and computing cumulative probabilities, normalized by the number of time points.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_times
|
list of tuple
|
List of (hour, minute) tuples representing times for which predictions were made. |
required |
prob_dist_dict_all
|
dict
|
Dictionary of probability distributions keyed by model_key. Each entry contains information about predicted distributions and observed values for different snapshot dates. The predicted distributions should be discrete probability mass functions, with each value having an associated probability. |
required |
model_name
|
str
|
Base name of the model to construct model keys, by default "admissions". |
'admissions'
|
return_figure
|
bool
|
If True, returns the figure object instead of displaying it, by default False. |
False
|
return_dataframe
|
bool
|
If True, returns a dictionary of observation dataframes by model_key, by default False. The dataframes contain the merged observation and prediction data for analysis. |
False
|
figsize
|
tuple of (float, float)
|
Size of the figure in inches as (width, height). If None, calculated automatically based on number of plots, by default None. |
None
|
suptitle
|
str
|
Super title for the entire figure, displayed above all subplots, by default None. |
None
|
media_file_path
|
Path
|
Path to save the plot, by default None. If provided, saves the plot as a PNG file. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "plot_epudd.png". |
None
|
plot_all_bounds
|
bool
|
If True, plots all bounds (lower, mid, upper). If False, only plots mid bounds. By default False. |
False
|
Returns:
Type | Description |
---|---|
Figure
|
The figure object containing the plots, if return_figure is True. |
dict
|
Dictionary of observation dataframes by model_key, if return_dataframe is True. |
tuple
|
Tuple of (figure, dataframes_dict) if both return_figure and return_dataframe are True. |
None
|
If neither return_figure nor return_dataframe is True, displays the plots and returns None. |
Notes
For discrete distributions, the CDF is calculated by:
1. Sorting the predicted values
2. Computing cumulative probabilities for each value
3. Normalizing by the number of time points
The plot shows three possible positions for each observation:
* lower_cdf (pink): Uses the lower bound of the CDF interval
* mid_cdf (green): Uses the midpoint of the CDF interval
* upper_cdf (light blue): Uses the upper bound of the CDF interval
The black points represent the model's predicted CDF values, calculated from the sorted values and their associated probabilities, while the colored points show where the actual observations fall within their predicted intervals. For a well-calibrated model, the observed values should fall within their predicted intervals, with the distribution of positions showing appropriate uncertainty.
Source code in src/patientflow/viz/epudd.py
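The lower, mid and upper positions described in the Notes can be sketched for a single observation. The representation of the predicted distribution as a list indexed by count, and the helper name, are assumptions; the structure of prob_dist_dict_all is not reproduced here.

```python
from itertools import accumulate


def sketch_cdf_positions(pmf, observed):
    """Return (lower_cdf, mid_cdf, upper_cdf) for one observed count."""
    # Cumulative probabilities: cdf[k] = P(X <= k).
    cdf = list(accumulate(pmf))
    upper = cdf[observed]
    # P(X <= observed - 1), which is 0 when the observation is 0.
    lower = cdf[observed - 1] if observed > 0 else 0.0
    return lower, (lower + upper) / 2, upper


# Predicted probabilities of 0, 1 or 2 admissions (illustrative values).
pmf = [0.25, 0.5, 0.25]
positions_one = sketch_cdf_positions(pmf, observed=1)
positions_zero = sketch_cdf_positions(pmf, observed=0)
```

Collecting one such position per snapshot date and sorting them gives the set of observed points that the plot compares with the model's predicted CDF values.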
estimated_probabilities
Visualization module for plotting estimated probabilities from trained models.
This module provides functions for creating distribution plots of estimated probabilities from trained classification models.
Functions:
Name | Description |
---|---|
plot_estimated_probabilities : function |
Plot estimated probability distributions for multiple models |
plot_estimated_probabilities(trained_models, test_visits, exclude_from_training_data, bins=30, media_file_path=None, file_name=None, suptitle=None, return_figure=False, label_col='is_admitted')
Plot estimated probability distributions for multiple models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of TrainedClassifier objects or dict with TrainedClassifier values |
required |
test_visits
|
DataFrame
|
DataFrame containing test visit data |
required |
exclude_from_training_data
|
list
|
Columns to exclude from the test data |
required |
bins
|
int
|
Number of bins for the histograms |
30
|
media_file_path
|
Path
|
Path where the plot should be saved |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "estimated_probabilities.png". |
None
|
suptitle
|
str
|
Optional super title for the entire figure |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it |
False
|
label_col
|
str
|
Name of the column containing the target labels |
'is_admitted'
|
Returns:
Type | Description |
---|---|
Figure or None
|
If return_figure is True, returns the figure object. Otherwise, displays the plot and returns None. |
Source code in src/patientflow/viz/estimated_probabilities.py
features
Visualisation module for plotting feature importances from trained models.
This module provides functionality to visualize feature importances from trained classifiers, allowing for comparison across different prediction time points.
Functions:
Name | Description |
---|---|
plot_features : function |
Plot feature importance for multiple models |
plot_features(trained_models, media_file_path=None, file_name=None, top_n=20, suptitle=None, return_figure=False)
Plot feature importance for multiple models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of TrainedClassifier objects or dictionary with TrainedClassifier values. |
required |
media_file_path
|
Path
|
Path where the plot should be saved. If None, the plot is only displayed. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "feature_importance_plots.png". |
None
|
top_n
|
int
|
Number of top features to display. |
20
|
suptitle
|
str
|
Super title for the entire figure. |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it. |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
The matplotlib figure if return_figure is True, otherwise None. |
Notes
The function sorts models by prediction time and creates a horizontal bar plot for each model showing the top N most important features. Feature names are truncated to 25 characters for better display.
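The selection described in the note can be sketched in plain Python. This is an illustration only, not the library's implementation: the dictionaries standing in for trained models, the helper names `top_features` and `sort_by_prediction_time`, and the feature names are all hypothetical.

```python
def top_features(importances, top_n=20, max_len=25):
    """Return the top_n (name, importance) pairs, largest first.

    Names are cut to max_len characters, mirroring the truncation
    described in the note above.
    """
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [(name[:max_len], value) for name, value in ranked[:top_n]]


def sort_by_prediction_time(models):
    """Order hypothetical model records by their (hour, minute) prediction time."""
    return sorted(models, key=lambda m: m["prediction_time"])


# Hypothetical stand-ins for trained models: a prediction time plus importances.
models = [
    {
        "prediction_time": (15, 30),
        "importances": {"age_on_arrival": 0.40, "arrival_method": 0.25,
                        "latest_obs_news_score_result_value": 0.35},
    },
    {
        "prediction_time": (9, 30),
        "importances": {"age_on_arrival": 0.55, "arrival_method": 0.45},
    },
]

ordered = sort_by_prediction_time(models)
panel = top_features(ordered[1]["importances"], top_n=2)
```

Each element of `panel` would correspond to one horizontal bar in that model's subplot.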
Source code in src/patientflow/viz/features.py
madcap
Module for generating MADCAP (Model Accuracy and Discriminative Calibration Plots) visualizations.
MADCAP plots compare model-predicted probabilities to observed outcomes, helping to assess model calibration and discrimination. The plots can be generated for individual prediction times or for specific groups (e.g., age groups).
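As a rough sketch of the comparison a MADCAP plot draws, the two cumulative curves can be computed as below. The helper name `madcap_curves` is hypothetical, and sorting visits by ascending predicted probability is an assumption rather than something this reference confirms.

```python
from itertools import accumulate


def madcap_curves(predicted, observed):
    """Return cumulative predicted and cumulative observed totals.

    Visits are sorted by predicted probability (ascending, by assumption)
    so that both curves share the same ordering of visits.
    """
    pairs = sorted(zip(predicted, observed))
    cum_pred = list(accumulate(p for p, _ in pairs))
    cum_obs = list(accumulate(o for _, o in pairs))
    return cum_pred, cum_obs


predicted = [0.9, 0.1, 0.6, 0.3]  # model-estimated admission probabilities
observed = [1, 0, 1, 0]           # 1 if the visit ended in admission
cum_pred, cum_obs = madcap_curves(predicted, observed)
```

If the model is well calibrated, the two curves should stay close to each other along their whole length and end at similar totals.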
Functions:
Name | Description |
---|---|
classify_age : function |
Classifies age into categories based on numeric values or age group strings. |
plot_madcap : function |
Generates MADCAP plots for a list of trained models, comparing estimated probabilities to observed values. |
_plot_madcap_subplot : function |
Plots a single MADCAP subplot showing cumulative predicted and observed values. |
_plot_madcap_by_group_single : function |
Generates MADCAP plots for specific groups at a given prediction time. |
plot_madcap_by_group : function |
Generates MADCAP plots for different groups (e.g., age groups) across multiple prediction times. |
classify_age(age, age_categories=None)
Classify age into categories based on numeric values or age group strings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
age
|
int, float, or str
|
Age value (e.g., 30) or age group string (e.g., '18-24'). |
required |
age_categories
|
dict
|
Dictionary defining age categories and their ranges. If not provided, uses DEFAULT_AGE_CATEGORIES. Expected format: { "category_name": { "numeric": {"min": min_age, "max": max_age}, "groups": ["age_group1", "age_group2", ...] } } |
None
|
Returns:
Type | Description |
---|---|
str
|
Category name based on the age or age group, or 'unknown' for unexpected or invalid values. |
Examples:
>>> classify_age(25)
'adults'
>>> classify_age('65-74')
'65 or over'
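The expected `age_categories` format can be illustrated with a small stand-alone sketch. The category dictionary below is an assumption made for illustration (the contents of DEFAULT_AGE_CATEGORIES are not shown in this reference), and `classify_age_sketch` is a hypothetical re-implementation, not the library function.

```python
# Assumed categories, chosen to be consistent with the examples above.
AGE_CATEGORIES = {
    "children": {"numeric": {"min": 0, "max": 17}, "groups": ["0-17"]},
    "adults": {
        "numeric": {"min": 18, "max": 64},
        "groups": ["18-24", "25-34", "35-44", "45-54", "55-64"],
    },
    "65 or over": {"numeric": {"min": 65, "max": 150}, "groups": ["65-74", "75-115"]},
}


def classify_age_sketch(age, age_categories=AGE_CATEGORIES):
    """Map a numeric age or an age-group string to a category name."""
    if isinstance(age, bool):  # bool is a subclass of int; treat it as invalid
        return "unknown"
    if isinstance(age, (int, float)):
        for name, spec in age_categories.items():
            bounds = spec["numeric"]
            if bounds["min"] <= age <= bounds["max"]:
                return name
        return "unknown"
    if isinstance(age, str):
        for name, spec in age_categories.items():
            if age in spec["groups"]:
                return name
    return "unknown"
```

Under these assumed categories, `classify_age_sketch(25)` gives 'adults' and `classify_age_sketch('65-74')` gives '65 or over', matching the documented examples.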
Source code in src/patientflow/viz/madcap.py
plot_madcap(trained_models, test_visits, exclude_from_training_data, media_file_path=None, file_name=None, suptitle=None, return_figure=False, label_col='is_admitted')
Generate MADCAP plots for a list of trained models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of trained classifier objects or dictionary with TrainedClassifier values. |
required |
test_visits
|
DataFrame
|
DataFrame containing test visit data. |
required |
exclude_from_training_data
|
List[str]
|
List of columns to exclude from training data. |
required |
media_file_path
|
Path
|
Directory path where the generated plots will be saved. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "madcap_plot.png". |
None
|
suptitle
|
str
|
Suptitle for the plot. |
None
|
return_figure
|
bool
|
If True, returns the figure object instead of displaying it. |
False
|
label_col
|
str
|
Name of the column containing the target labels. |
"is_admitted"
|
Returns:
Type | Description |
---|---|
Optional[Figure]
|
The figure if return_figure is True, None otherwise. |
Source code in src/patientflow/viz/madcap.py
plot_madcap_by_group(trained_models, test_visits, exclude_from_training_data, grouping_var, grouping_var_name, media_file_path=None, file_name=None, plot_difference=False, return_figure=False, label_col='is_admitted')
Generate MADCAP plots for different groups across multiple prediction times.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of trained classifier objects or dictionary with TrainedClassifier values. |
required |
test_visits
|
DataFrame
|
DataFrame containing the test visit data. |
required |
exclude_from_training_data
|
List[str]
|
List of columns to exclude from training data. |
required |
grouping_var
|
str
|
The column name in the dataset that defines the grouping variable. |
required |
grouping_var_name
|
str
|
A descriptive name for the grouping variable, used in plot titles. |
required |
media_file_path
|
Path
|
Directory path where the generated plots will be saved. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to a generated name based on group and time. |
None
|
plot_difference
|
bool
|
If True, includes difference plot between predicted and observed outcomes. |
False
|
return_figure
|
bool
|
If True, returns a list of figure objects instead of displaying them. |
False
|
label_col
|
str
|
Name of the column containing the target labels. |
"is_admitted"
|
Returns:
Type | Description |
---|---|
Optional[List[Figure]]
|
List of figures if return_figure is True, None otherwise. |
Source code in src/patientflow/viz/madcap.py
observed_against_expected
Visualisation utilities for evaluating patient flow predictions.
This module provides functions for creating visualizations to evaluate the accuracy and performance of patient flow predictions, particularly focusing on comparing observed versus expected values.
Functions:
Name | Description |
---|---|
plot_deltas : function |
Plot histograms of observed minus expected values |
plot_arrival_delta_single_instance : function |
Plot comparison between observed arrivals and expected arrival rates |
plot_arrival_deltas : function |
Plot delta charts for multiple snapshot dates on the same figure |
plot_arrival_delta_single_instance(df, prediction_time, snapshot_date, prediction_window, yta_time_interval=timedelta(minutes=15), show_delta=True, show_only_delta=False, media_file_path=None, file_name=None, return_figure=False, fig_size=(10, 4))
Plot comparison between observed arrivals and expected arrival rates.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing arrival data |
required |
prediction_time
|
tuple
|
(hour, minute) of prediction time |
required |
snapshot_date
|
date
|
Date to analyze |
required |
prediction_window
|
int
|
Prediction window in minutes |
required |
show_delta
|
bool
|
If True, plot the difference between actual and expected arrivals |
True
|
show_only_delta
|
bool
|
If True, only plot the delta between actual and expected arrivals |
False
|
yta_time_interval
|
timedelta
|
Time interval used for calculating arrival rates |
timedelta(minutes=15)
|
media_file_path
|
Path
|
Path to save the plot |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "arrival_comparison.png" |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it |
False
|
fig_size
|
tuple
|
Figure size as (width, height) in inches |
(10, 4)
|
Returns:
Type | Description |
---|---|
Figure or None
|
The figure object if return_figure is True, otherwise None |
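One way to picture the delta between observed and expected arrivals is sketched below. It is an illustration under stated assumptions, not the library's calculation: the helper `arrival_deltas`, the per-interval expected rates, and the choice to compare cumulative counts are all hypothetical.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def arrival_deltas(arrivals, start, prediction_window, interval, expected_per_interval):
    """Cumulative observed arrivals minus cumulative expected arrivals.

    arrivals: list of datetimes; expected_per_interval: one expected count
    per interval of the prediction window (assumed to be supplied).
    """
    n_bins = int(prediction_window / interval)
    counts = [0] * n_bins
    for t in arrivals:
        if start <= t < start + prediction_window:
            counts[int((t - start) / interval)] += 1
    cum_observed = list(accumulate(counts))
    cum_expected = list(accumulate(expected_per_interval))
    return [o - e for o, e in zip(cum_observed, cum_expected)]


start = datetime(2031, 4, 1, 9, 30)
arrivals = [
    datetime(2031, 4, 1, 9, 35),
    datetime(2031, 4, 1, 9, 50),
    datetime(2031, 4, 1, 9, 55),
    datetime(2031, 4, 1, 10, 20),
]
deltas = arrival_deltas(
    arrivals,
    start,
    prediction_window=timedelta(hours=1),
    interval=timedelta(minutes=15),
    expected_per_interval=[1.0, 1.0, 1.0, 1.0],
)
```

A positive value means more patients had arrived by the end of that interval than the arrival rates anticipated.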
Source code in src/patientflow/viz/observed_against_expected.py
plot_arrival_deltas(df, prediction_time, snapshot_dates, prediction_window, yta_time_interval=timedelta(minutes=15), media_file_path=None, file_name=None, return_figure=False, fig_size=(15, 6))
Plot delta charts for multiple snapshot dates on the same figure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df
|
DataFrame
|
DataFrame containing arrival data |
required |
prediction_time
|
tuple
|
(hour, minute) of prediction time |
required |
snapshot_dates
|
list
|
List of datetime.date objects to analyze |
required |
prediction_window
|
timedelta
|
Prediction window, given as a timedelta |
required |
yta_time_interval
|
timedelta
|
Time interval used for calculating arrival rates |
timedelta(minutes=15)
|
media_file_path
|
Path
|
Path to save the plot |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "multiple_deltas.png" |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it |
False
|
fig_size
|
tuple
|
Figure size as (width, height) in inches |
(15, 6)
|
Returns:
Type | Description |
---|---|
Figure or None
|
The figure object if return_figure is True, otherwise None |
Source code in src/patientflow/viz/observed_against_expected.py
plot_deltas(results1, results2=None, title1=None, title2=None, main_title='Histograms of Observed - Expected Values', xlabel='Observed minus expected', media_file_path=None, file_name=None, return_figure=False)
Plot histograms of observed minus expected values.
Creates a grid of histograms showing the distribution of differences between observed and expected values for different prediction times. Optionally compares two sets of results side by side.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results1
|
dict
|
First set of results containing observed and expected values for different prediction times. Keys are prediction times, values are dicts with 'observed' and 'expected' arrays. |
required |
results2
|
dict
|
Second set of results for comparison, following the same format as results1. |
None
|
title1
|
str
|
Title for the first set of results. |
None
|
title2
|
str
|
Title for the second set of results. |
None
|
main_title
|
str
|
Main title for the entire plot. |
"Histograms of Observed - Expected Values"
|
xlabel
|
str
|
Label for the x-axis of each histogram. |
"Observed minus expected"
|
media_file_path
|
Path
|
Path where the plot should be saved. If provided, saves the plot as a PNG file. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "observed_vs_expected.png". |
None
|
return_figure
|
bool
|
If True, returns the matplotlib figure object instead of displaying it. |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
The figure object if return_figure is True, otherwise None. |
Notes
The function creates a grid of histograms with a maximum of 5 columns. Each histogram shows the distribution of differences between observed and expected values for a specific prediction time. A red dashed line at x=0 indicates where observed equals expected.
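The input format and grid layout described above can be sketched as follows. The helper names `deltas_by_time` and `grid_shape` are hypothetical, and the numbers are made up for illustration.

```python
import math


def deltas_by_time(results):
    """Compute observed minus expected for each prediction time."""
    return {
        time: [o - e for o, e in zip(values["observed"], values["expected"])]
        for time, values in results.items()
    }


def grid_shape(n_plots, max_cols=5):
    """Rows and columns for a histogram grid with at most max_cols columns."""
    if n_plots <= 0:
        return 0, 0
    n_cols = min(n_plots, max_cols)
    return math.ceil(n_plots / n_cols), n_cols


# Keys are prediction times; values hold 'observed' and 'expected' arrays.
results1 = {
    (9, 30): {"observed": [12, 9, 15], "expected": [10.5, 9.0, 13.0]},
    (15, 30): {"observed": [20, 18], "expected": [21.0, 18.5]},
}
deltas = deltas_by_time(results1)
rows, cols = grid_shape(len(results1))
```

Each list in `deltas` is what one histogram would summarise; values clustered around zero suggest the predictions are unbiased at that prediction time.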
Source code in src/patientflow/viz/observed_against_expected.py
probability_distribution
Module for generating probability distribution visualizations.
Functions:
Name | Description |
---|---|
plot_prob_dist : function |
Plot a probability distribution as a bar chart with enhanced plotting options. |
plot_prob_dist(prob_dist_data, title, media_file_path=None, figsize=(6, 3), include_titles=False, truncate_at_beds=None, text_size=None, bar_colour='#5B9BD5', file_name=None, probability_thresholds=None, show_probability_thresholds=True, probability_levels=None, plot_bed_base=None, xlabel='Number of beds', return_figure=False)
Plot a probability distribution as a bar chart with enhanced plotting options.
This function generates a bar plot for a given probability distribution, either as a pandas DataFrame, a scipy.stats distribution object (e.g., Poisson), or a dictionary. The plot can be customized with titles, axis labels, markers, and additional visual properties.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prob_dist_data
|
pandas.DataFrame, dict, scipy.stats distribution, or array-like
|
The probability distribution data to be plotted. Can be:
- pandas DataFrame
- dictionary (keys are indices, values are probabilities)
- scipy.stats distribution (e.g., Poisson). If a |
required |
title
|
str
|
The title of the plot, used for display and optionally as the file name. |
required |
media_file_path
|
str or Path
|
Directory where the plot image will be saved. If not provided, the plot is displayed without saving. |
None
|
figsize
|
tuple of float
|
The size of the figure, specified as (width, height). Default is (6, 3) |
(6, 3)
|
include_titles
|
bool
|
Whether to include titles and axis labels in the plot. Default is False |
False
|
truncate_at_beds
|
int or tuple of (int, int)
|
Either a single number specifying the upper bound, or a tuple of (lower_bound, upper_bound) for the x-axis range. If None, the full range of the data will be displayed. |
None
|
text_size
|
int
|
Font size for plot text, including titles and tick labels. |
None
|
bar_colour
|
str
|
The color of the bars in the plot. Default is "#5B9BD5" |
'#5B9BD5'
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to a generated name based on the title. |
None
|
probability_thresholds
|
dict
|
A dictionary where keys are points on the cumulative distribution function (as decimals, e.g., 0.9 for 90%) and values are the corresponding resource thresholds (bed counts). For example, {0.9: 15} indicates there is a 90% probability that at least 15 beds will be needed (represents the lower tail of the distribution). |
None
|
show_probability_thresholds
|
bool
|
Whether to show vertical lines indicating the resource requirements at different points on the cumulative distribution function. Default is True |
True
|
probability_levels
|
list of float
|
List of probability levels for automatic threshold calculation. |
None
|
plot_bed_base
|
dict
|
Dictionary of bed balance lines to plot in red. Keys are labels and values are x-axis positions. |
None
|
xlabel
|
str
|
A label for the x axis. Default is "Number of beds" |
'Number of beds'
|
return_figure
|
bool
|
If True, returns the matplotlib figure instead of displaying it. Default is False |
False
|
Returns:
Type | Description |
---|---|
Figure or None
|
Returns the figure if return_figure is True; otherwise displays the plot and returns None. |
Examples:
Basic usage with an array of probabilities:
>>> probabilities = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]
>>> plot_prob_dist(probabilities, "Bed Demand Distribution")
With thresholds:
>>> thresholds = _calculate_probability_thresholds(probabilities, [0.8, 0.95])
>>> plot_prob_dist(probabilities, "Bed Demand with Confidence Levels",
... probability_thresholds=thresholds)
Using with a scipy stats distribution:
>>> from scipy import stats
>>> poisson_dist = stats.poisson(mu=5) # Poisson with mean of 5
>>> plot_prob_dist(poisson_dist, "Poisson Distribution (μ=5)",
... truncate_at_beds=(0, 15))
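To make the meaning of `probability_thresholds` concrete, the sketch below derives thresholds from a list of probabilities using the reading given in the parameter description (a level of 0.9 maps to the largest bed count needed with at least 90% probability). The helper `thresholds_from_pmf` is hypothetical and is not the library's internal calculation.

```python
def thresholds_from_pmf(pmf, levels):
    """For each level p, find the largest count b with P(X >= b) >= p.

    pmf[i] is the probability of needing exactly i beds.
    """
    result = {}
    for p in levels:
        tail = 1.0  # P(X >= 0)
        best = 0
        for beds, prob in enumerate(pmf):
            if tail >= p - 1e-12:  # small tolerance for floating-point error
                best = beds
            tail -= prob
        result[p] = best
    return result


probabilities = [0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05]
thresholds = thresholds_from_pmf(probabilities, [0.8, 0.95])
```

A dictionary of this shape (probability level mapped to bed count) is the format the `probability_thresholds` parameter describes.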
Source code in src/patientflow/viz/probability_distribution.py
quantile_quantile
Generate Quantile-Quantile (QQ) plots to compare observed values with model predictions.
This module creates QQ plots for healthcare bed demand predictions, comparing observed values with model predictions. A QQ plot is a graphical technique for determining if two data sets come from populations with a common distribution. If the points form a line approximately along the reference line y=x, this suggests the distributions are similar.
Functions:
Name | Description |
---|---|
qq_plot : function |
Generate multiple QQ plots comparing observed values with model predictions |
Notes
To prepare the predicted distribution:
- Treat the predicted distributions (saved as cdfs) for all time points of interest as if they were one distribution.
- Within this predicted distribution, because each probability is over a discrete rather than continuous set of input values, save the upper and lower values of the probability range at each value.
- Calculate and save the midpoint between upper and lower.
- Sort the distribution of cdf midpoints (one for each horizon date) by the value of the midpoint and calculate a cdf of it (in effect, a cdf of cdfs).
- Weight these by the probability of each value occurring.
To prepare the observed distribution:
- Take the observed number for each horizon date and save the cdf of that value from its predicted distribution.
- Sort the distribution of cdf values (one per horizon date).
- Weight these by the probability of each value occurring, which is uniform (1 divided by the number of horizon dates).
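The preparation steps above can be sketched in plain Python. This is one possible reading of the notes, not the library's code: the helper names are hypothetical, and using the cdf midpoint for the observed value is an assumption.

```python
from itertools import accumulate


def cdf_bounds(pmf):
    """Return (lower, upper, mid) cdf values for each count in a discrete pmf."""
    upper = list(accumulate(pmf))
    lower = [0.0] + upper[:-1]
    return [(lo, up, (lo + up) / 2) for lo, up in zip(lower, upper)]


def predicted_points(pmfs):
    """Pool cdf midpoints over all horizon dates, weighted by probability."""
    n = len(pmfs)
    points = [
        (mid, prob / n)
        for pmf in pmfs
        for (_, _, mid), prob in zip(cdf_bounds(pmf), pmf)
    ]
    return sorted(points)


def observed_points(pmfs, observed):
    """Cdf value of each observed count, sorted, with uniform weights."""
    n = len(pmfs)
    mids = sorted(cdf_bounds(pmf)[obs][2] for pmf, obs in zip(pmfs, observed))
    return [(mid, 1 / n) for mid in mids]


def weighted_cdf(points):
    """Turn sorted (value, weight) pairs into values and cumulative weights."""
    values = [value for value, _ in points]
    return values, list(accumulate(weight for _, weight in points))


pmfs = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]  # one predicted pmf per horizon date
observed = [1, 2]                          # observed count on each date
model_x, model_cdf = weighted_cdf(predicted_points(pmfs))
obs_x, obs_cdf = weighted_cdf(observed_points(pmfs, observed))
```

Plotting the observed cdf against the model cdf then gives the points that are compared with the reference line y=x.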
qq_plot(prediction_times, prob_dist_dict_all, model_name='admissions', return_figure=False, figsize=None, suptitle=None, media_file_path=None, file_name=None)
Generate multiple QQ plots comparing observed values with model predictions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_times
|
list of tuple
|
List of (hour, minute) tuples for prediction times. |
required |
prob_dist_dict_all
|
dict
|
Dictionary of probability distributions keyed by model_key. |
required |
model_name
|
str
|
Base name of the model to construct model keys. |
"admissions"
|
return_figure
|
bool
|
If True, returns the figure object instead of displaying it. |
False
|
figsize
|
tuple of float
|
Size of the figure in inches as (width, height). If None, calculated automatically based on number of plots. |
None
|
suptitle
|
str
|
Super title for the entire figure, displayed above all subplots. |
None
|
media_file_path
|
Path
|
Path to save the plot. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "qq_plot.png". |
None
|
Returns:
Type | Description |
---|---|
Figure or None
|
Returns the figure if return_figure is True, otherwise displays the plot and returns None. |
Notes
The function creates a QQ plot for each prediction time, comparing the observed distribution with the predicted distribution. Each subplot shows how well the model's predictions match the actual observations.
Source code in src/patientflow/viz/quantile_quantile.py
randomised_pit
plot_randomised_pit(prediction_times, prob_dist_dict_all, model_name='admissions', return_figure=False, return_dataframe=False, figsize=None, suptitle=None, media_file_path=None, file_name=None, n_bins=10, seed=42)
Generate randomised PIT histograms for multiple prediction times side by side.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_times
|
list of tuple
|
List of (hour, minute) tuples representing times for which predictions were made. |
required |
prob_dist_dict_all
|
dict
|
Dictionary of probability distributions keyed by model_key. Each entry contains information about predicted distributions and observed values for different snapshot dates. |
required |
model_name
|
str
|
Base name of the model to construct model keys, by default "admissions". |
'admissions'
|
return_figure
|
bool
|
If True, returns the figure object instead of displaying it, by default False. |
False
|
return_dataframe
|
bool
|
If True, returns a dictionary of PIT values by model_key, by default False. |
False
|
figsize
|
tuple of (float, float)
|
Size of the figure in inches as (width, height). If None, calculated automatically based on number of plots, by default None. |
None
|
suptitle
|
str
|
Super title for the entire figure, displayed above all subplots, by default None. |
None
|
media_file_path
|
Path
|
Path to save the plot, by default None. If provided, saves the plot as a PNG file. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "plot_randomised_pit.png". |
None
|
n_bins
|
int
|
Number of histogram bins, by default 10. |
10
|
seed
|
int
|
Random seed for reproducibility, by default 42. |
42
|
Returns:
Type | Description |
---|---|
Figure
|
The figure object containing the plots, if return_figure is True. |
dict
|
Dictionary of PIT values by model_key, if return_dataframe is True. |
tuple
|
Tuple of (figure, pit_values_dict) if both return_figure and return_dataframe are True. |
None
|
If neither return_figure nor return_dataframe is True, displays the plots and returns None. |
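For readers unfamiliar with the technique, a randomised probability integral transform (PIT) for count data draws a value uniformly between the predicted cdf just below the observed count and the cdf at the observed count. The sketch below shows that standard construction with hypothetical helper names; it does not reproduce the library's implementation or its use of `seed`.

```python
import random
from itertools import accumulate


def randomised_pit(pmf, observed, rng):
    """Draw a PIT value uniformly between F(observed - 1) and F(observed)."""
    cdf = list(accumulate(pmf))
    upper = cdf[observed]
    lower = cdf[observed - 1] if observed > 0 else 0.0
    return lower + rng.random() * (upper - lower)


def pit_histogram(pit_values, n_bins=10):
    """Count PIT values falling in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for u in pit_values:
        counts[min(int(u * n_bins), n_bins - 1)] += 1
    return counts


rng = random.Random(42)  # fixed seed so the draws are reproducible
cases = [([0.2, 0.5, 0.3], 1), ([0.1, 0.6, 0.3], 2), ([0.5, 0.5], 0)]
pit_values = [randomised_pit(pmf, obs, rng) for pmf, obs in cases]
counts = pit_histogram(pit_values)
```

With many snapshot dates, a roughly flat histogram is what a well-calibrated model would be expected to produce.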
Source code in src/patientflow/viz/randomised_pit.py
shap
SHAP (SHapley Additive exPlanations) visualization module.
This module provides functionality for generating SHAP plots. These are useful for visualizing feature importance and their impact on model decisions.
Functions:
Name | Description |
---|---|
plot_shap : function |
Generate SHAP plots for multiple trained models. |
plot_shap(trained_models, test_visits, exclude_from_training_data, media_file_path=None, file_name=None, return_figure=False, label_col='is_admitted')
Generate SHAP plots for multiple trained models.
This function creates SHAP (SHapley Additive exPlanations) summary plots for each trained model, showing the impact of features on model predictions. The plots can be saved to a specified media file path or displayed directly.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trained_models
|
list[TrainedClassifier] or dict[str, TrainedClassifier]
|
List of trained classifier objects or dictionary with TrainedClassifier values. |
required |
test_visits
|
DataFrame
|
DataFrame containing the test visit data. |
required |
exclude_from_training_data
|
list[str]
|
List of columns to exclude from training data. |
required |
media_file_path
|
Path
|
Directory path where the generated plots will be saved. If None, plots are only displayed. |
None
|
file_name
|
str
|
Custom filename to use when saving the plot. If not provided, defaults to "shap_plot.png". |
None
|
return_figure
|
bool
|
If True, returns the figure instead of displaying it. |
False
|
label_col
|
str
|
Name of the column containing the target labels. |
"is_admitted"
|
Returns:
Type | Description |
---|---|
Figure or None
|
If return_figure is True, returns the generated figure. Otherwise, returns None. |
Source code in src/patientflow/viz/shap.py
survival_curve
Visualization tools for patient flow analysis using survival curves.
This module provides functions to create and analyze survival curves for time-to-event analysis.
Functions:
Name | Description |
---|---|
plot_admission_time_survival_curve : function |
Create single or multiple survival curves for ward admission times |
Notes
- The survival curves show the proportion of patients who have not yet experienced an event (e.g., admission to ward) over time
- Time is measured in hours from the initial event (e.g., arrival)
- A 4-hour target line is included by default to show performance against common healthcare targets
- The curves are created without external survival analysis packages for simplicity and transparency
- Multiple curves can be plotted on the same figure for comparison
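The notes above can be illustrated with a minimal empirical survival calculation that needs no survival analysis package. It is a sketch under simple assumptions (every visit has an end time, so there is no censoring); the helper names are hypothetical and the visit times are invented.

```python
from datetime import datetime


def survival_curve(visits):
    """Return [(hours, proportion not yet experienced event)] for a list of
    (start, end) datetime pairs, assuming every visit has an end time."""
    waits = sorted((end - start).total_seconds() / 3600 for start, end in visits)
    n = len(waits)
    curve = [(0.0, 1.0)]
    for i, hours in enumerate(waits, start=1):
        curve.append((hours, 1 - i / n))
    return curve


def proportion_within(visits, target_hours):
    """Share of visits whose event happened within target_hours."""
    waits = [(end - start).total_seconds() / 3600 for start, end in visits]
    return sum(w <= target_hours for w in waits) / len(waits)


arrival = datetime(2031, 4, 1, 8, 0)
visits = [
    (arrival, datetime(2031, 4, 1, 10, 0)),  # 2 hours
    (arrival, datetime(2031, 4, 1, 11, 0)),  # 3 hours
    (arrival, datetime(2031, 4, 1, 13, 0)),  # 5 hours
    (arrival, datetime(2031, 4, 1, 16, 0)),  # 8 hours
]
curve = survival_curve(visits)
within_target = proportion_within(visits, 4)
```

The value of `within_target` corresponds to the kind of figure an annotation at the 4-hour target line would report.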
plot_admission_time_survival_curve(df, start_time_col='arrival_datetime', end_time_col='departure_datetime', title='Time to Event Survival Curve', target_hours=[4], xlabel='Elapsed time from start', ylabel='Proportion not yet experienced event', annotation_string='{:.1%} experienced event\nwithin {:.0f} hours', labels=None, media_file_path=None, file_name=None, return_figure=False, return_df=False)
Create a survival curve for time-to-event analysis.
This function creates a survival curve showing the proportion of patients
who have not yet experienced an event over time. Can plot single or multiple
survival curves on the same plot.
Parameters
df : pandas.DataFrame or list of pandas.DataFrame
DataFrame(s) containing patient visit data. If a list is provided,
multiple survival curves will be plotted on the same figure.
start_time_col : str, default="arrival_datetime"
Name of the column containing the start time (e.g., arrival time)
end_time_col : str, default="departure_datetime"
Name of the column containing the end time (e.g., admission time)
title : str, default="Time to Event Survival Curve"
Title for the plot
target_hours : list of float, default=[4]
List of target times in hours to show on the plot
xlabel : str, default="Elapsed time from start"
Label for the x-axis
ylabel : str, default="Proportion not yet experienced event"
Label for the y-axis
annotation_string : str, default="{:.1%} experienced event\nwithin {:.0f} hours"
String template for the text annotation. Use {:.1%} for the proportion and {:.0f} for the hours. Annotations are only shown for the first curve when plotting multiple curves.
labels : list of str, optional
Labels for each survival curve when plotting multiple curves. If None and multiple dataframes are provided, default labels will be used. Ignored when plotting a single curve.
media_file_path : pathlib.Path, optional
Path to save the plot. If None, the plot is not saved.
file_name : str, optional
Custom filename to use when saving the plot. If not provided, defaults to "survival_curve.png".
return_figure : bool, default=False
If True, returns the figure instead of displaying it
return_df : bool, default=False
If True, returns a DataFrame containing the survival curve data. For multiple curves, returns a list of DataFrames.
Returns
matplotlib.figure.Figure or pandas.DataFrame or list or tuple or None
- If return_figure is True and return_df is False: returns the figure object
- If return_figure is False and return_df is True: returns the DataFrame(s) with survival curve data
- If both return_figure and return_df are True: returns a tuple of (figure, DataFrame(s))
- If both are False: returns None
Notes
The survival curve shows the proportion of patients who have not yet experienced
the event at each time point. Vertical lines are drawn at each target hour
to indicate the target times, with the corresponding proportion of patients
who experienced the event within these timeframes.
When plotting multiple curves, different colors are automatically assigned
and a legend is displayed. Target line annotations are only shown for the
first curve to avoid visual clutter.
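The curve described in these notes can be approximated by hand. The following is a minimal sketch, not the package's implementation: the helper names survival_curve_points and proportion_within are hypothetical, the column defaults are taken from the signature above, and rows with a missing end time are simply dropped (no censoring is modelled).

```python
import numpy as np
import pandas as pd


def survival_curve_points(df, start_time_col="arrival_datetime",
                          end_time_col="departure_datetime"):
    """Hypothetical helper: empirical proportion not yet having experienced the event."""
    # Elapsed time from start to event, in hours.
    elapsed = (df[end_time_col] - df[start_time_col]).dt.total_seconds() / 3600
    # Assumption: rows without an end time are dropped rather than censored.
    elapsed = elapsed.dropna().sort_values().reset_index(drop=True)
    n = len(elapsed)
    # After the k-th shortest elapsed time, k of n patients have had the event.
    remaining = 1 - np.arange(1, n + 1) / n
    return pd.DataFrame({"hours": elapsed, "proportion_remaining": remaining})


def proportion_within(curve, target_hours):
    """Hypothetical helper: share of patients with the event within target_hours."""
    return float((curve["hours"] <= target_hours).mean())


visits = pd.DataFrame({
    "arrival_datetime": pd.to_datetime(["2024-01-01 08:00"] * 4),
    "departure_datetime": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 11:00",
        "2024-01-01 13:00", "2024-01-01 17:00",
    ]),
})
curve = survival_curve_points(visits)
within_4h = proportion_within(curve, 4)
# Fill the default annotation template with the computed values.
annotation = "{:.1%} experienced event\nwithin {:.0f} hours".format(within_4h, 4)
print(annotation)
```

The value computed by proportion_within is the kind of figure the annotation at each target hour reports, here filled into the default annotation_string template.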
Source code in src/patientflow/viz/survival_curve.py
trial_results
Charts for hyperparameter optimisation trials.
This module provides tools to visualise the performance metrics of multiple hyperparameter tuning trials, highlighting the best trials for each metric.
Functions:
Name | Description |
---|---|
plot_trial_results : function | Plot selected performance metrics for a list of hyperparameter trials. |
plot_trial_results(trials_list, metrics=None, media_file_path=None, file_name=None, return_figure=False)
Plot selected performance metrics from hyperparameter trials as scatter plots.
This function visualizes the performance metrics of a series of hyperparameter trials. It creates scatter plots for each selected metric, with the best-performing trial highlighted and annotated with its hyperparameters.
Optionally, the plot can be saved to disk or returned as a figure object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
trials_list | List[HyperParameterTrial] | A list of HyperParameterTrial objects. | required |
metrics | List[str] | List of metric names to plot. If None, defaults to ["valid_auc", "valid_logloss"]. Each metric should be a key in the trial's cv_results dictionary. | None |
media_file_path | Path or None | Directory path where the generated plot image will be saved (as "trial_results.png" unless file_name is given). If None, the plot is not saved. | None |
file_name | str | Custom filename to use when saving the plot. If not provided, defaults to "trial_results.png". | None |
return_figure | bool | If True, the matplotlib figure is returned instead of being displayed directly. | False |
Returns:
Type | Description |
---|---|
Figure or None | The matplotlib figure object if return_figure is True; otherwise None. |
Notes
- Assumes that each HyperParameterTrial in trials_list has a cv_results dictionary containing the requested metrics, which are computed on the validation set.
- Parameters from the best-performing trials are shown in the plots.
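Picking the best trial for each metric could look like the following minimal sketch. It rests on several assumptions: TrialStandIn is a hypothetical stand-in for HyperParameterTrial, its parameters attribute name is illustrative, and "best" is taken to mean the highest valid_auc and the lowest valid_logloss. The package may select and annotate trials differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TrialStandIn:
    """Hypothetical stand-in for HyperParameterTrial (not the package's class)."""
    parameters: Dict[str, Any]
    cv_results: Dict[str, float] = field(default_factory=dict)


# Assumption: a higher AUC is better and a lower log loss is better.
HIGHER_IS_BETTER = {"valid_auc": True, "valid_logloss": False}


def best_trial_per_metric(trials: List[TrialStandIn],
                          metrics: Optional[List[str]] = None) -> Dict[str, TrialStandIn]:
    """Return the best trial for each requested metric."""
    if metrics is None:
        metrics = ["valid_auc", "valid_logloss"]
    best = {}
    for metric in metrics:
        # Unknown metrics are treated as higher-is-better (an assumption).
        pick = max if HIGHER_IS_BETTER.get(metric, True) else min
        best[metric] = pick(trials, key=lambda t: t.cv_results[metric])
    return best


trials = [
    TrialStandIn({"max_depth": 3}, {"valid_auc": 0.81, "valid_logloss": 0.42}),
    TrialStandIn({"max_depth": 5}, {"valid_auc": 0.84, "valid_logloss": 0.45}),
    TrialStandIn({"max_depth": 7}, {"valid_auc": 0.83, "valid_logloss": 0.40}),
]
best = best_trial_per_metric(trials)
for metric, trial in best.items():
    print(metric, trial.cv_results[metric], trial.parameters)
```

Different metrics can favour different trials, as in this toy data, which is one reason to highlight the best trial separately in each scatter plot.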
Source code in src/patientflow/viz/trial_results.py
utils
Utility functions for visualization and data formatting.
This module provides helper functions for cleaning and formatting data for visualization purposes, including filename cleaning and prediction time formatting.
Functions:
Name | Description |
---|---|
clean_title_for_filename : function | Clean a title string to make it suitable for use in filenames. |
format_prediction_time : function | Format prediction time to 'HH:MM' format. |
clean_title_for_filename(title)
Clean a title string to make it suitable for use in filenames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
title | str | The title to clean. | required |
Returns:
Type | Description |
---|---|
str | The cleaned title, safe for use in filenames. |
Source code in src/patientflow/viz/utils.py
format_prediction_time(prediction_time)
Format prediction time to 'HH:MM' format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prediction_time | str or tuple | Either a string in 'HHMM' format, possibly containing underscores, or a tuple of (hour, minute). | required |
Returns:
Type | Description |
---|---|
str | Formatted time string in 'HH:MM' format. |
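The two accepted input forms can be illustrated with a minimal sketch. This is not the package's source: format_prediction_time_sketch is a hypothetical name, and it assumes that, once underscores are removed, the last four characters of a string input are the HHMM digits.

```python
def format_prediction_time_sketch(prediction_time):
    """Hypothetical re-implementation for illustration only."""
    if isinstance(prediction_time, str):
        # Assumption: after dropping underscores, the last four characters are HHMM.
        digits = prediction_time.replace("_", "")[-4:]
        hour, minute = int(digits[:2]), int(digits[2:])
    else:
        # Otherwise expect a tuple of (hour, minute).
        hour, minute = prediction_time
    # Zero-pad both parts to two digits.
    return f"{hour:02d}:{minute:02d}"


from_string = format_prediction_time_sketch("0930")
from_underscored = format_prediction_time_sketch("09_30")
from_tuple = format_prediction_time_sketch((22, 0))
print(from_string, from_underscored, from_tuple)
```

Zero-padding both parts keeps the output at a fixed width, so an input such as (9, 5) is rendered as '09:05'.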
Source code in src/patientflow/viz/utils.py