Commit fe06eed

[Reliability Bayesian pymc-devs#474] updated with review comments to hide data import and plotting funcs
Signed-off-by: Nathaniel <[email protected]>
1 parent 8611c5b commit fe06eed

File tree

2 files changed: +1195 −1125 lines changed

examples/case_studies/reliability_and_calibrated_prediction.ipynb

+1,165 −1,119
Large diffs are not rendered by default.

examples/case_studies/reliability_and_calibrated_prediction.myst.md

+30 −6
@@ -5,9 +5,11 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python 3.9.0 ('pymc_ar_ex')
+  display_name: Python [conda env:pymc_ar_ex] *
   language: python
-  name: python3
+  name: conda-env-pymc_ar_ex-py
+substitutions:
+  extra_dependencies: lifelines
 ---
 
 (Reliability Statistics and Predictive Calibration)=
@@ -141,6 +143,8 @@ See below how the failure data flags whether or not an observation has been cens
 Left censoring (where we don't observe an item from the beginning of their history) and interval censoring (both left and right censoring) can also occur but are less common.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 heat_exchange_df = pd.read_csv(
     StringIO(
         """Years Lower,Years Upper,Censoring Indicator,Count,Plant
@@ -168,6 +172,9 @@ heat_exchange_df["censored"] = np.where(
     heat_exchange_df["Censoring Indicator"] == "Right", heat_exchange_df["Count"], 0
 )
 heat_exchange_df["risk_set"] = [100, 99, 97, 0, 100, 98, 0, 100, 0]
+```
+
+```{code-cell} ipython3
 heat_exchange_df
 ```
 
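The `risk_set` column added in the hunk above is what drives the notebook's actuarial table. As a minimal illustrative sketch (this is not the notebook's `make_actuarial_table`, and the counts below are made up, not Plant 1's), the life-table logic is: each period's conditional failure probability is failures over at-risk, and survival is the running product of the complements.

```python
# Minimal life-table sketch: S(t) as a running product of per-period
# conditional survival (1 - failures / at-risk). Illustrative numbers only.

def survival_from_risk_sets(failed, risk_set):
    """Actuarial (life-table) survival estimate S(t), one value per period."""
    surv, s = [], 1.0
    for d, n in zip(failed, risk_set):
        s *= 1 - d / n  # conditional survival through this period
        surv.append(s)
    return surv

print(survival_from_risk_sets([1, 2, 2], [100, 99, 97]))
```

With these numbers the running product collapses neatly (e.g. 0.99 × 97/99 = 0.97), which is the same telescoping that makes the life-table estimator agree with the simple empirical fraction when there is no censoring.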
@@ -248,6 +255,8 @@ We' apply the same techniques to a larger dataset and plot some of these quantit
 The shock absorbers data is in period format but it records a constantly decreasing risk set over time with one item being censored or failing at each time point i.e. removed from testing successfully (approved) or removed due to failure. This is a special case of the **period** format data.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 shockabsorbers_df = pd.read_csv(
     StringIO(
         """Kilometers,Failure Mode,Censoring Indicator
@@ -301,6 +310,9 @@ shockabsorbers_events = survival_table_from_events(
 shockabsorbers_events.rename(
     {"event_at": "t", "observed": "failed", "at_risk": "risk_set"}, axis=1, inplace=True
 )
+```
+
+```{code-cell} ipython3
 actuarial_table_shock = make_actuarial_table(shockabsorbers_events)
 actuarial_table_shock
 ```
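The hunk above splits the `survival_table_from_events` preparation into its own hidden cell. For readers without `lifelines` to hand, here is a rough pure-Python sketch of the life-table shape that call produces before the columns are renamed to `t` / `failed` / `risk_set`; the event times and censoring flags below are made up for illustration.

```python
# Rough sketch of the life table lifelines' survival_table_from_events
# builds: per distinct event time, count failures and censorings, and
# carry a shrinking at-risk count. Illustrative data only.
from collections import Counter

def life_table(durations, observed):
    """Return rows of (t, failed, censored, risk_set) per distinct time."""
    failures = Counter(t for t, o in zip(durations, observed) if o)
    censored = Counter(t for t, o in zip(durations, observed) if not o)
    at_risk = len(durations)
    rows = []
    for t in sorted(set(durations)):
        rows.append((t, failures[t], censored[t], at_risk))
        at_risk -= failures[t] + censored[t]  # removed from the risk set
    return rows

rows = life_table([6700, 6950, 7820, 7820, 9120], [1, 0, 1, 1, 0])
```

The shrinking `risk_set` column is exactly the "constantly decreasing risk set" the surrounding prose describes for the shock-absorbers data.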
@@ -319,6 +331,8 @@ lnf.print_summary()
 Although it's tempting to take this model and run with it, we need to be cautious in the case of limited data. For instance in the heat-exchange data we have three years of data with a total of 11 failures. A too simple model can get this quite wrong. For the moment we'll focus on the shock-absorber data - its non-parametric description and a simple univariate fit to this data.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 def plot_cdfs(actuarial_table, dist_fits=True, ax=None, title="", xy=(3000, 0.5), item_period=None):
     if item_period is None:
         lnf = LogNormalFitter().fit(actuarial_table["t"] + 1e-25, actuarial_table["failed"])
@@ -504,6 +518,8 @@ ax.legend();
 Next we'll plot the bootstrapped data and the two estimates of coverage we achieve conditional on the MLE fit. In other words when we want to assess the coverage of a prediction interval based on our MLE fit we can also bootstrap an estimate for this quantity.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 mosaic = """AABB
 CCCC"""
 fig, axs = plt.subplot_mosaic(mosaic=mosaic, figsize=(20, 12))
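The coverage idea in the context line above can be shown in miniature. This is a deliberately simplified, hypothetical sketch: it uses an exponential lifetime model (closed-form MLE and quantiles) rather than the notebook's lognormal/Weibull fits, but the loop is the same, i.e. refit on bootstrap resamples and ask how often draws from the MLE fit land inside each resampled prediction interval.

```python
# Toy bootstrap estimate of prediction-interval coverage under an
# exponential model (a stand-in for the notebook's parametric fits).
import math
import random

random.seed(1)
true_scale = 1000.0
data = [random.expovariate(1 / true_scale) for _ in range(200)]

def exp_pi(sample, lo=0.05, hi=0.95):
    """90% prediction interval from the exponential MLE (scale = sample mean)."""
    scale = sum(sample) / len(sample)
    q = lambda p: -scale * math.log(1 - p)  # exponential quantile function
    return q(lo), q(hi)

mle_scale = sum(data) / len(data)
coverages = []
for _ in range(200):
    boot = [random.choice(data) for _ in data]   # resample with replacement
    lo, hi = exp_pi(boot)                        # refit PI on the resample
    fresh = [random.expovariate(1 / mle_scale) for _ in range(500)]
    coverages.append(sum(lo <= x <= hi for x in fresh) / len(fresh))

est_coverage = sum(coverages) / len(coverages)  # should sit near the nominal 0.90
```

The spread of the individual `coverages`, not just their mean, is what the notebook's mosaic plot visualises.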
@@ -592,6 +608,8 @@ Next we'll look at a data set which has a slightly less clean parametric fit. Th
 We want to spend some time with this example to show how the *frequentist* techniques which worked well to estimate the shock-absorbers data can be augmented in the case of the Bearing cage data. In particular we'll show how the issues arising can be resolved with a *Bayesian* approach.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 bearing_cage_df = pd.read_csv(
     StringIO(
         """Hours,Censoring Indicator,Count
@@ -635,8 +653,11 @@ bearing_cage_events = survival_table_from_events(
 bearing_cage_events.rename(
     {"event_at": "t", "observed": "failed", "at_risk": "risk_set"}, axis=1, inplace=True
 )
+```
+
+```{code-cell} ipython3
 actuarial_table_bearings = make_actuarial_table(bearing_cage_events)
-pd.options.display.float_format = "{:.5f}".format
+
 actuarial_table_bearings
 ```
 
@@ -709,11 +730,13 @@ ax = plot_cdfs(
 
 ## Probability Plots: Comparing CDFs in a Restricted Linear Range
 
-With this adjustment to the data format we compare the MLE fit against the empirical CDF. In the next section we'll use the technique of linearising the MLE fits so that can perform a visual "goodness of fit" check. These types of plots rely on a transformation that can be applied to the location and scale distributions to turn their CDF into a linear space.
+In this section we'll use the technique of linearising the MLE fits so that can perform a visual "goodness of fit" check. These types of plots rely on a transformation that can be applied to the location and scale distributions to turn their CDF into a linear space.
 
 For both the Lognormal and Weibull fits we can represent their CDF in a linear space as a relationship between the logged value t and an appropriate $CDF^{-1}$.
 
 ```{code-cell} ipython3
+:tags: [hide-input]
+
 def sev_ppf(p):
     return np.log(-np.log(1 - p))
@@ -1158,7 +1181,8 @@ ax2.legend()
 
 The choice of model in such cases is crucial. The decision about which failure profile is apt has to be informed by a subject matter expert because extrapolation from such sparse data is always risky. An understanding of the uncertainty is crucial if real costs attach to the failures and the subject matter expert is usually better placed to tell if you 2 or 7 failures can be plausibly expected within 600 hours of service.
 
-# Conclusion
+
+## Conclusion
 
 We've seen how to analyse and model reliability from both a frequentist and Bayesian perspective and compare against the non-parametric estimates. We've shown how prediction intervals can be derived for a number of key statistics by both a bootstrapping and a bayesian approach. We've seen approaches to calibrating these prediction intervals through re-sampling methods and informative prior specification. These views on the problem are complementary and the choice of technique which is appropriate should be driven by factors of the questions of interest, not some ideological commitment.
 

@@ -1168,7 +1192,7 @@ In particular we've seen how the MLE fits to our bearings data provide a decent
 
 ## Authors
 
-Nathaniel Forde
+* Authored by Nathaniel Forde on 9th of January 2022 ([pymc-examples491](https://github.com/pymc-devs/pymc-examples/pull/491))
 
 +++
 