myst_nbs/case_studies/reliability_and_calibration.myst.md (+216, -31)
@@ -22,6 +22,7 @@ kernelspec:

```{code-cell} ipython3
import os
+import random

from io import StringIO
```
@@ -404,20 +405,36 @@ plot_ln_pi(

### Bootstrap Calibration and Coverage Estimation

-We want now to estimate the coverage implied by this prediction interval, and to do so we will bootstrap estimates for the lower and upper bounds of the 95% confidence interval and ultimately assess their coverage conditional on the MLE fit.
+We now want to estimate the coverage implied by this prediction interval. To do so we will bootstrap estimates for the lower and upper bounds of the 95% confidence interval and ultimately assess their coverage conditional on the MLE fit. We will use the fractional weighted (Bayesian) bootstrap. We'll report two methods of estimating the coverage statistic: the first is the empirical coverage, based on sampling a random value from within the known range and assessing whether it falls between the 95% MLE lower and upper bounds.
-```{code-cell} ipython3
-import random
+The second method we'll use to assess coverage is to bootstrap estimates of a 95% lower bound and upper bound, and then assess how much those bootstrapped values would cover conditional on the MLE fit. A sketch of the idea follows below.
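
As a minimal sketch of the fractional weighted (Bayesian) bootstrap described above, the cell below reweights a lognormal fit with Dirichlet weights rather than resampling records. The failure data `y`, the number of bootstrap draws, and the sampling range are hypothetical stand-ins, not the notebook's actual data.

```{code-cell} ipython3
# Sketch of the fractional weighted (Bayesian) bootstrap for the 95% bounds
# of a lognormal fit; `y` is stand-in failure data, not the notebook's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y = stats.lognorm(s=0.5, scale=np.exp(10)).rvs(30, random_state=rng)

lbs, ubs = [], []
for _ in range(1000):
    # Dirichlet(1, ..., 1) weights reweight each record instead of resampling it
    w = rng.dirichlet(np.ones(len(y)))
    mu = np.sum(w * np.log(y))  # weighted MLE of mu
    sigma = np.sqrt(np.sum(w * (np.log(y) - mu) ** 2))  # weighted MLE of sigma
    fit = stats.lognorm(s=sigma, scale=np.exp(mu))
    lbs.append(fit.ppf(0.025))
    ubs.append(fit.ppf(0.975))

# Method 1: empirical coverage of random draws from the known range
draws = rng.uniform(y.min(), y.max(), size=1000)
contained = (draws >= np.mean(lbs)) & (draws <= np.mean(ubs))
print("Empirical coverage based on sampling:", contained.mean())
```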
axs[0].set_title("Bootstrapped Mu against Bootstrapped 95% Lower Bound")
462
479
prop = draws["Contained"].sum() / len(draws)
463
480
axs[0].annotate(
464
-
f"Estimated Prediction \nEmpirical Coverage Based on Sampling : {np.round(prop, 3)}",
465
-
xy=(10.4, 16000),
481
+
f"Estimated Prediction \nEmpirical Coverage \nBased on Sampling : {np.round(prop, 3)}",
482
+
xy=(10.4, 12000),
466
483
fontweight="bold",
467
484
)
468
485
axs[1].set_title("Bootstrapped Sigma against Bootstrapped 95% Lower Bound")
@@ -514,6 +531,10 @@ axs[2].annotate(
);
```

These simulations should be repeated a far larger number of times than we have done here. We can also vary the size of the interval to achieve a desired coverage level, as sketched below.

+++
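
The idea of varying the interval size until the empirical coverage hits a target can be illustrated as follows; the lognormal data-generating process and the candidate widths are assumptions chosen for this toy example.

```{code-cell} ipython3
# Toy sketch: widen or narrow the interval until empirical coverage is acceptable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_dist = stats.lognorm(s=0.5, scale=np.exp(10))  # assumed true process
y = true_dist.rvs(30, random_state=rng)
mu, sigma = np.log(y).mean(), np.log(y).std()
fitted = stats.lognorm(s=sigma, scale=np.exp(mu))  # plug-in MLE fit

for width in [0.90, 0.95, 0.99]:
    lb, ub = fitted.ppf([(1 - width) / 2, 1 - (1 - width) / 2])
    test = true_dist.rvs(10_000, random_state=rng)
    coverage = np.mean((test >= lb) & (test <= ub))
    print(f"nominal {width:.2f} -> empirical coverage {coverage:.3f}")
```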
### Bearing Cage Data: A Study in Bayesian Reliability Analysis

Next we'll look at a data set which has a slightly less clean parametric fit. The most obvious feature of this data is the small number of failing records. The data is recorded in the **period** format, with counts showing the extent of the `risk set` in each period; a toy illustration of this layout follows below.
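
To make the **period** format concrete, here is a small table in that layout. The column names and numbers are invented for illustration and are not the actual bearing cage records.

```{code-cell} ipython3
# Hypothetical period-format reliability data: each row is a period, with the
# failures observed in it and the size of the risk set still in service.
import pandas as pd

toy_periods = pd.DataFrame(
    {
        "period_end_hours": [200, 400, 600, 800, 1000],
        "failures": [0, 1, 0, 2, 0],
        "risk_set": [1700, 1650, 1600, 1540, 1480],
    }
)
toy_periods
```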
@@ -778,7 +799,7 @@ ax2.legend()
ax.set_ylabel("Fraction Failing");
```

-## Bayesian Modelling
+## Bayesian Modelling of Reliability Data

We've now seen how to model and visualise parametric model fits to sparse reliability data in a frequentist or MLE framework. We now want to show how the same style of inference can be achieved in the Bayesian paradigm.
@@ -793,7 +814,7 @@ where $\delta_{i}$ is an indicator for whether the observation is a failure or a censored observation

### Direct PYMC implementation of Weibull Survival

-We'll first model the Weibull likelihood directly in terms of the parameters $\alpha, \beta$, and then consider an alternative parameterisation.
+We fit two versions of this model with different specifications for the priors: one takes a **vague** uniform prior over the data, and another specifies priors closer to the MLE fits. We will show below the implications the prior specification has for the calibration of the model against the observed data. A rough sketch of the direct implementation follows.
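
As a minimal sketch of what the direct implementation can look like (not the notebook's exact cell), the model below combines a Weibull likelihood for observed failures with the log complementary CDF for right-censored records, under the vague uniform priors described above. The data arrays and prior bounds are hypothetical stand-ins.

```{code-cell} ipython3
# Sketch: Weibull survival model with right-censoring in PyMC under vague
# uniform priors; the times below are stand-ins for the bearing cage data.
import numpy as np
import pymc as pm

failure_times = np.array([230.0, 334.0, 423.0])  # observed failures
censored_times = np.array([1000.0, 1000.0, 1500.0, 2000.0])  # still in service


def weibull_lccdf(t, alpha, beta):
    """Log of the Weibull survival function: log S(t) = -(t / beta)^alpha."""
    return -((t / beta) ** alpha)


with pm.Model() as vague_model:
    alpha = pm.Uniform("alpha", 0.5, 10)  # shape
    beta = pm.Uniform("beta", 100, 15_000)  # scale: vague over the data range
    pm.Weibull("failures", alpha=alpha, beta=beta, observed=failure_times)
    # censored records contribute log S(t) to the joint log-likelihood
    pm.Potential("censored", weibull_lccdf(censored_times, alpha, beta).sum())
    idata = pm.sample(1000, tune=1000)
```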
We can see here how the Bayesian uncertainty estimates driven by our deliberately vague priors encompass more uncertainty than our MLE fit, and the uninformative prior implies a wider predictive distribution for the 5% and 10% failure times. The Bayesian model with uninformative priors seems to do a better job of capturing the uncertainty in the non-parametric estimates of our CDF.

The concrete estimates of failure percentage over time from each model fit are especially crucial in a situation where we have sparse data. It is a meaningful sanity check to ask subject matter experts how plausible the expectation and range for the 10% failure time are for their product; one way to surface those quantities is sketched below.

+++
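One way to put the 5% and 10% failure times in front of an expert is to evaluate the closed-form Weibull quantile $t_{p} = \beta(-\log(1 - p))^{1/\alpha}$ over posterior draws. The posterior arrays below are hypothetical stand-ins, not samples from the fitted model.

```{code-cell} ipython3
# Sketch: read off 5% / 10% failure times from assumed posterior draws of
# (alpha, beta) using t_p = beta * (-log(1 - p))^(1 / alpha).
import numpy as np

rng = np.random.default_rng(2)
alpha_draws = rng.normal(2.3, 0.3, 1000)  # stand-ins for posterior samples
beta_draws = rng.normal(4500, 400, 1000)

for p in [0.05, 0.10]:
    t_p = beta_draws * (-np.log(1 - p)) ** (1 / alpha_draws)
    lo, hi = np.percentile(t_p, [2.5, 97.5])
    print(f"{p:.0%} failure time: mean {t_p.mean():.0f} hours, 95% interval ({lo:.0f}, {hi:.0f})")
```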
## Predicting the Number of Failures in an Interval

Because our data on observed failures is extremely sparse, we have to be very careful about extrapolating beyond the observed range of time, but we can ask about the predictable number of failures in the lower tail of our CDF. Another view on this data which can be helpful for discussion with subject matter experts is the number of implied failures over a time interval.

### The Plugin Estimate

Imagine we want to know how many bearings will fail between 150 and 600 hours of service. We can calculate this based on the estimated CDF and the number of new future bearings. We first calculate:

$$ \hat{\rho} = \frac{\hat{F}(t_{1}) - \hat{F}(t_{0})}{1 - \hat{F}(t_{0})} $$

to establish a probability for a failure occurring in the interval $(t_{0}, t_{1}]$, and then a point prediction for the number of failures in the interval is given by `risk_set`*$\hat{\rho}$.
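
This calculation translates directly to code. In the sketch below `alpha_mle`, `beta_mle` and `risk_set` are assumed values standing in for the notebook's fitted numbers.

```{code-cell} ipython3
# The plugin estimate under hypothetical Weibull MLE values.
from scipy import stats

alpha_mle, beta_mle = 2.3, 4500.0  # assumed shape / scale from an MLE fit
fitted_cdf = stats.weibull_min(c=alpha_mle, scale=beta_mle).cdf

t0, t1 = 150, 600
rho = (fitted_cdf(t1) - fitted_cdf(t0)) / (1 - fitted_cdf(t0))

risk_set = 1700  # assumed number of bearings entering service
print(f"rho = {rho:.4f}, point prediction: {risk_set * rho:.1f} failures in ({t0}, {t1}]")
```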
### Applying the Same Procedure on the Bayesian Posterior

We'll use the posterior predictive distribution of the uninformative model. We show here how to derive the uncertainty in the estimates of the 95% prediction interval for the number of failures in a time interval.

```{code-cell} ipython3
# NB: the construction of output_df (columns "lb", "ub", "expected") and of the
# fig, ax, ax1 axes happens earlier in this cell; those lines are elided in the diff.
ax.set_title(
    "Posterior Predictive Expected Failure Count between 150-600 hours \nas a function of Weibull(alpha, beta)",
    fontsize=20,
)
ax1.hist(
    output_df["lb"],
    ec="black",
    color="slateblue",
    label="95% PI Lower Bound on Failure Count",
    alpha=0.3,
)
ax1.axvline(output_df["lb"].mean(), label="Expected 95% PI Lower Bound on Failure Count")
ax1.axvline(output_df["ub"].mean(), label="Expected 95% PI Upper Bound on Failure Count")
ax1.hist(
    output_df["ub"],
    ec="black",
    color="cyan",
    label="95% PI Upper Bound on Failure Count",
    bins=20,
    alpha=0.3,
)
ax1.hist(
    output_df["expected"], ec="black", color="pink", label="Expected Count of Failures", bins=20
)
ax1.set_title("Uncertainty in the Posterior Prediction Interval of Failure Counts", fontsize=20)
ax1.legend();
```
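
For concreteness, here is one way `output_df` could be assembled (its construction is elided in the diff above, so the names and posterior draws here are assumptions): each posterior draw of $(\alpha, \beta)$ yields an interval failure probability $\hat{\rho}$ and a binomial 95% prediction interval for the failure count.

```{code-cell} ipython3
# Hypothetical sketch of building output_df: per posterior draw, compute rho
# for the interval and a 95% binomial prediction interval for the count.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
alpha_draws = rng.normal(2.3, 0.3, 500)  # stand-ins for posterior samples
beta_draws = rng.normal(4500, 400, 500)

risk_set, t0, t1 = 1700, 150, 600
rows = []
for a, b in zip(alpha_draws, beta_draws):
    cdf = stats.weibull_min(c=a, scale=b).cdf
    rho = (cdf(t1) - cdf(t0)) / (1 - cdf(t0))
    counts = stats.binom(risk_set, rho)
    rows.append({"lb": counts.ppf(0.025), "ub": counts.ppf(0.975), "expected": risk_set * rho})

output_df = pd.DataFrame(rows)
output_df.describe()
```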
The choice of model in such cases is crucial. The decision about which failure profile is apt has to be informed by a subject matter expert, because extrapolation from such sparse data is always risky. An understanding of the uncertainty is crucial if real costs attach to the failures, and a subject matter expert is usually better placed to tell you whether 2 or 7 failures can be expected within 600 hours of service.

# Conclusion

We've seen how to analyse and model reliability from both a frequentist and a Bayesian perspective, and how to compare both against non-parametric estimates. We've shown how prediction intervals can be derived for a number of key statistics by both a bootstrapping and a Bayesian approach, and we've seen how these prediction intervals can be calibrated through re-sampling methods and informative prior specification. These views on the problem are complementary, and the appropriate choice of technique should be driven by the questions of interest, not by ideological commitment.

In particular, we've seen how the MLE fits to our bearings data provide a decent first-guess approach to establishing priors in the Bayesian analysis. We've also seen how subject matter expertise can be elicited by deriving key quantities from the implied models and subjecting these implications to scrutiny. The Bayesian prediction intervals are calibrated to our prior expectations, and where we have none, we can supply vague or non-informative priors. The implications of these priors can again be checked and analysed against an appropriate cost function.