prior and posterior predictive checks #69

Closed
OriolAbril opened this issue Mar 30, 2021 · 3 comments
Labels
high impact (Notebooks with most visits on docs.pymc.io), tracker id (Issues used as trackers in the notebook update project, do not close!)

Comments

@OriolAbril
Member

OriolAbril commented Mar 30, 2021

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/diagnostics_and_criticism/posterior_predictive.ipynb
Reviewers: @AlexAndorra @lucianopaz

Note: Please refer to notebook updates overview for more details on some of the bullet points below

Known changes needed

Changes listed in this section should all be done at some point in order to get this
notebook to a "Best Practices" state. However, these are probably not enough!
Make sure to thoroughly review the notebook and search for other updates.

General updates

  • Use new numpy random generator (see updates overview)
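A minimal sketch of what this bullet refers to, assuming the notebook currently seeds NumPy's legacy global state. The seed value and the variable names `predictor` and `outcome` are illustrative, not taken from the notebook:

```python
import numpy as np

RANDOM_SEED = 8927  # arbitrary value chosen for this sketch

# Legacy style to be replaced:
#     np.random.seed(RANDOM_SEED)
#     predictor = np.random.normal(size=100)

# New style: an explicit Generator object that is passed around
rng = np.random.default_rng(RANDOM_SEED)

# Simulated predictor and binary outcome (synthetic data for illustration)
predictor = rng.normal(size=100)
outcome = rng.binomial(n=1, p=1 / (1 + np.exp(-predictor)))
```

Keeping the generator in a variable makes the source of randomness explicit and avoids relying on hidden global state.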

ArviZ related

  • Use InferenceData
  • Try to take advantage of matplotlib and xarray plotting to avoid unnecessary plotting loops
  • update hpd to hdi
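For context on the last bullet: HDI stands for highest density interval, the narrowest interval that contains a given share of the samples. The sketch below illustrates that idea in plain NumPy. It is written for this comment and is not ArviZ code; the function name `narrowest_interval` and the 0.94 probability are assumptions, and in the notebook itself one would call ArviZ's `hdi` function instead:

```python
import numpy as np


def narrowest_interval(samples, prob=0.94):
    """Return (low, high) of the narrowest interval holding about `prob` of the samples."""
    ordered = np.sort(np.asarray(samples))
    n = ordered.size
    k = int(np.floor(prob * n))          # index offset spanned by each candidate interval
    widths = ordered[k:] - ordered[: n - k]  # width of every candidate interval
    start = int(np.argmin(widths))       # the narrowest candidate wins
    return ordered[start], ordered[start + k]


rng = np.random.default_rng(0)
draws = rng.normal(size=1000)  # synthetic draws standing in for posterior samples
low, high = narrowest_interval(draws, prob=0.94)
inside = np.mean((draws >= low) & (draws <= high))
```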

Changes for discussion

Changes listed in this section are up for discussion. These are ideas on how to improve
the notebook that may not have a clear implementation, or that fix some known issue only partially.

General updates

  • Show advanced uses of sample_posterior_predictive? Or should that be another more specific notebook not focused on model criticism but purely on pymc3 usage? (i.e. a howto instead of a diagnostics_and_criticism notebook).

Notes

Exotic dependencies

None

Computing requirements

All models seem to sample in under a minute

@OriolAbril added the "tracker id" label (Issues used as trackers in the notebook update project, do not close!) on Mar 30, 2021
@OriolAbril added the "high impact" label (Notebooks with most visits on docs.pymc.io) on Apr 6, 2021
@lucianopaz
Member

@OriolAbril, do you feel that out of sample predictions should be introduced in the posterior predictive checks notebook? I mean, the notebook you linked has a "predictions" section where we use the pm.Data container to change the predictor values in the logistic regression example, but there are other scenarios where the pm.Data approach isn't enough.

For example, in the Radon hierarchical model there are two kinds of predictions:

  • Predict the concentration of Radon in a new house in a previously observed county (this would be like retrodicting or something like in-sample predictions).
  • Predict the concentration of Radon in a new house in a previously unobserved county (this is what we could call the true out-of-sample prediction).

The second prediction task needs to ignore some contents of the traced posterior samples (the county-level random effects), so using pm.Data cannot help us do that. We haven't really established a standard way of doing this, and I wonder if we should mention it in the posterior predictive checks notebook. I think that if we mention the pm.Data way of working, we should at least mention and link a notebook where we deal with the other kind of prediction task.
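To make the difference between the two tasks concrete, here is a minimal NumPy sketch. It is not an established PyMC approach, and it assumes a varying-intercept model with a normal population distribution over county intercepts. All arrays are synthetic stand-ins for posterior draws (not real radon results), and the names `mu_a`, `sigma_a`, `a_county` and `sigma_y` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 2000

# Synthetic stand-ins for posterior draws (NOT real radon results)
mu_a = rng.normal(1.5, 0.1, size=n_draws)               # population mean intercept
sigma_a = np.abs(rng.normal(0.8, 0.05, size=n_draws))   # between-county sd
sigma_y = np.abs(rng.normal(0.7, 0.05, size=n_draws))   # observation noise sd
county_centers = np.array([1.1, 1.4, 1.5, 1.7, 1.9])    # made-up county intercepts
a_county = county_centers + rng.normal(0, 0.05, size=(n_draws, county_centers.size))


def predict_observed_county(county_idx):
    """New house in an observed county: reuse that county's intercept draws."""
    return rng.normal(a_county[:, county_idx], sigma_y)


def predict_new_county():
    """New house in an unobserved county: ignore a_county and draw a fresh
    intercept from the population distribution for every posterior draw."""
    a_new = rng.normal(mu_a, sigma_a)
    return rng.normal(a_new, sigma_y)


y_known = predict_observed_county(2)
y_new = predict_new_county()
```

In this sketch the predictions for the unobserved county are more spread out, because they also carry the between-county variation that the stored county effects would otherwise pin down.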

@OriolAbril
Member Author

Yes, this is much better put, thanks!

I think it would be beneficial to split that notebook into more specific tutorials. Usage of pm.sample_posterior_predictive should have its own notebook explaining all these cases, but we should probably wait until v4 to write it. There, pm.Data will be more powerful and flexible, but I think we will still need to redefine the model for tasks like random effects or out-of-sample predictions in hierarchical models.

Between prior sampling and prior/posterior predictive checks, we could also add another split, and maybe even move predictive checks to ArviZ. Hopefully we'll have support for more and more predictive checks in ArviZ. We already have https://arviz-devs.github.io/arviz/api/generated/arviz.plot_separation.html for binary outcomes, and even that may need to be split into continuous, discrete, binary... predictive checks.

As a more immediate goal, I think that keeping prior sampling and predictive checks here and creating a new posterior predictive sampling notebook would be ideal.


2 participants