Skip to content

Update GLM notebooks #4215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

MarcoGorelli
Copy link
Contributor

Currently, there's a plot which uses plot_posterior_predictive_glm, which isn't listed in the API, not does it appear in Arviz's API reference.

I guess there plans to deprecate it? If so, I've replaced it with az.plot_hdi.

Something I don't understand about plot_hdi is that if I do

az.plot_hdi(x, poster_predictive["y"])

then I get the warning

FutureWarning: hdi currently interprets 2d data as (draw, shape) but this will change in a future release to (chain, draw) for coherence with other functions

I find this confusing if I add a dimension (as in the example from the arviz docs) then the warning goes away, is that what users are expected to do?

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov
Copy link

codecov bot commented Nov 10, 2020

Codecov Report

Merging #4215 (50f0aa2) into master (f732a01) will increase coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4215      +/-   ##
==========================================
+ Coverage   88.91%   88.94%   +0.03%     
==========================================
  Files          89       92       +3     
  Lines       14429    14800     +371     
==========================================
+ Hits        12829    13164     +335     
- Misses       1600     1636      +36     
Impacted Files Coverage Δ
pymc3/model.py 89.33% <0.00%> (ø)
pymc3/step_methods/__init__.py 100.00% <0.00%> (ø)
pymc3/distributions/__init__.py 100.00% <0.00%> (ø)
pymc3/distributions/tree.py 88.60% <0.00%> (ø)
pymc3/distributions/bart.py 80.80% <0.00%> (ø)
pymc3/step_methods/pgbart.py 97.98% <0.00%> (ø)
pymc3/step_methods/hmc/nuts.py 97.48% <0.00%> (+0.01%) ⬆️
pymc3/sampling.py 86.88% <0.00%> (+0.04%) ⬆️
pymc3/distributions/bound.py 92.36% <0.00%> (+0.76%) ⬆️

@AlexAndorra
Copy link
Contributor

Hey @MarcoGorelli ! Yeah I think that's what you have to do to get rid of the warning until the release implementing this change 😕 I don't know when this release will happen though -- cc @OriolAbril @canyon289

Maybe another workaround is to include the posterior predictive samples into the inference data object:

    az.from_pymc3_predictions(
        ppc, idata_orig=trace, inplace=True
    )

Regarding plot_posterior_predictive_glm I'm not aware of any plan to deprecate it, but maybe we should 🤷‍♂️ I'm curious about what Oriol and Ravin (and others) think about this

@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Nov 12, 2020

Regarding plot_posterior_predictive_glm I'm not aware of any plan to deprecate it,

Ah OK, sorry - if the plan's to keep it, I'll keep it in, I've just changed

- plot_posterior_predictive_glm(trace, samples=100, label="posterior predictive regression lines")
+ plot_posterior_predictive_glm(
+     trace.posterior.to_dataframe().to_dict(orient="records"),
+     samples=100,
+     label="posterior predictive regression lines",
+ )

so that it still works when using return_inferencedata=True when sampling

@MarcoGorelli
Copy link
Contributor Author

Actually, having looked at the source code for plot_posterior_predictive_glm, it looks like it's meant for a really specific use case in which you have parameters named exactly 'Inference' and 'x', so it's probably not worth changing things around it - sorry for the noise 😳

So, in the end, the only changes here are:

  • explicitly use return_inferencedata=False, so this notebook doesn't break in a future version when True becomes the default
  • use arviz to make the plots
  • expand the star import

@twiecki
Copy link
Member

twiecki commented Nov 13, 2020

We should remove that function from the library and put it inside the NB because it is so specific.

@twiecki
Copy link
Member

twiecki commented Nov 13, 2020

@MarcoGorelli And then remove it from the function. But before we'd need to check if it's used anywhere else. If not, remove and add to the release notes that we're deprecating it.

@AlexAndorra
Copy link
Contributor

I'm guessing this would be a breaking change then, wouldn't it? And is it easily replaceable with az.plot_hdi in your experience here @MarcoGorelli ?
If not, I'm not sure we should deprecate it: it doesn't seem to be a burden on maintenance for the core team and is a nice one-liner for people using the GLM module -- which, I'm guessing, are mostly beginners and which is why the GLM module is here in the first place

@twiecki
Copy link
Member

twiecki commented Nov 13, 2020 via email

@MarcoGorelli
Copy link
Contributor Author

MarcoGorelli commented Nov 13, 2020

It's used in ~ 8 notebooks, but nowhere else in the codebase

a nice one-liner for people using the GLM module -- which, I'm guessing, are mostly beginners and which is why the GLM module is here in the first place

I can see the desire to make it simple, but if you then need to learn the pymc3 API the moment your model is even just slightly different (or even if it just has parameters named differently, e.g. '\alpha' and '\beta') then I'm not sure it achieves its aim. If you're a beginner, I think that you don't expect that to use it you have to name your intercept 'Intercept' and your coefficient 'x' (as this function requires), you expect something more generic.

At the moment it looks to me like it's more of a convenience function which is useful for the GLM notebooks rather than something generic which needs to be part of the public API

And is it easily replaceable with az.plot_hdi in your experience here @MarcoGorelli ?

In this notebook at least it's easily replaceable with

ax.plot(x, trace['Intercept'][None, :] + trace['x'][None, :]*x[:, None], 'k-')
ax.plot([], [], 'k-')

(and an extra one line of code if you only want to select e.g. 100 samples)


FWIW, my suggestion would be to either:

  • make the function more generic (so it can work even if your parameters are named differently) and document it in the API reference page
  • deprecate it and just re-implement it where necessary in the notebooks which use, with an extra 3-4 simple lines of code

@AlexAndorra
Copy link
Contributor

Thanks @MarcoGorelli -- your points definitely go in the deprecation column. TBH, I've never really used the GLM module but I think you and @twiecki make some good points. Since it is easily repleacable by arviz.plot_hdi, I'd be down for the second option then:

deprecate it and just re-implement it where necessary in the notebooks which use, with an extra 3-4 simple lines of code

The first option goes against the choice we made to delegate plots and diagnostics to ArviZ.

I guess the next steps then are:

  • Replace its use in the 8 notebooks you found
  • Remove the function from the codebase
  • Add a note about deprecation to the release notes

Are you down for this?

@twiecki
Copy link
Member

twiecki commented Nov 13, 2020

@AlexAndorra I agree with this plan.

@MarcoGorelli
Copy link
Contributor Author

Sure, thanks for the discussion - should there be a DeprecationWarning before removing the function?

@AlexAndorra
Copy link
Contributor

Mmmh, probably. I'll defer to @twiecki on this one

@twiecki
Copy link
Member

twiecki commented Nov 13, 2020

It's the right thing to do. You could add the code that replaces it to that warning (or point people to it) and open an issue that this should be removed in the next version.

@MarcoGorelli MarcoGorelli changed the title Re-run glm-linear Factor plot_posterior_predictive_glm into notebooks Nov 15, 2020
@MarcoGorelli MarcoGorelli marked this pull request as draft November 15, 2020 12:22
@MarcoGorelli
Copy link
Contributor Author

I tried running this notebook on Kaggle (as their environment has all the dependencies needed, except for watermark) but the diff shows everything as having changed, rather than showing the cell-by-cell comparisons we're used to...will look into this

@MarcoGorelli
Copy link
Contributor Author

I tried running this notebook on Kaggle (as their environment has all the dependencies needed, except for watermark) but the diff shows everything as having changed, rather than showing the cell-by-cell comparisons we're used to...will look into this

🎉 removing papermill from the metadata seems to do the trick

@AlexAndorra
Copy link
Contributor

Is this ready for review now @MarcoGorelli ?

@MarcoGorelli
Copy link
Contributor Author

There's still some GLM left, hopefully I can get them all done this or next week

@MarcoGorelli
Copy link
Contributor Author

Hi @AlexAndorra and @twiecki - having looked at GLM-robust-with-outlier-detection.ipynb, I do think it's pretty neat how plot_posterior_predictive_glm is used, so perhaps it'd be a pity to lose it.

Would it be welcome if I moved it over to ArviZ, made it a little more generic (so it can work even if return_inferencedata=True), and then deprecated it in PyMC3?

@AlexAndorra
Copy link
Contributor

AlexAndorra commented Nov 17, 2020

That sounds like a great plan @MarcoGorelli ! The thing with an ArviZ function is that it should be PPL-agnostic, which I think will be hard to do with plot_posterior_predictive_glm as it is now -- BTW, extending it to work with InferenceData and time series data would be awesome 🤩

So the best may be to add it to Bambi instead. Pinging @aloctavodia and @tomicapretto, as they work on this, and @ahartikainen, as he knows ArviZ very well 😉

In the meantime, we should probably keep this function in PyMC3, which would mean we'd close this PR I guess

@MarcoGorelli
Copy link
Contributor Author

In the meantime, we should probably keep this function in PyMC3

OK, sure! There's still some updates which I've made to these notebooks (e.g. az.plot_trace instead of pm.traceplot), so perhaps I'll keep the updates but revert the re-implementation of plot_posterior_predictive_glm - then, in a separate PR, once some decision has been made, we can think of moving plot_posterior_predictive_glm out of pymc3

Will ping when the GLM notebooks are ready

@MarcoGorelli MarcoGorelli changed the title Factor plot_posterior_predictive_glm into notebooks Update GLM notebooks Nov 17, 2020
@tomicapretto
Copy link
Contributor

Thanks @AlexAndorra for pinging us here.

You are right, we plan to include (or at least discuss about including) something like plot_posterior_predictive_glm in Bambi.
Bambi still requires users to generate plots and numeric summaries "from scratch". I think Bambi could be even more beginner friendly by having a set of functions/methods to produce plots that one usually create when using GLMMs (like posterior predictive plots) without requiring to use much matplotlib or knowing how to work with InferenceData/xarray objects.

@MarcoGorelli if you are interested, contributions are always welcomed in Bambi :)

@AlexAndorra
Copy link
Contributor

Yep, sounds good @MarcoGorelli 👌

@MarcoGorelli
Copy link
Contributor Author

closing as this will go in pymc-examples

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants