Update GLM notebooks #4215

MarcoGorelli · 2020-11-09T17:41:41Z

Currently, there's a plot which uses plot_posterior_predictive_glm, which isn't listed in the API, nor does it appear in ArviZ's API reference.

I guess there are plans to deprecate it? If so, I've replaced it with az.plot_hdi.

Something I don't understand about plot_hdi is that if I do

az.plot_hdi(x, poster_predictive["y"])

then I get the warning

FutureWarning: hdi currently interprets 2d data as (draw, shape) but this will change in a future release to (chain, draw) for coherence with other functions

I find this confusing: if I add a dimension (as in the example from the ArviZ docs) then the warning goes away. Is that what users are expected to do?
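To make "adding a dimension" concrete, here is a minimal sketch. It assumes the posterior predictive samples are a plain NumPy array laid out as (draw, shape); the array sizes and values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior predictive samples: 200 draws for 30 x-values.
x = np.linspace(0, 1, 30)
y_samples = rng.normal(loc=2 * x, scale=0.5, size=(200, 30))  # (draw, shape)

# Prepend a length-1 "chain" axis so the layout is unambiguous.
y_with_chain = y_samples[None, ...]  # (chain, draw, shape)

print(y_samples.shape, y_with_chain.shape)
```

Passing the 3d array, i.e. az.plot_hdi(x, y_with_chain), is the form for which the warning went away in my case.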

review-notebook-app · 2020-11-09T17:41:45Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

codecov · 2020-11-10T03:37:17Z

Codecov Report

Merging #4215 (50f0aa2) into master (f732a01) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4215      +/-   ##
==========================================
+ Coverage   88.91%   88.94%   +0.03%     
==========================================
  Files          89       92       +3     
  Lines       14429    14800     +371     
==========================================
+ Hits        12829    13164     +335     
- Misses       1600     1636      +36

Impacted Files	Coverage Δ
pymc3/model.py	`89.33% <0.00%> (ø)`
pymc3/step_methods/__init__.py	`100.00% <0.00%> (ø)`
pymc3/distributions/__init__.py	`100.00% <0.00%> (ø)`
pymc3/distributions/tree.py	`88.60% <0.00%> (ø)`
pymc3/distributions/bart.py	`80.80% <0.00%> (ø)`
pymc3/step_methods/pgbart.py	`97.98% <0.00%> (ø)`
pymc3/step_methods/hmc/nuts.py	`97.48% <0.00%> (+0.01%)`	⬆️
pymc3/sampling.py	`86.88% <0.00%> (+0.04%)`	⬆️
pymc3/distributions/bound.py	`92.36% <0.00%> (+0.76%)`	⬆️

AlexAndorra · 2020-11-10T09:59:15Z

Hey @MarcoGorelli ! Yeah I think that's what you have to do to get rid of the warning until the release implementing this change 😕 I don't know when this release will happen though -- cc @OriolAbril @canyon289

Maybe another workaround is to include the posterior predictive samples into the inference data object:

    az.from_pymc3_predictions(
        ppc, idata_orig=trace, inplace=True
    )

Regarding plot_posterior_predictive_glm I'm not aware of any plan to deprecate it, but maybe we should 🤷‍♂️ I'm curious about what Oriol and Ravin (and others) think about this

MarcoGorelli · 2020-11-12T16:00:55Z

Regarding plot_posterior_predictive_glm I'm not aware of any plan to deprecate it,

Ah OK, sorry - if the plan's to keep it, I'll keep it in, I've just changed

- plot_posterior_predictive_glm(trace, samples=100, label="posterior predictive regression lines")
+ plot_posterior_predictive_glm(
+     trace.posterior.to_dataframe().to_dict(orient="records"),
+     samples=100,
+     label="posterior predictive regression lines",
+ )

so that it still works when using return_inferencedata=True when sampling

MarcoGorelli · 2020-11-12T21:23:23Z

Actually, having looked at the source code for plot_posterior_predictive_glm, it looks like it's meant for a really specific use case in which you have parameters named exactly 'Intercept' and 'x', so it's probably not worth changing things around it - sorry for the noise 😳

So, in the end, the only changes here are:

- explicitly use return_inferencedata=False, so this notebook doesn't break in a future version when True becomes the default
- use arviz to make the plots
- expand the star import

twiecki · 2020-11-13T08:13:07Z

We should remove that function from the library and put it inside the NB because it is so specific.

twiecki · 2020-11-13T11:00:35Z

@MarcoGorelli And then remove it from the library. But before we'd need to check if it's used anywhere else. If not, remove and add to the release notes that we're deprecating it.

AlexAndorra · 2020-11-13T11:03:38Z

I'm guessing this would be a breaking change then, wouldn't it? And is it easily replaceable with az.plot_hdi in your experience here @MarcoGorelli ?
If not, I'm not sure we should deprecate it: it doesn't seem to be a burden on maintenance for the core team and is a nice one-liner for people using the GLM module -- which, I'm guessing, are mostly beginners and which is why the GLM module is here in the first place

twiecki · 2020-11-13T11:25:59Z

You make good points.


MarcoGorelli · 2020-11-13T11:26:37Z

It's used in ~ 8 notebooks, but nowhere else in the codebase

a nice one-liner for people using the GLM module -- which, I'm guessing, are mostly beginners and which is why the GLM module is here in the first place

I can see the desire to make it simple, but if you then need to learn the pymc3 API the moment your model is even just slightly different (or even if it just has parameters named differently, e.g. '\alpha' and '\beta') then I'm not sure it achieves its aim. If you're a beginner, I think that you don't expect that to use it you have to name your intercept 'Intercept' and your coefficient 'x' (as this function requires), you expect something more generic.

At the moment it looks to me like it's more of a convenience function which is useful for the GLM notebooks rather than something generic which needs to be part of the public API

And is it easily replaceable with az.plot_hdi in your experience here @MarcoGorelli ?

In this notebook at least it's easily replaceable with

ax.plot(x, trace['Intercept'][None, :] + trace['x'][None, :]*x[:, None], 'k-')
ax.plot([], [], 'k-')

(and an extra one line of code if you only want to select e.g. 100 samples)
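As a self-contained sketch of that replacement, including the extra line that selects 100 samples: the `trace` below is a made-up dict of NumPy arrays standing in for a real PyMC3 trace, and the parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a PyMC3 trace: 1-D arrays of posterior draws (hypothetical values).
trace = {
    "Intercept": rng.normal(1.0, 0.1, size=500),
    "x": rng.normal(2.0, 0.1, size=500),
}
x = np.linspace(0, 1, 50)

# The extra line: pick 100 draws at random, without replacement.
idx = rng.choice(len(trace["Intercept"]), size=100, replace=False)

# Broadcasting: (1, 100) + (1, 100) * (50, 1) -> (50, 100), one column per line.
lines = trace["Intercept"][idx][None, :] + trace["x"][idx][None, :] * x[:, None]

print(lines.shape)
```

Since matplotlib draws one line per column of a 2-D y array, ax.plot(x, lines, 'k-') then plots all 100 regression lines in a single call.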

FWIW, my suggestion would be to either:

- make the function more generic (so it can work even if your parameters are named differently) and document it in the API reference page
- deprecate it and just re-implement it where necessary in the notebooks which use it, with an extra 3-4 simple lines of code

AlexAndorra · 2020-11-13T12:22:51Z

Thanks @MarcoGorelli -- your points definitely go in the deprecation column. TBH, I've never really used the GLM module but I think you and @twiecki make some good points. Since it is easily replaceable by arviz.plot_hdi, I'd be down for the second option then:

deprecate it and just re-implement it where necessary in the notebooks which use, with an extra 3-4 simple lines of code

The first option goes against the choice we made to delegate plots and diagnostics to ArviZ.

I guess the next steps then are:

- Replace its use in the 8 notebooks you found
- Remove the function from the codebase
- Add a note about deprecation to the release notes

Are you down for this?

twiecki · 2020-11-13T12:33:17Z

@AlexAndorra I agree with this plan.

MarcoGorelli · 2020-11-13T12:51:07Z

Sure, thanks for the discussion - should there be a DeprecationWarning before removing the function?

AlexAndorra · 2020-11-13T12:55:54Z

Mmmh, probably. I'll defer to @twiecki on this one

twiecki · 2020-11-13T13:03:44Z

It's the right thing to do. You could add the code that replaces it to that warning (or point people to it) and open an issue that this should be removed in the next version.
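A minimal sketch of what such a warning could look like, using only the standard library. The wrapper name and the message wording are hypothetical, and the plotting body is left out.

```python
import warnings


def plot_posterior_predictive_glm_deprecated(*args, **kwargs):
    """Hypothetical stand-in showing only the warning, not the plotting."""
    warnings.warn(
        "plot_posterior_predictive_glm is deprecated and will be removed in a "
        "future release. Compute the regression lines yourself, e.g. "
        "trace['Intercept'][None, :] + trace['x'][None, :] * x[:, None], "
        "and plot them with matplotlib.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    # The existing plotting logic would run here unchanged.


# DeprecationWarning can be filtered out by default, so capture it explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plot_posterior_predictive_glm_deprecated()

print(caught[0].category.__name__)
```

Putting the replacement snippet in the message means users see the migration path at the moment they hit the warning.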

MarcoGorelli · 2020-11-15T12:39:08Z

I tried running this notebook on Kaggle (as their environment has all the dependencies needed, except for watermark) but the diff shows everything as having changed, rather than showing the cell-by-cell comparisons we're used to...will look into this

MarcoGorelli · 2020-11-15T12:52:34Z

I tried running this notebook on Kaggle (as their environment has all the dependencies needed, except for watermark) but the diff shows everything as having changed, rather than showing the cell-by-cell comparisons we're used to...will look into this

🎉 removing papermill from the metadata seems to do the trick
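For anyone hitting the same thing, a rough sketch of stripping that metadata with the standard library. It assumes the notebook is ordinary nbformat JSON and that papermill stores its data under a "papermill" key in the notebook-level and cell-level metadata; the helper name and the tiny notebook below are made up.

```python
import copy
import json


def strip_papermill(notebook: dict) -> dict:
    """Return a copy of a notebook dict without any "papermill" metadata keys."""
    cleaned = copy.deepcopy(notebook)
    cleaned.get("metadata", {}).pop("papermill", None)
    for cell in cleaned.get("cells", []):
        cell.get("metadata", {}).pop("papermill", None)
    return cleaned


# Tiny made-up notebook, just enough structure to exercise the helper.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"papermill": {"duration": 0.1}},
            "source": [],
            "outputs": [],
            "execution_count": None,
        },
    ],
    "metadata": {"papermill": {"duration": 1.0}, "kernelspec": {"name": "python3"}},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cleaned = strip_papermill(nb)
print(json.dumps(cleaned["metadata"]))
```

For a real file, the same helper would sit between json.load on the .ipynb file and json.dump back to it.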

AlexAndorra · 2020-11-16T20:10:15Z

Is this ready for review now @MarcoGorelli ?

MarcoGorelli · 2020-11-16T20:35:44Z

There are still some GLM notebooks left; hopefully I can get them all done this week or next

MarcoGorelli · 2020-11-17T11:36:45Z

Hi @AlexAndorra and @twiecki - having looked at GLM-robust-with-outlier-detection.ipynb, I do think it's pretty neat how plot_posterior_predictive_glm is used, so perhaps it'd be a pity to lose it.

Would it be welcome if I moved it over to ArviZ, made it a little more generic (so it can work even if return_inferencedata=True), and then deprecated it in PyMC3?

AlexAndorra · 2020-11-17T17:22:18Z

That sounds like a great plan @MarcoGorelli ! The thing with an ArviZ function is that it should be PPL-agnostic, which I think will be hard to do with plot_posterior_predictive_glm as it is now -- BTW, extending it to work with InferenceData and time series data would be awesome 🤩

So the best may be to add it to Bambi instead. Pinging @aloctavodia and @tomicapretto, as they work on this, and @ahartikainen, as he knows ArviZ very well 😉

In the meantime, we should probably keep this function in PyMC3, which would mean we'd close this PR I guess

MarcoGorelli · 2020-11-17T17:29:33Z

In the meantime, we should probably keep this function in PyMC3

OK, sure! There are still some updates which I've made to these notebooks (e.g. az.plot_trace instead of pm.traceplot), so perhaps I'll keep the updates but revert the re-implementation of plot_posterior_predictive_glm - then, in a separate PR, once some decision has been made, we can think of moving plot_posterior_predictive_glm out of pymc3

Will ping when the GLM notebooks are ready

tomicapretto · 2020-11-17T17:41:05Z

Thanks @AlexAndorra for pinging us here.

You are right, we plan to include (or at least discuss including) something like plot_posterior_predictive_glm in Bambi.
Bambi still requires users to generate plots and numeric summaries "from scratch". I think Bambi could be even more beginner friendly by having a set of functions/methods to produce the plots that one usually creates when using GLMMs (like posterior predictive plots), without requiring much matplotlib or knowledge of how to work with InferenceData/xarray objects.

@MarcoGorelli if you are interested, contributions are always welcome in Bambi :)

AlexAndorra · 2020-11-18T09:22:17Z

Yep, sounds good @MarcoGorelli 👌

MarcoGorelli · 2020-12-16T19:07:36Z

closing as this will go in pymc-examples

re-run glm-linear

075bc2b

MarcoGorelli added 3 commits November 12, 2020 15:21

Merge remote-tracking branch 'upstream/master' into re-run-glm-linear

2a08298

keep using plot_posterior_predictive_glm

2aa1391

revert description change

4b99201

MarcoGorelli added 4 commits November 12, 2020 21:15

don't return inferencedata

b16d2a2

remove trailing comma

b1e9138

revert plotting changes

62c1916

rerun

eb5e124

define plot_posterior_predictive_glm inside notebook

4719b96

MarcoGorelli added 2 commits November 14, 2020 14:50

use np.outer

cf435e1

run glm-linear and glm-logistic on Kaggle

512c88e

MarcoGorelli changed the title ~~Re-run glm-linear~~ Factor plot_posterior_predictive_glm into notebooks Nov 15, 2020

MarcoGorelli marked this pull request as draft November 15, 2020 12:22

MarcoGorelli added 2 commits November 15, 2020 12:24

remove pip install watermark cell

1460495

remove metadata

26a6e21

nbformat_minor

2e0d859

MarcoGorelli mentioned this pull request Nov 15, 2020

ReviewNB shows entire notebook has having changed, rather than showing cell-by-cell comparisons ReviewNB/support#76

Closed

remove papermill

6e67c9f

MarcoGorelli added 6 commits November 15, 2020 12:53

remove papermill from glm-linear too

c16d6ac

revert api_quickstart changes

059e2c0

remove papermill from glm-linear

792e3f7

don't remove cell metadata

71ff932

only remove metadata from glm-linear

2bb5237

glm poisson regression

50f0aa2

MarcoGorelli changed the title ~~Factor plot_posterior_predictive_glm into notebooks~~ Update GLM notebooks Nov 17, 2020

MarcoGorelli mentioned this pull request Nov 19, 2020

Let plot_posterior_predictive_glm work with inferencedata too #4234

Merged

MarcoGorelli closed this Dec 16, 2020
