[WIP] Update glm-logistic with bambi #196

chiral-carbon · 2021-07-26T12:13:11Z

Addresses issue #82 and aims to advance it to best practices, use bambi instead of pymc glm module.

I had a question regarding the family to be used in cells 4 and 14 while defining the model. The notebook originally uses Binomial family with the GLM module but with bambi I think we do not have a default binomial family defined. I used negativebinomial just in case but since it is not the correct replacement, what should I do about it?

review-notebook-app · 2021-07-26T12:13:15Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

review-notebook-app · 2021-07-28T04:06:54Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-07-28T04:06:53Z
----------------------------------------------------------------

@tomicapretto, after seeing https://github.com/bambinos/bambi/issues/371 it looks like the binomial family is not part of the immediate roadmap, so it would probably be best to keep this notebook on standby for a while. How does this sound? Do you already have a bit of an idea about the expected timeline?

tomicapretto · 2021-07-28T12:38:17Z

Hi @OriolAbril, thanks for tagging me here!

Actually, I realized it wasn't so much work to implement some stuff to have the Binomial family. It's still being implemented in formulae (see bambinos/formulae#40), and it will be implemented in Bambi after that.
As for the timeline, this should be done for the next release which will happen when GSoC ends.

Also thank you @chiral-carbon for all your work here! 😁

chiral-carbon · 2021-08-02T10:32:45Z

thanks a lot @tomicapretto , means a lot to me! 😄

tomicapretto · 2021-08-04T20:50:27Z

Hi @chiral-carbon. Bambi development version now has a Binomial family 🎉
Have look here to see how it works (and how it compares to the existing Bernoulli family)

chiral-carbon · 2021-08-05T13:51:14Z

this is great @tomicapretto!

review-notebook-app · 2021-08-24T21:31:20Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-24T21:31:20Z
----------------------------------------------------------------

I think I should try to replace this with az.plot_pair()

OriolAbril commented on 2021-08-25T07:55:12Z
----------------------------------------------------------------

it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.

review-notebook-app · 2021-08-24T21:31:21Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-24T21:31:21Z
----------------------------------------------------------------

how can I fix this, since the formulae itself would not change?

tomicapretto commented on 2021-08-25T11:59:22Z
----------------------------------------------------------------

Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli".

You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).

In short, you need the same formula, but using family="bernoulli" instead of family="binomial" .

This may help too.

OriolAbril · 2021-08-25T07:55:12Z

it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.

View entire conversation on ReviewNB

tomicapretto · 2021-08-25T11:59:23Z

Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli".

You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).

In short, you need the same formula, but using family="bernoulli" instead of family="binomial" .

This may help too.

View entire conversation on ReviewNB

review-notebook-app · 2021-08-26T21:28:26Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-26T21:28:26Z
----------------------------------------------------------------

wanted to use arviz here and had tried arviz.plot_dist(), would that be correct usage?

OriolAbril commented on 2021-08-27T20:53:22Z
----------------------------------------------------------------

I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks

review-notebook-app · 2021-08-26T21:28:27Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-26T21:28:26Z
----------------------------------------------------------------

my kernel keeps dying at this step so any suggestions to fix this?

tomicapretto commented on 2021-08-27T00:08:08Z
----------------------------------------------------------------

One simple thing to do is standardizin the variableage instead of dividing by 10. We have this example here.

Note it is possible to standardize using the formula interface, for example, income ~ scale(age) + np.power(scale(age), 2)

chiral-carbon commented on 2021-08-27T16:51:28Z
----------------------------------------------------------------

thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅

tomicapretto · 2021-08-27T00:08:09Z

One simple thing to do is standardizin the variableage instead of dividing by 10. We have this example here.

Note it is possible to standardize using the formula interface, for example, income ~ scale(age) + np.power(scale(age), 2)

View entire conversation on ReviewNB

chiral-carbon · 2021-08-27T16:51:29Z

thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅

View entire conversation on ReviewNB

OriolAbril · 2021-08-27T20:53:23Z

I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks

View entire conversation on ReviewNB

review-notebook-app · 2021-08-27T20:53:41Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-27T20:53:41Z
----------------------------------------------------------------

Plot joint is deprecated, you should use plot_pair instead. See https://arviz-devs.github.io/arviz/examples/plot_joint.html

review-notebook-app · 2021-08-30T13:47:04Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:04Z
----------------------------------------------------------------

This has to be removed

review-notebook-app · 2021-08-30T13:47:05Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:04Z
----------------------------------------------------------------

There is a render issue here. It looks like a backslash was added so now dollar sign is escaped breaking what should be math and what should not

chiral-carbon commented on 2021-08-30T20:07:15Z
----------------------------------------------------------------

yes I added a dollar sign there because it wasn't one before

Each curve shows how the probability of earning more than 50K changes with age.

and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added.

OriolAbril commented on 2021-08-30T20:43:12Z
----------------------------------------------------------------

it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there.

review-notebook-app · 2021-08-30T13:47:06Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:05Z
----------------------------------------------------------------

I think there are too many quantiles, I would use ~5-7 at most, otherwise it becomes difficult to interpret them. Also note that plot_kde is a low level ArviZ function, so you need to manually add the labels.

chiral-carbon commented on 2021-08-30T14:46:01Z
----------------------------------------------------------------

I tried passing a label argument as:

az.plot_kde(np.exp(b), plot_kwargs={'label': 'Odds Ratio');

but got an error saying there were too many arguments for label.

And it doesn't accept xlabel or ylabel as args either because we are invoking matplotlib.lines.Line2D() which doesn't have those arguments.

OriolAbril commented on 2021-08-30T15:16:41Z
----------------------------------------------------------------

label would be used to set the legend label in matplotlib. I think plot_kde takes it directly, not via plot_kwargs (you should check https://arviz-devs.github.io/arviz/api/generated/arviz.plot_kde.html to be sure though).

To set the xlabel you need to use matplotlib manually, more or less like it was done before:

ax = az.plot_kde(...
ax.set_xlabel("Odds Ratio");

review-notebook-app · 2021-08-30T13:47:06Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:06Z
----------------------------------------------------------------

In this case it won't change the results because we are computing percentiles, but if we were computing the mean or variance, then the result could be completely different. Therefore, I would first compute the exponential, then the percentile so that operate -> summarize and not summarize -> operate becomes the norm.

As a side note, you can compute the exponential in the cell above and store odds_ratio instead of storing b and use it for plotting and summarizing.

chiral-carbon commented on 2021-08-30T14:46:40Z
----------------------------------------------------------------

good point, will correct that.

chiral-carbon commented on 2021-08-30T14:50:50Z
----------------------------------------------------------------

also, I think Odds Ratio is the values on the xticks, not the value np.exp(trace.posterior['educ'] ?

OriolAbril commented on 2021-08-30T15:14:36Z
----------------------------------------------------------------

the values of np.exp(trace.posterior['educ'] are encoded in the x axis, not as in the y axis. Also note how it says P(1.378 < O.R. < 1.414) = 0.95 . The O.R. stands for Odds Ratio

chiral-carbon · 2021-08-30T14:46:02Z

I tried passing label argument as az.plot_kde(np.exp(b), plot_kwargs={'label': 'Odds Ratio'); but got an error saying there were too many arguments for label.

And it doesn't accept xlabel or ylabel as args either because we are invoking matplotlib.lines.Line2D() which doesn't have those arguments.

View entire conversation on ReviewNB

chiral-carbon · 2021-08-30T14:46:41Z

good point, will correct that.

View entire conversation on ReviewNB

chiral-carbon · 2021-08-30T14:50:51Z

also, I think Odds Ratio is the values on the xticks, not the value np.exp(trace.posterior['educ'] ?

View entire conversation on ReviewNB

OriolAbril · 2021-08-30T15:14:37Z

the values of np.exp(trace.posterior['educ'] are encoded in the x axis, not as in the y axis. Also note how it says P(1.378 < O.R. < 1.414) = 0.95 . The O.R. stands for Odds Ratio

View entire conversation on ReviewNB

OriolAbril · 2021-08-30T15:16:41Z

label would be used to set the legend label in matplotlib. I think plot_kde takes it directly, not via plot_kwargs (you should check https://arviz-devs.github.io/arviz/api/generated/arviz.plot_kde.html to be sure though).

To set the xlabel you need to use matplotlib manually, more or less like it was done before:

ax = az.plot_kde(...
ax.set_xlabel("Odds Ratio");

View entire conversation on ReviewNB

chiral-carbon · 2021-08-30T20:07:16Z

yes I added a dollar sign there because it wasn't one before

Each curve shows how the probability of earning more than 50K changes with age.

and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added.

View entire conversation on ReviewNB

OriolAbril · 2021-08-30T20:43:13Z

it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there.

View entire conversation on ReviewNB

review-notebook-app · 2021-08-31T13:05:58Z

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-31T13:05:57Z
----------------------------------------------------------------

I get warnings for constrained_layout=True being used, I guess this happens internally. I tried fixing it but kept getting the syntax wrong, will push that once I correct it

[WIP] Update glm-logistic with bambi

256fd6a

chiral-carbon added the outreachy2021 label Jul 26, 2021

chiral-carbon mentioned this pull request Aug 13, 2021

Update GLM robust #205

Merged

MarcoGorelli marked this pull request as draft August 13, 2021 13:25

update with bambi binomial family

e525b08

chiral-carbon marked this pull request as ready for review August 24, 2021 21:26

changes in plotting

23bcdae

update after running 'model selection' section

244250b

changes: plots

66f0114

fixes

19dfb60

OriolAbril approved these changes Aug 31, 2021

View reviewed changes

OriolAbril merged commit 014bc47 into pymc-devs:main Aug 31, 2021

chiral-carbon deleted the glmlogistic branch August 31, 2021 13:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Update glm-logistic with bambi #196

[WIP] Update glm-logistic with bambi #196

chiral-carbon commented Jul 26, 2021

review-notebook-app bot commented Jul 26, 2021

review-notebook-app bot commented Jul 28, 2021 •

edited

Loading

tomicapretto commented Jul 28, 2021 •

edited

Loading

chiral-carbon commented Aug 2, 2021

tomicapretto commented Aug 4, 2021

chiral-carbon commented Aug 5, 2021

review-notebook-app bot commented Aug 24, 2021 •

edited

Loading

review-notebook-app bot commented Aug 24, 2021 •

edited

Loading

OriolAbril commented Aug 25, 2021

tomicapretto commented Aug 25, 2021

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

tomicapretto commented Aug 27, 2021

chiral-carbon commented Aug 27, 2021

OriolAbril commented Aug 27, 2021

review-notebook-app bot commented Aug 27, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading

chiral-carbon commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

review-notebook-app bot commented Aug 31, 2021

[WIP] Update glm-logistic with bambi #196

[WIP] Update glm-logistic with bambi #196

Conversation

chiral-carbon commented Jul 26, 2021

review-notebook-app bot commented Jul 26, 2021

review-notebook-app bot commented Jul 28, 2021 • edited Loading

tomicapretto commented Jul 28, 2021 • edited Loading

chiral-carbon commented Aug 2, 2021

tomicapretto commented Aug 4, 2021

chiral-carbon commented Aug 5, 2021

review-notebook-app bot commented Aug 24, 2021 • edited Loading

review-notebook-app bot commented Aug 24, 2021 • edited Loading

OriolAbril commented Aug 25, 2021

tomicapretto commented Aug 25, 2021

review-notebook-app bot commented Aug 26, 2021 • edited Loading

review-notebook-app bot commented Aug 26, 2021 • edited Loading

tomicapretto commented Aug 27, 2021

chiral-carbon commented Aug 27, 2021

OriolAbril commented Aug 27, 2021

review-notebook-app bot commented Aug 27, 2021 • edited Loading

review-notebook-app bot commented Aug 30, 2021

review-notebook-app bot commented Aug 30, 2021 • edited Loading

review-notebook-app bot commented Aug 30, 2021 • edited Loading

review-notebook-app bot commented Aug 30, 2021 • edited Loading

chiral-carbon commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

chiral-carbon commented Aug 30, 2021

OriolAbril commented Aug 30, 2021

review-notebook-app bot commented Aug 31, 2021

review-notebook-app bot commented Jul 28, 2021 •

edited

Loading

tomicapretto commented Jul 28, 2021 •

edited

Loading

review-notebook-app bot commented Aug 24, 2021 •

edited

Loading

review-notebook-app bot commented Aug 24, 2021 •

edited

Loading

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

review-notebook-app bot commented Aug 26, 2021 •

edited

Loading

review-notebook-app bot commented Aug 27, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading

review-notebook-app bot commented Aug 30, 2021 •

edited

Loading