Skip to content

[WIP] Update glm-logistic with bambi #196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 31, 2021

Conversation

chiral-carbon
Copy link
Collaborator

Addresses issue #82 and aims to advance it to best practices, use bambi instead of pymc glm module.

I had a question regarding the family to be used in cells 4 and 14 while defining the model. The notebook originally uses Binomial family with the GLM module but with bambi I think we do not have a default binomial family defined. I used negativebinomial just in case but since it is not the correct replacement, what should I do about it?

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Jul 28, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-07-28T04:06:53Z
----------------------------------------------------------------

@tomicapretto, after seeing https://github.com/bambinos/bambi/issues/371 it looks like the binomial family is not part of the immediate roadmap, so it would probably be best to keep this notebook on standby for a while. How does this sound? Do you already have a bit of an idea about the expected timeline?


@tomicapretto
Copy link

tomicapretto commented Jul 28, 2021

Hi @OriolAbril, thanks for tagging me here!

Actually, I realized it wasn't so much work to implement some stuff to have the Binomial family. It's still being implemented in formulae (see bambinos/formulae#40), and it will be implemented in Bambi after that.
As for the timeline, this should be done for the next release which will happen when GSoC ends.

Also thank you @chiral-carbon for all your work here! 😁

@chiral-carbon
Copy link
Collaborator Author

thanks a lot @tomicapretto , means a lot to me! 😄

@tomicapretto
Copy link

Hi @chiral-carbon. Bambi development version now has a Binomial family 🎉
Have look here to see how it works (and how it compares to the existing Bernoulli family)

@chiral-carbon
Copy link
Collaborator Author

this is great @tomicapretto!

@chiral-carbon chiral-carbon mentioned this pull request Aug 13, 2021
@MarcoGorelli MarcoGorelli marked this pull request as draft August 13, 2021 13:25
@chiral-carbon chiral-carbon marked this pull request as ready for review August 24, 2021 21:26
@review-notebook-app
Copy link

review-notebook-app bot commented Aug 24, 2021

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-24T21:31:20Z
----------------------------------------------------------------

I think I should try to replace this with az.plot_pair()


OriolAbril commented on 2021-08-25T07:55:12Z
----------------------------------------------------------------

it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 24, 2021

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-24T21:31:21Z
----------------------------------------------------------------

how can I fix this, since the formulae itself would not change?


tomicapretto commented on 2021-08-25T11:59:22Z
----------------------------------------------------------------

Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli".

You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).

In short, you need the same formula, but using family="bernoulli" instead of family="binomial" .

This may help too.

Copy link
Member

it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.


View entire conversation on ReviewNB

Copy link

Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli".

You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).

In short, you need the same formula, but using family="bernoulli" instead of family="binomial" .

This may help too.


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 26, 2021

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-26T21:28:26Z
----------------------------------------------------------------

wanted to use arviz here and had tried arviz.plot_dist(), would that be correct usage?


OriolAbril commented on 2021-08-27T20:53:22Z
----------------------------------------------------------------

I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 26, 2021

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-26T21:28:26Z
----------------------------------------------------------------

my kernel keeps dying at this step so any suggestions to fix this?


tomicapretto commented on 2021-08-27T00:08:08Z
----------------------------------------------------------------

One simple thing to do is standardizin the variableage instead of dividing by 10. We have this example here.

Note it is possible to standardize using the formula interface, for example, income ~ scale(age) + np.power(scale(age), 2)

chiral-carbon commented on 2021-08-27T16:51:28Z
----------------------------------------------------------------

thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅

Copy link

One simple thing to do is standardizin the variableage instead of dividing by 10. We have this example here.

Note it is possible to standardize using the formula interface, for example, income ~ scale(age) + np.power(scale(age), 2)


View entire conversation on ReviewNB

Copy link
Collaborator Author

thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅


View entire conversation on ReviewNB

Copy link
Member

I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks


View entire conversation on ReviewNB

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 27, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-27T20:53:41Z
----------------------------------------------------------------

Plot joint is deprecated, you should use plot_pair instead. See https://arviz-devs.github.io/arviz/examples/plot_joint.html


@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:04Z
----------------------------------------------------------------

This has to be removed


@review-notebook-app
Copy link

review-notebook-app bot commented Aug 30, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:04Z
----------------------------------------------------------------

There is a render issue here. It looks like a backslash was added so now dollar sign is escaped breaking what should be math and what should not


chiral-carbon commented on 2021-08-30T20:07:15Z
----------------------------------------------------------------

yes I added a dollar sign there because it wasn't one before

Each curve shows how the probability of earning more than 50K changes with age.

and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added.

OriolAbril commented on 2021-08-30T20:43:12Z
----------------------------------------------------------------

it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there.

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 30, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:05Z
----------------------------------------------------------------

I think there are too many quantiles, I would use ~5-7 at most, otherwise it becomes difficult to interpret them. Also note that plot_kde is a low level ArviZ function, so you need to manually add the labels.


chiral-carbon commented on 2021-08-30T14:46:01Z
----------------------------------------------------------------

I tried passing a label argument as:

az.plot_kde(np.exp(b), plot_kwargs={'label': 'Odds Ratio'); 

but got an error saying there were too many arguments for label.

And it doesn't accept xlabel or ylabel as args either because we are invoking matplotlib.lines.Line2D() which doesn't have those arguments.

OriolAbril commented on 2021-08-30T15:16:41Z
----------------------------------------------------------------

label would be used to set the legend label in matplotlib. I think plot_kde takes it directly, not via plot_kwargs (you should check https://arviz-devs.github.io/arviz/api/generated/arviz.plot_kde.html to be sure though).

To set the xlabel you need to use matplotlib manually, more or less like it was done before:

ax = az.plot_kde(...
ax.set_xlabel("Odds Ratio");

@review-notebook-app
Copy link

review-notebook-app bot commented Aug 30, 2021

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-08-30T13:47:06Z
----------------------------------------------------------------

In this case it won't change the results because we are computing percentiles, but if we were computing the mean or variance, then the result could be completely different. Therefore, I would first compute the exponential, then the percentile so that operate -> summarize and not summarize -> operate becomes the norm.

As a side note, you can compute the exponential in the cell above and store odds_ratio instead of storing b and use it for plotting and summarizing.


chiral-carbon commented on 2021-08-30T14:46:40Z
----------------------------------------------------------------

good point, will correct that.

chiral-carbon commented on 2021-08-30T14:50:50Z
----------------------------------------------------------------

also, I think Odds Ratio is the values on the xticks, not the value np.exp(trace.posterior['educ'] ?

OriolAbril commented on 2021-08-30T15:14:36Z
----------------------------------------------------------------

the values of np.exp(trace.posterior['educ'] are encoded in the x axis, not as in the y axis. Also note how it says P(1.378 < O.R. < 1.414) = 0.95 . The O.R. stands for Odds Ratio

Copy link
Collaborator Author

I tried passing label argument as az.plot_kde(np.exp(b), plot_kwargs={'label': 'Odds Ratio'); but got an error saying there were too many arguments for label.

And it doesn't accept xlabel or ylabel as args either because we are invoking matplotlib.lines.Line2D() which doesn't have those arguments.


View entire conversation on ReviewNB

Copy link
Collaborator Author

good point, will correct that.


View entire conversation on ReviewNB

Copy link
Collaborator Author

also, I think Odds Ratio is the values on the xticks, not the value np.exp(trace.posterior['educ'] ?


View entire conversation on ReviewNB

Copy link
Member

the values of np.exp(trace.posterior['educ'] are encoded in the x axis, not as in the y axis. Also note how it says P(1.378 < O.R. < 1.414) = 0.95 . The O.R. stands for Odds Ratio


View entire conversation on ReviewNB

Copy link
Member

label would be used to set the legend label in matplotlib. I think plot_kde takes it directly, not via plot_kwargs (you should check https://arviz-devs.github.io/arviz/api/generated/arviz.plot_kde.html to be sure though).

To set the xlabel you need to use matplotlib manually, more or less like it was done before:

ax = az.plot_kde(...
ax.set_xlabel("Odds Ratio");


View entire conversation on ReviewNB

Copy link
Collaborator Author

yes I added a dollar sign there because it wasn't one before

Each curve shows how the probability of earning more than 50K changes with age.

and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added.


View entire conversation on ReviewNB

Copy link
Member

it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there.


View entire conversation on ReviewNB

@OriolAbril OriolAbril merged commit 014bc47 into pymc-devs:main Aug 31, 2021
@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

chiral-carbon commented on 2021-08-31T13:05:57Z
----------------------------------------------------------------

I get warnings for constrained_layout=True being used, I guess this happens internally. I tried fixing it but kept getting the syntax wrong, will push that once I correct it


@chiral-carbon chiral-carbon deleted the glmlogistic branch August 31, 2021 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants