-
-
Notifications
You must be signed in to change notification settings - Fork 269
[WIP] Update glm-logistic with bambi #196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-07-28T04:06:53Z @tomicapretto, after seeing https://github.com/bambinos/bambi/issues/371 it looks like the binomial family is not part of the immediate roadmap, so it would probably be best to keep this notebook on standby for a while. How does this sound? Do you already have a bit of an idea about the expected timeline? |
Hi @OriolAbril, thanks for tagging me here! Actually, I realized it wasn't so much work to implement some stuff to have the Binomial family. It's still being implemented in formulae (see bambinos/formulae#40), and it will be implemented in Bambi after that. Also thank you @chiral-carbon for all your work here! 😁 |
thanks a lot @tomicapretto , means a lot to me! 😄 |
Hi @chiral-carbon. Bambi development version now has a Binomial family 🎉 |
this is great @tomicapretto! |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-24T21:31:20Z I think I should try to replace this with OriolAbril commented on 2021-08-25T07:55:12Z it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.
|
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-24T21:31:21Z how can I fix this, since the formulae itself would not change? tomicapretto commented on 2021-08-25T11:59:22Z Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli". You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).
In short, you need the same formula, but using
This may help too. |
it should still use seaborn. This part is the initial data exploration and ArviZ is designed to explore the results of the models, so it expects data to have chain and draw dim for example which is not the case yet.
View entire conversation on ReviewNB |
Each row in the data represents a Bernoulli trial, whether income is greater than 50k or not. For this case, you need to use family="bernoulli". You should use family="binomial" when the rows are aggregated and you have the number of bernoulli trials and the number of successes (something like a number of people, and how many of those make more than 50k).
In short, you need the same formula, but using
This may help too. View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-26T21:28:26Z wanted to use arviz here and had tried arviz.plot_dist(), would that be correct usage? OriolAbril commented on 2021-08-27T20:53:22Z I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks
|
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-26T21:28:26Z my kernel keeps dying at this step so any suggestions to fix this? tomicapretto commented on 2021-08-27T00:08:08Z One simple thing to do is standardizin the variable Note it is possible to standardize using the formula interface, for example, chiral-carbon commented on 2021-08-27T16:51:28Z thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅 |
One simple thing to do is standardizin the variable Note it is possible to standardize using the formula interface, for example, View entire conversation on ReviewNB |
thanks a lot! but it worked out without changing anything, too many processes were running so that's why the kernel kept dying I think 😅 View entire conversation on ReviewNB |
I am not sure plot_dist works to create histograms of floats in latest release, I think it's only added to development version. You can try using the default KDE though instead of a histogram and see how it looks
View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-27T20:53:41Z Plot joint is deprecated, you should use plot_pair instead. See https://arviz-devs.github.io/arviz/examples/plot_joint.html |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-30T13:47:04Z This has to be removed |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-30T13:47:04Z There is a render issue here. It looks like a backslash was added so now dollar sign is escaped breaking what should be math and what should not
chiral-carbon commented on 2021-08-30T20:07:15Z yes I added a dollar sign there because it wasn't one before Each curve shows how the probability of earning more than 50K changes with age. and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added. OriolAbril commented on 2021-08-30T20:43:12Z it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there. |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-30T13:47:05Z I think there are too many quantiles, I would use ~5-7 at most, otherwise it becomes difficult to interpret them. Also note that chiral-carbon commented on 2021-08-30T14:46:01Z I tried passing a az.plot_kde(np.exp(b), plot_kwargs={'label': 'Odds Ratio'); but got an error saying there were too many arguments for And it doesn't accept OriolAbril commented on 2021-08-30T15:16:41Z
To set the xlabel you need to use matplotlib manually, more or less like it was done before:
ax = az.plot_kde(... ax.set_xlabel("Odds Ratio");
|
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-08-30T13:47:06Z In this case it won't change the results because we are computing percentiles, but if we were computing the mean or variance, then the result could be completely different. Therefore, I would first compute the exponential, then the percentile so that operate -> summarize and not summarize -> operate becomes the norm.
As a side note, you can compute the exponential in the cell above and store chiral-carbon commented on 2021-08-30T14:46:40Z good point, will correct that. chiral-carbon commented on 2021-08-30T14:50:50Z also, I think Odds Ratio is the values on the xticks, not the value OriolAbril commented on 2021-08-30T15:14:36Z the values of |
I tried passing And it doesn't accept View entire conversation on ReviewNB |
good point, will correct that. View entire conversation on ReviewNB |
also, I think Odds Ratio is the values on the xticks, not the value View entire conversation on ReviewNB |
the values of View entire conversation on ReviewNB |
To set the xlabel you need to use matplotlib manually, more or less like it was done before:
ax = az.plot_kde(... ax.set_xlabel("Odds Ratio");
View entire conversation on ReviewNB |
yes I added a dollar sign there because it wasn't one before Each curve shows how the probability of earning more than 50K changes with age. and to escape it I added a backslash since there is another dollar sign later on. it rendered fine in my notebook but I guess I'll just remove the dollar sign I added. View entire conversation on ReviewNB |
it can be a reviewnb issue, it is not too trustworthy when rendering, so if it looks right in your notebook just leave it like this. There are also other dollar signs rendered correctly in other cells, so you can also copy the markdown source from there. View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB chiral-carbon commented on 2021-08-31T13:05:57Z I get warnings for
|
Addresses issue #82 and aims to advance it to best practices, use
bambi
instead of pymc glm module.I had a question regarding the family to be used in cells 4 and 14 while defining the model. The notebook originally uses Binomial family with the GLM module but with
bambi
I think we do not have a default binomial family defined. I usednegativebinomial
just in case but since it is not the correct replacement, what should I do about it?