binomial regression example #190
ricardoV94 commented on 2021-06-26T07:21:32Z
I wouldn't mention the GLM module at all, since it has been deprecated in V4. You can mention the Bambi library, but perhaps it's enough at the end.
Nitpick: recap appears twice at the beginning of two consecutive paragraphs.

drbenvincent commented on 2021-06-26T10:54:21Z
I've fixed the repetition, and did some slight language tweaks, including "data is" -> "data are"
I've removed mention of the GLM module.
ricardoV94 commented on 2021-06-26T07:21:33Z
Perhaps not, but what if you showed one of those double scales with raw y on the left side and the proportion scale on the right side? Just because you made the good point before that proportions lose info (although only if there is variation in

drbenvincent commented on 2021-06-26T10:54:59Z
Excellent idea. Have added the second y-axis to this plot - and the one below.
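The double scale suggested above can be sketched roughly as follows. This is only an illustration, not the notebook's actual plotting code: the data values are hypothetical, and it assumes the number of trials `n` is the same for every observation, which is the only case where a single proportion axis lines up with the count axis.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data: number of successes out of a constant n trials.
n = 20
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1, 2, 6, 11, 15, 18, 20]

fig, ax = plt.subplots()
ax.plot(x, y, "o")
ax.set_ylim(0, n)
ax.set_xlabel("x")
ax.set_ylabel("number of successes")

# Right-hand axis: the same data expressed as a proportion of n.
# The limits are the left-hand limits divided by n, so the tick
# positions on both sides refer to the same points.
ax_prop = ax.twinx()
lo, hi = ax.get_ylim()
ax_prop.set_ylim(lo / n, hi / n)
ax_prop.set_ylabel("proportion of successes")
```

If `n` varied between observations, a single right-hand axis would no longer be a simple rescaling of the left-hand one.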
Very neat and directly to the point!
ricardoV94 commented on 2021-06-26T07:26:01Z
Any links to more advanced examples (not necessarily pymc3), if you happen to know of?

drbenvincent commented on 2021-06-26T10:55:42Z
I've added a reference to a useful GLM book which also has a free online version. Unfortunately it uses R code, but it's a good book.
- removed mention of glm module
- fixed the pluralisation of data
- removed repeated "recap"
- changed eta -> p which makes more sense in this example
- added double y axes on data space plots
- added some pointers to further information + link to relevant wikipedia page
- set numpy global random seed - required for scipy distributions generating the dataset
Thanks @ricardoV94. Think I've sorted everything you've mentioned now :)
Looks great! Didn't spot anything else.
Just spotted and fixed a minor mistake - it wasn't using the seeded rng. So just changed
@drbenvincent thanks a lot, I like the explanations in here too! I don't see any other issues, so let's hit the merge button.
aloctavodia commented on 2021-06-28T05:46:17Z
Line #6. import seaborn as sns
seaborn is not used

aloctavodia commented on 2021-06-28T05:46:18Z
Line #42. ax[1].plot(β0_true, β1_true, "ro", label="true")
Better use "C2o" instead of "ro", both to avoid using red, and to use the same color as in the left panel to mark the "true" value.
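For readers unfamiliar with the `"C2o"` format string: in matplotlib, `C2` refers to the third color of the current color cycle and `o` is the circle marker. A minimal sketch of the suggested change follows. The parameter values and the content of the left panel are placeholders, not the notebook's.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Placeholder "true" parameter values, for illustration only.
β0_true, β1_true = 0.25, 0.5

fig, ax = plt.subplots(1, 2)

# Hypothetical left panel: some "true" quantity drawn in cycle color C2.
ax[0].axhline(β0_true, color="C2", label="true")

# Right panel: "C2o" = cycle color C2 + circle marker, replacing "ro",
# so the "true" marker matches the left panel and avoids plain red.
(true_point,) = ax[1].plot(β0_true, β1_true, "C2o", label="true")
ax[1].legend()
```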
Thanks @aloctavodia - will submit a new pull request soon
Here's a new example for binomial regression. I've specifically not used the glm module in order to make absolutely everything totally transparent. Hopefully there's a decent amount of intro / explanation here to be useful for total beginners.
I believe everything is best practice - although not sure about setting random seed when sampling from a scipy distribution in the data generation part.
Let me know if you can think of any edits.
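On the random seed question above, one option (an alternative to setting numpy's global seed, and not necessarily what the notebook ended up doing) is to create a seeded generator and pass it to the scipy distribution through `random_state`. The sketch below assumes the data are generated as y_i ~ Binomial(n, p_i) with p_i = logistic(β0 + β1 * x_i). The function name `simulate`, the parameter values, and the x grid are all made up for illustration.

```python
import numpy as np
from scipy.special import expit  # logistic (inverse logit) function
from scipy.stats import binom


def simulate(seed, n=20, β0=0.25, β1=0.5, n_obs=30):
    """Draw a binomial regression dataset from a seeded generator.

    All default values are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-5, 5, n_obs)
    # Success probability for each observation via the inverse logit link.
    p = expit(β0 + β1 * x)
    # Passing the generator explicitly makes the draw reproducible
    # without touching numpy's global random state.
    y = binom.rvs(n=n, p=p, size=x.shape, random_state=rng)
    return x, p, y


x, p, y = simulate(seed=1234)
x_again, p_again, y_again = simulate(seed=1234)
```

With the same seed, `y` and `y_again` are identical, so the generated dataset does not change between runs.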