Update GLM hierarchical binomial model to best practices #45


Merged

Conversation


@CloudChaoszero CloudChaoszero commented Mar 21, 2021

This PR is the start of #14.

Addresses #61 and updates the notebook to follow best practices.

Summary

  • Save trace objects as InferenceData objects via return_inferencedata=True.
  • Where an opportunity to use coords exists, add parameter coords.

I'm open to feedback on whether this is what the goal of linked issue #14 should be.


@CloudChaoszero (Contributor, Author)

Performing the required updates to 1-2 files now.

  • Gauging whether these are the ideal changes needed.


review-notebook-app bot commented Mar 21, 2021


OriolAbril commented on 2021-03-21T09:06:03Z
----------------------------------------------------------------

We can use coords here. For example, ab could have a dim var or param with coordinate values alpha, beta; and instead of shape=N we could define an arange of length N as obs_id and use it for theta and p.


CloudChaoszero commented on 2021-03-21T21:58:25Z
----------------------------------------------------------------

Sounds good. I've updated the notebook to (hopefully) capture the changes needed!

OriolAbril commented on 2021-03-29T11:45:29Z
----------------------------------------------------------------

I think y and obs should not be in coords, as they don't represent values to be used as labels. What we may want to do is use pm.Data for n, to have it stored in the constant_data group of the InferenceData result.

CloudChaoszero commented on 2021-03-31T02:36:07Z
----------------------------------------------------------------

Sounds good. I added it via pm.Data under the variable name n_val.


review-notebook-app bot commented Mar 21, 2021


OriolAbril commented on 2021-03-21T09:06:07Z
----------------------------------------------------------------

Use az.plot_pair instead; it works directly with InferenceData and automatically sets the plot labels correctly.
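For reference, the suggested call might look like this. The InferenceData here is a random stand-in built with az.from_dict; only the variable name "ab" and its "param" dimension come from the discussion.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import arviz as az

# Random stand-in posterior: 2 chains x 100 draws of the 2-vector "ab"
rng = np.random.default_rng(0)
idata = az.from_dict(
    posterior={"ab": rng.normal(size=(2, 100, 2))},
    coords={"param": ["alpha", "beta"]},
    dims={"ab": ["param"]},
)

# plot_pair works directly on InferenceData and labels the axes itself
axes = az.plot_pair(idata, var_names=["ab"])
```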



OriolAbril commented on 2021-03-21T09:06:07Z
----------------------------------------------------------------

I'd use trace.posterior["ab"].mean(("chain", "draw"))
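The ("chain", "draw") reduction above just averages over the first two dimensions of the posterior array; a plain-numpy equivalent of the xarray call (shapes are illustrative):

```python
import numpy as np

# Stand-in for trace.posterior["ab"]: shape (chain, draw, param)
ab = np.arange(2 * 500 * 2, dtype=float).reshape(2, 500, 2)

# trace.posterior["ab"].mean(("chain", "draw")) reduces the chain and draw
# dimensions; in plain numpy that is a mean over the first two axes
ab_mean = ab.mean(axis=(0, 1))   # one posterior mean per parameter
```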


@OriolAbril (Member)

Also, minor note, I think you'll run into merge issues with #24 here.


@OriolAbril (Member)

There are merge conflicts now 😅

@CloudChaoszero CloudChaoszero marked this pull request as ready for review March 29, 2021 07:46
@CloudChaoszero (Contributor, Author)

@OriolAbril Yikes ha-- updated now!



review-notebook-app bot commented Mar 29, 2021


OriolAbril commented on 2021-03-29T11:48:40Z
----------------------------------------------------------------

I think here it's better to use compact=False.
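With a multidimensional variable like ab, compact=False gives each coordinate its own row of trace panels instead of overlaying them. A sketch on a random stand-in InferenceData (the data and sizes are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import arviz as az

# Stand-in posterior: 2 chains x 100 draws of the 2-vector "ab"
rng = np.random.default_rng(1)
idata = az.from_dict(
    posterior={"ab": rng.normal(size=(2, 100, 2))},
    coords={"param": ["alpha", "beta"]},
    dims={"ab": ["param"]},
)

# compact=False draws one row of panels per coordinate value of "param"
# instead of overlaying both components in a single pair of panels
axes = az.plot_trace(idata, compact=False)
```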


OriolAbril commented on 2021-03-29T11:48:41Z
----------------------------------------------------------------

I would delete the commented code. Also, there is no need to use group="posterior" (it is the default and won't change) nor show=True, given that we are in a Jupyter environment.



review-notebook-app bot commented Mar 29, 2021


OriolAbril commented on 2021-03-29T11:54:52Z
----------------------------------------------------------------

We should probably update the priors and change the HalfFlat for a HalfNormal with a large variance, or something similar.


CloudChaoszero commented on 2021-03-31T02:36:27Z
----------------------------------------------------------------

Updated the prior distribution from HalfFlat to HalfNormal.

OriolAbril commented on 2021-03-31T04:56:32Z
----------------------------------------------------------------

It looks like the prior is too restrictive now. We should try with sigma higher than one, [5, 15] maybe. I also think that we can get rid of the testvals now that we are not using an improper prior anymore.

CloudChaoszero commented on 2021-04-01T08:02:18Z
----------------------------------------------------------------

I removed the testvals param as part of the update.

Hmm, however the subsequent plot_trace and plot_pair visualizations do not reproduce the pre-existing visualizations, if you were looking to get those results.

OriolAbril commented on 2021-04-01T08:07:32Z
----------------------------------------------------------------

I think that not reproducing the previous results is an issue with our current prior choice. By default HalfNormal has sigma=1, but here the scale of beta is 10-20! Our prior is way too restrictive. We should increase the sigma of the HalfNormal.

You should see that as you increase sigma the posterior changes dramatically (e.g. from the current 1 to 5), until a point where increasing sigma has no effect anymore (e.g. probably from 20 to 50 the posterior shows no significant changes).

CloudChaoszero commented on 2021-04-03T17:42:43Z
----------------------------------------------------------------

Sounds great. I actually changed sigma from 1 to 10. In the GitHub PR, I'll show the posterior plots for sigma = 5 and sigma = 10.
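The prior-sensitivity intuition above can be checked numerically: under HalfNormal(sigma=1), values in beta's 10-20 range carry essentially zero prior mass, while larger sigmas leave them plausible. A scipy sketch (scipy is used here only to probe the prior; it is not part of the notebook):

```python
from scipy import stats

# Tail mass P(X > 10) under HalfNormal priors of increasing sigma;
# the scale of beta in this model is roughly 10-20
for sigma in (1, 5, 10, 20):
    tail = stats.halfnorm(scale=sigma).sf(10)
    print(f"sigma={sigma:>2}: P(X > 10) = {tail:.3g}")
```

With sigma=1 the prior effectively forbids the region the data support, which is why the posterior changed so dramatically when sigma was increased.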


@CloudChaoszero (Contributor, Author)

Sigma = 5

[posterior plot for sigma = 5]

Sigma = 10

[posterior plot for sigma = 10]


review-notebook-app bot commented Apr 4, 2021


OriolAbril commented on 2021-04-04T07:05:21Z
----------------------------------------------------------------

I would delete the commented code.


CloudChaoszero commented on 2021-04-04T09:02:23Z
----------------------------------------------------------------

Gotcha

@OriolAbril (Member)

Thanks for the plots with multiple sigmas. It looks like from 5 to 10 the posterior still changes significantly; can you take a look at sigma=20 and sigma=30? No need to post them, just check the intuition that the posterior progressively stops being so influenced by the prior. We are already retrieving the expected values with 10, so everything looks good. Thanks again for the PR 😄


@CloudChaoszero (Contributor, Author)

> Thanks for the plots with multiple sigmas, it looks like from 5 to 10 it still changes significantly, can you take a look at sigma=20 and sigma=30? no need to post them, only check the intuition that the posterior progressively stops being so influenced by the prior.

@OriolAbril Yeah, so there isn't a significant change in the posteriors after updating to sigma=20 and sigma=30, as seen below.

And no problem! 😄 This PR gives me some guidance on fully implementing return_inferencedata and coords changes for this examples repo.


Sigma = 20

[screenshot: posterior plots for sigma = 20]

Sigma = 30

[screenshot: posterior plots for sigma = 30]

@OriolAbril OriolAbril changed the title Add return_inferencedata and coords to trace Update GLM hierarchical binomial model to best practices Apr 4, 2021
@OriolAbril OriolAbril merged commit f0d6c33 into pymc-devs:main Apr 4, 2021
twiecki pushed a commit that referenced this pull request Jan 17, 2023