Updates to Simpson's Paradox notebook #697
Conversation
View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-09-01T11:57:25Z:
Add "directed acyclic graph (DAG)" for people who do not know what a DAG is.

drbenvincent commented on 2024-09-02T14:14:16Z:
Good idea. Done in an upcoming commit.

View / edit / reply to this conversation on ReviewNB

tomicapretto commented on 2024-09-09T18:51:37Z:
We then show 2 "wayes" to resolve... typo?

drbenvincent commented on 2024-09-13T09:03:22Z:
Well spotted. Fixed.
View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-09-01T11:57:26Z:
Line #1. `with pm.Model() as model1:` where are the coords?

drbenvincent commented on 2024-09-02T14:20:15Z:
So you can actually provide dims in the model without specifying them in the coords. For example, in Model 2 I provide a coords dict but only supply the group labels, yet I still have a dimension in the model called `obs_id`. Because this is not defined, it's easier when it comes to out-of-sample prediction when shapes change.

I'm happy to explicitly provide all coords in the kwarg, but that will probably require updates in terms of new coords in the out-of-sample prediction. That might actually be better practice, but let me know if you'd prefer that before I put in the work :)
View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-09-01T11:57:27Z:
Line #7. `sigma = pm.Gamma("sigma", 2, 2)` in the previous versions there is one sigma per group; why did you change it?

drbenvincent commented on 2024-09-02T14:23:39Z:
In Model 3 I was getting divergences and I thought it might help if I went from unpooled sigmas to pooled. I thought that would actually be OK because the data-generating process has identical sigmas for each group.

I just thought I'd keep a single fully pooled parameter in all models to demonstrate you can mix and match.

But it could potentially be simpler if Model 2 has unpooled sigmas and Model 3 has hierarchical. Do you think it would be a big improvement to do that?
View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-09-01T11:57:27Z:
[comment content not captured]

drbenvincent commented on 2024-09-02T14:31:07Z:
In terms of divergences: the original notebook used the non-centred parameterisation. The problem that arose was that it makes the DAG and model maths deviate from the regular parameterisation, and made it more difficult to explain the connections between the Wilkinson notation, model maths, and PyMC model code.

In terms of sigma: see my reply to another of your comments above. Do you think a hierarchy on sigma would reduce divergences?
View / edit / reply to this conversation on ReviewNB

tomicapretto commented on 2024-09-01T14:17:08Z:
I think it would be more consistent to always use [...]

drbenvincent commented on 2024-09-02T14:28:05Z:
Thanks, this was an error that I didn't catch. Fixed in an upcoming commit.
View / edit / reply to this conversation on ReviewNB

tomicapretto commented on 2024-09-01T14:17:09Z:
The formula is correct and the model it builds is equivalent to the one you wrote with distributions above. However, there's not a one-to-one mapping between the distributions above and the formula. The models are equivalent but not exactly the same. With the formula, we're writing the following model

$$
\begin{aligned}
p_{0\sigma}, p_{1\sigma} &\sim \text{Gamma}(2, 2) \\
\vec{u_0} &\sim \text{Normal}(0, p_{0\sigma}^2) \\
\vec{u_1} &\sim \text{Normal}(0, p_{1\sigma}^2) \\
\beta_0 &\sim \text{Normal} \\
\beta_1 &\sim \text{Normal} \\
\sigma &\sim \text{Gamma}(2, 2) \\
\mu_i &= \beta_0 + \vec{u_0}[g_i] + \beta_1 \cdot x_i + \vec{u_1}[g_i] \cdot x_i \\
y_i &\sim \text{Normal}(\mu_i, \sigma)
\end{aligned}
$$

and then [...]

This comment in a Bambi discussion can also help.

drbenvincent commented on 2024-09-02T14:52:36Z:
Ah, so I think I understand 100%, let me check. The formula block I wrote is equivalent to your formula block. Yours encodes a population mean and group deflections, but mine just encodes the absolute group values (not as deflections from the population mean).

However, when it comes to comparing to the Wilkinson notation, the population mean + deflections is a truer interpretation of what that model notation really means? I buy that.

When I was writing this update I was actually thinking about adding an admonition block saying that you can get an equivalent model with the population + deflection approach. But now I am swayed to go with your population + deflection approach AND update the PyMC model accordingly. But I might still add an admonition block describing this initial model, and link to the Bambi discussion.

tomicapretto commented on 2024-09-03T09:39:02Z:
Yes, that's correct! Wilkinson's model formula was adapted in a non-Bayesian setting where you have fixed and random parameters. The population mean is the fixed part; the group deflections are the random part. In a Bayesian context they're not different (both are random, and that's fine).
@tomicapretto So this is all very interesting and I feel like I'm learning a lot. I've written the Model3 [image not captured].

Question 1: If you want to describe the slope of an as yet unobserved group, you would take samples from [...]?

Question 2: What would determine if you should specify the distributions of [...]?
It's actually a bit more complicated than that. I think I should write a comprehensive guide for this in the future (which would help me to understand it better too). I always point people to these two links, which I find useful. https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide/
I'm not sure if you're asking what to do to predict for a new group with ZSN, or when one should use ZSN instead of a simple Normal. As usual, it depends. Hopefully the simpler example below will help.

This is a model where we attempt to have a global mean [model maths not captured]. There are several things one can do to resolve that issue:

With any of the choices above we get equivalent models, but using different parametrizations. All of them are "no pooling" models.

Another way to "resolve" that issue is to use a hierarchical approach. One could write the model this way [...] or this other way [...], where [...].

With this hierarchical approach we are not removing linear dependencies like above. What happens here is a bit more sophisticated. The [...]. However, as the sample size for each group [...].

What do I want to say with all of this? That, theoretically, the hierarchy will make your parameters identifiable because they're no longer independent, and thus Normal should work. However, as the sample sizes for the groups grow, it will be harder and harder to get draws from the posterior because of the increasing correlation. See, for example, the debate about zero-sum constraints here: https://discourse.mc-stan.org/t/new-stan-data-type-zero-sum-vector/26215/3. It's a very interesting and challenging topic.

Finally, there's also this book https://www.amazon.com/Richly-Parameterized-Chapman-Statistical-Science/dp/0367533731 which covers some of these topics (identifiability) using the "degrees of freedom" concept. Long story short, when you have partially pooled parameters the degrees of freedom are a non-integer value. For the model above, it should be between those of the identifiable no-pooling model and the non-identifiable partial-pooling model. By the way, I'm fairly familiar with this but I'm certainly not a true expert, so I recommend reading the book. |
Superb @tomicapretto! Because of time constraints, and considering the focus of the notebook, I think I will not tackle the population-mean (or similar) issue in this notebook. I will take it as a win that this notebook started with the PyMC model code, but now makes a reasonable attempt to discuss the mapping onto mathematical and Wilkinson descriptions. Nevertheless, I'll continue to learn more about this and either return to give more depth to this notebook or come up with an alternative. But in the meantime it sounds like you may well be coming up with your own definitive Capretto resource, blog post or docs :)
Haha thanks :D
View / edit / reply to this conversation on ReviewNB

tomicapretto commented on 2024-09-09T19:05:47Z:
For extra clarity, in the second paragraph, you could also say that it allows you to see the _conditional_ posterior predictive distribution.

drbenvincent commented on 2024-09-13T09:03:05Z:
Done :)
Thanks @tomicapretto. I've made those changes. |
This PR updates the Simpson's Paradox notebook in a number of ways:

- `MutableData` -> `Data`
- `rng` passed to sampling functions for increased reproducibility

Note: This PR is not related to an issue, I just decided to make these updates in the absence of an issue.
📚 Documentation preview 📚: https://pymc-examples--697.org.readthedocs.build/en/697/