
Updates to Simpson's Paradox notebook #697


Merged · 32 commits · Sep 14, 2024

Conversation

@drbenvincent (Contributor) commented Aug 31, 2024

This PR updates the Simpson's Paradox notebook in a number of ways:

  • MutableData -> Data (see the sketch after this list)
  • Updated watermark based on the current style guide
  • Added mathematical descriptions of the models
  • Improved the hierarchical model. Switched from the non-centred to the regular (centred) parameterisation, which gives a simpler graphviz output that aligns more closely with the model maths
  • Strengthened the causal language and included causal DAGs
  • Added rng to sampling functions for increased reproducibility
  • Significantly simplified plotting code
  • Added Wilkinson notation (and explanation) for each model. Hopefully this is correct, but I am not 100% sure.
  • Added a missing reference to the LKJ notebook, since I reference it from this one
  • Added some more data visualisation
  • All models now have a fully pooled observation noise parameter for simplicity.
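
As a minimal, hypothetical sketch of the first change (not the notebook's actual code), the update is essentially a one-line swap, since `pm.Data` now covers what the deprecated `pm.MutableData` used to do:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
x_obs = rng.normal(size=100)

with pm.Model() as model:
    # previously: x = pm.MutableData("x", x_obs)
    x = pm.Data("x", x_obs)  # pm.Data is the current API; pm.MutableData is deprecated
    beta = pm.Normal("beta", 0, 1)
    mu = beta * x
```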

Note: This PR is not related to an issue, I just decided to make these updates in the absence of an issue.


📚 Documentation preview 📚: https://pymc-examples--697.org.readthedocs.build/en/697/



review-notebook-app bot commented Sep 1, 2024


juanitorduz commented on 2024-09-01T11:57:25Z
----------------------------------------------------------------

add "directed acycylic graph (DSG)" for people who do not know what a DAG is.


drbenvincent commented on 2024-09-02T14:14:16Z
----------------------------------------------------------------

Good idea. Done in an upcoming commit.

tomicapretto commented on 2024-09-09T18:51:37Z
----------------------------------------------------------------

We then show 2 "wayes" to resolve... typo?

drbenvincent commented on 2024-09-13T09:03:22Z
----------------------------------------------------------------

Well spotted. Fixed


review-notebook-app bot commented Sep 1, 2024


juanitorduz commented on 2024-09-01T11:57:26Z
----------------------------------------------------------------

Line #1.    with pm.Model() as model1:

where are the coords?


drbenvincent commented on 2024-09-02T14:20:15Z
----------------------------------------------------------------

So you can actually provide dims in the model without declaring them in coords. For example, in Model 2 I provide a coords dict with only the group labels, yet I still have a dimension in the model called obs_id. Because obs_id is not pinned to fixed coordinate values, out-of-sample prediction is easier when shapes change.

I'm happy to explicitly provide all coords in the kwarg, but that will probably require supplying new coords for the out-of-sample prediction. That might actually be better practice, but let me know if you'd prefer that before I put in the work :)
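
A minimal sketch of this pattern (hypothetical variable names, not the notebook's exact code): "obs_id" is never declared in coords, so its length is inferred from the data and can change at prediction time:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x_train = rng.normal(size=100)
g_train = rng.integers(0, 2, size=100)

with pm.Model(coords={"group": ["A", "B"]}) as model2:
    x = pm.Data("x", x_train, dims="obs_id")  # "obs_id" is not in coords
    g = pm.Data("g", g_train, dims="obs_id")
    beta = pm.Normal("beta", 0, 1, dims="group")  # "group" comes from coords
    mu = beta[g] * x
    # at prediction time, pm.set_data can swap in arrays of a different length
```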


review-notebook-app bot commented Sep 1, 2024


juanitorduz commented on 2024-09-01T11:57:27Z
----------------------------------------------------------------

Line #7.        sigma = pm.Gamma("sigma", 2, 2)

in the previous version there was one sigma per group, why did you change it?


drbenvincent commented on 2024-09-02T14:23:39Z
----------------------------------------------------------------

In Model 3 I was getting divergences and thought it might help if I went from unpooled sigmas to a pooled one. I thought that would actually be OK because the data generating process has identical sigmas for each group.

I just thought I'd keep a single fully pooled parameter in all models to demonstrate you can mix and match.

But it could potentially be simpler if Model 2 had unpooled sigmas and Model 3 a hierarchical sigma. Do you think it would be a big improvement to do that?
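
For reference, a minimal sketch (assumed group labels and priors) of the two options being discussed:

```python
import pymc as pm

group_labels = ["A", "B", "C"]  # assumed

# Fully pooled: one observation-noise parameter shared by all groups
with pm.Model(coords={"group": group_labels}) as pooled:
    sigma = pm.Gamma("sigma", 2, 2)

# Unpooled: one observation-noise parameter per group;
# per-observation noise would then be sigma[group_idx]
with pm.Model(coords={"group": group_labels}) as unpooled:
    sigma = pm.Gamma("sigma", 2, 2, dims="group")
```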


review-notebook-app bot commented Sep 1, 2024


juanitorduz commented on 2024-09-01T11:57:27Z
----------------------------------------------------------------

  • we have some divergences
  • do you still want one global sigma? one per group or even build a hierarchy on sigma?

drbenvincent commented on 2024-09-02T14:31:07Z
----------------------------------------------------------------

In terms of divergences: the original notebook used the non-centred parameterisation. The problem was that this makes the DAG and model maths deviate from the regular parameterisation, which made it more difficult to explain the connections between the Wilkinson notation, the model maths, and the PyMC model code.

In terms of sigma: see my reply to another of your comments above. Do you think a hierarchy on sigma would reduce divergences?
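
A minimal sketch of the two parameterisations being contrasted (assumed priors and names, shown for a per-group slope):

```python
import pymc as pm

group_labels = ["A", "B", "C"]  # assumed

# Centred (regular): matches the written maths and the DAG directly
with pm.Model(coords={"group": group_labels}) as centred:
    mu_slope = pm.Normal("mu_slope", 0, 1)
    sigma_slope = pm.Gamma("sigma_slope", 2, 2)
    slope = pm.Normal("slope", mu_slope, sigma_slope, dims="group")

# Non-centred: often samples better, but adds an offset node to the graph
with pm.Model(coords={"group": group_labels}) as non_centred:
    mu_slope = pm.Normal("mu_slope", 0, 1)
    sigma_slope = pm.Gamma("sigma_slope", 2, 2)
    offset = pm.Normal("offset", 0, 1, dims="group")
    slope = pm.Deterministic("slope", mu_slope + sigma_slope * offset, dims="group")
```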


review-notebook-app bot commented Sep 1, 2024


tomicapretto commented on 2024-09-01T14:17:08Z
----------------------------------------------------------------

I think it would be more consistent to always use \sim when we want to say that something follows a given distribution. I'm not sure why you're using = in the first two rows.


drbenvincent commented on 2024-09-02T14:28:05Z
----------------------------------------------------------------

Thanks, this was an error that I didn't catch. Fixed in an upcoming commit


review-notebook-app bot commented Sep 1, 2024


tomicapretto commented on 2024-09-01T14:17:09Z
----------------------------------------------------------------

The formula is correct and the model it builds is equivalent to the one you wrote with distributions above. However, there's not a one-to-one mapping between the distributions above and the formula. The models are equivalent but not exactly the same. With the formula, we're writing the following model

$$
\begin{aligned}
p_{0\sigma}, p_{1\sigma} &\sim \text{Gamma}(2, 2) \\
\vec{u_0} &\sim \text{Normal}(0, p_{0\sigma}^2) \\
\vec{u_1} &\sim \text{Normal}(0, p_{1\sigma}^2) \\
\beta_0 &\sim \text{Normal} \\
\beta_1 &\sim \text{Normal} \\
\sigma &\sim \text{Gamma}(2, 2) \\
\mu_i &= \beta_0 + \vec{u_0}[g_i] + \beta_1 \cdot x_i + \vec{u_1}[g_i] \cdot x_i \\
y_i &\sim \text{Normal}(\mu_i, \sigma)
\end{aligned}
$$

and then 1 maps to $\beta_0$, x maps to $\beta_1$, (1 | g) maps to $\vec{u_0}$ and (x | g) maps to $\vec{u_1}$

This comment in a Bambi discussion can also help

bambinos/bambi#808 (comment)


drbenvincent commented on 2024-09-02T14:52:36Z
----------------------------------------------------------------

Ah, so I think I understand 100%, let me check. The formula block I wrote is equivalent to your formula block. Yours encodes a population mean and group deflections, but mine just encodes the absolute group values (not as deflections from the population mean).

However, when it comes to comparing to the Wilkinson notation, the population mean + deflections is a truer interpretation of what that model notation really means? I buy that.

When I was writing this update I was actually thinking about adding an admonition block saying that you can get an equivalent model with the population + deflection approach. But now I am swayed to go with your population + deflection approach AND update the pymc model accordingly. I might still add an admonition block describing this initial model, and link to the Bambi discussion.

tomicapretto commented on 2024-09-03T09:39:02Z
----------------------------------------------------------------

Yes, that's correct! Wilkinson's model formula was adapted in a non-Bayesian setting where you have fixed and random parameters. The population mean is the fixed part, the group deflections are the random part. In a Bayesian context they're not different (both are random, and that's fine).
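
A minimal PyMC sketch of the population-mean-plus-deflections parameterisation discussed in this thread (hypothetical names and data; note PyMC's Normal takes a standard deviation where the maths above uses a variance):

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x_train = rng.normal(size=100)          # assumed data
g_train = rng.integers(0, 3, size=100)  # assumed group indices

with pm.Model(coords={"group": ["A", "B", "C"]}) as model3:
    x = pm.Data("x", x_train, dims="obs_id")
    g = pm.Data("g", g_train, dims="obs_id")
    beta0 = pm.Normal("beta0", 0, 1)                 # "1": population intercept
    beta1 = pm.Normal("beta1", 0, 1)                 # "x": population slope
    p0_sigma = pm.Gamma("p0_sigma", 2, 2)
    p1_sigma = pm.Gamma("p1_sigma", 2, 2)
    u0 = pm.Normal("u0", 0, p0_sigma, dims="group")  # "(1 | g)": intercept deflections
    u1 = pm.Normal("u1", 0, p1_sigma, dims="group")  # "(x | g)": slope deflections
    sigma = pm.Gamma("sigma", 2, 2)
    mu = beta0 + u0[g] + (beta1 + u1[g]) * x
    # y = pm.Normal("y", mu, sigma, observed=y_train, dims="obs_id")  # y_train assumed
```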


@drbenvincent (Contributor, Author) commented Sep 4, 2024

@tomicapretto So this is all very interesting and I feel like I'm learning a lot.

I've written Model 3's 1 + x + (1 + x | g) as:

[Screenshot: mathematical specification of Model 3]

Question 1: If you want to describe the slope of an as-yet-unobserved group, you would take samples from $\text{Normal}(\beta_1, p_{1\sigma})$? And you'd do that on a sample-by-sample basis, i.e. the $\beta_1$ and $p_{1\sigma}$ represent samples drawn from the posterior. It feels a bit clumsy, so I just wanted to check my thinking with you.

Question 2: What would determine if you should specify the distributions of $u$ as $\text{ZeroSumNormal}$, rather than $\text{Normal}$?

@tomicapretto

@drbenvincent

Question 1: If you want to describe the slope of an as-yet-unobserved group, you would take samples from Normal(β1, p_1)? And you'd do that on a sample-by-sample basis, i.e. the β1 and p_1 represent samples drawn from the posterior. It feels a bit clumsy, so I just wanted to check my thinking with you.

It's actually a bit more complicated than that. I think I should write a comprehensive guide for this in the future (which would help me to understand it better too). I always point people to these two links, which I find useful.

https://www.andrewheiss.com/blog/2021/11/10/ame-bayes-re-guide/
https://x.com/tjmahr/status/1458443667748818955
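
For reference, a sketch of the simple per-draw approach described in Question 1 (assumed variable names, and assuming `idata` is the InferenceData returned by pm.sample() on Model 3; as noted above, the full story is more subtle):

```python
import numpy as np

# posterior draws of the population-level slope mean and sd (assumed names)
beta1 = idata.posterior["beta1"].to_numpy().ravel()
p1_sigma = idata.posterior["p1_sigma"].to_numpy().ravel()

# one hypothetical new-group slope per posterior draw
rng = np.random.default_rng(42)
new_group_slopes = rng.normal(loc=beta1, scale=p1_sigma)
```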

Question 2: What would determine if you should specify the distributions of u as ZeroSumNormal, rather than Normal?

I'm not sure if you're asking how to predict for a new group with ZSN, or when one should use ZSN instead of a simple Normal. As usual, it depends.

Hopefully the simpler example below will help.
Imagine you have a continuous outcome in the reals and A groups.
We use a normal observational model like the following

$$
\begin{aligned}
Y_i &\sim \text{Normal}(\mu_i, \sigma^2) \\
\mu_i &= \alpha + \beta_{j[i]} \\
\alpha &\sim \text{Normal} \\
\beta_j &\sim \text{Normal} \quad \forall j = 1, \dots, A \\
\sigma &\sim \text{Something}
\end{aligned}
$$

This is a model where we attempt to have a global mean $\alpha$ and a group-specific parameter $\beta_j$.
The posterior means of $\alpha$ and $\beta_j$ are non-identifiable as they're linearly dependent.

There are several things one can do to resolve that issue:

  1. Constrain $\alpha$ to zero, which one can think of as removing $\alpha$ from the model. In this case, the $\beta_j$ parameters represent the group means.
  2. Constrain one $\beta_j$ to zero, which is like removing one $\beta_j$. $\alpha$ becomes the mean of the group for which we removed/constrained the $\beta$ parameter. All the other $\beta_j$ are deflections between the reference group mean and the $j$-th group mean.
  3. Constrain the sum of the $\beta_j$ params to zero. $\alpha$ then represents the global mean and each $\beta_j$ a deflection around that mean (see the sketch below).

With any of the choices above we get equivalent models, but using different parametrizations. All of them are "no pooling" models.
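
A minimal PyMC sketch of option 3 (assumed priors and names), where pm.ZeroSumNormal enforces the sum-to-zero constraint so that $\alpha$ is identified as the global mean:

```python
import pymc as pm

group_labels = ["A", "B", "C"]  # assumed

with pm.Model(coords={"group": group_labels}) as sum_to_zero:
    alpha = pm.Normal("alpha", 0, 1)                        # global mean
    beta = pm.ZeroSumNormal("beta", sigma=1, dims="group")  # deflections, sum to zero
    sigma = pm.Gamma("sigma", 2, 2)
    # mu for observation i would be alpha + beta[group_idx[i]]
```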

Another way to "resolve" that issue is to use a hierarchical approach. One could write the model this way

$$
\begin{aligned}
Y_i &\sim \text{Normal}(\mu_i, \sigma^2) \\
\mu_i &= \alpha + \beta_{j[i]} \\
\alpha &\sim \text{Normal} \\
\beta_j &\sim \text{Normal}(0, s^2) \quad \forall j = 1, \dots, A \\
s &\sim \text{Something} \\
\sigma &\sim \text{Something}
\end{aligned}
$$

or this other way

$$
\begin{aligned}
Y_i &\sim \text{Normal}(\mu_i, \sigma^2) \\
\mu_i &= \beta_{j[i]} \\
\beta_j &\sim \text{Normal}(m, s^2) \quad \forall j = 1, \dots, A \\
m &\sim \text{Normal} \\
s &\sim \text{Something} \\
\sigma &\sim \text{Something}
\end{aligned}
$$

where $m$ is the $\alpha$ from above.

With this hierarchical approach we are not removing linear dependencies like above. What happens here is a bit more sophisticated. The $\beta_j$ parameters are no longer independent because they're assumed to be draws from a normal population with a common mean ($m$) and standard deviation ($s$). But there's no "hard" constraint like when we set some parameter, or their sum, to zero. Think of it as a "soft" constraint.

However, as the sample size for each group $j$ increases, the posterior will converge to the posterior one would get from the first model above, without restrictions, which has non-identifiable means.

What do I want to say with all of this? That, theoretically, the hierarchy will make your parameters identifiable because they're no longer independent, and thus Normal should work. However, as the sample sizes for the groups grow, it will become harder and harder to get draws from the posterior because of the increasing correlation.
If you constrain the partially pooled $\beta_j$ parameters with a zero-sum constraint, that problem should disappear. However, that can create other issues (for example, how to get predictions for new groups). Adrian and Luciano solved that for some cases, but I'm not sure if they solved it for all cases.

See, for example, the debate here about zero-sum constraints https://discourse.mc-stan.org/t/new-stan-data-type-zero-sum-vector/26215/3.

It's a very interesting and challenging topic.

Finally, there's also this book https://www.amazon.com/Richly-Parameterized-Chapman-Statistical-Science/dp/0367533731 which covers some of these topics (identifiability) using the "degrees of freedom" concept. Long story short, when you have partially pooled parameters the degrees of freedom are a non-integer value. For the model above, it should be between those of the identifiable no-pooling model and the non-identifiable partial-pooling model.

Btw, I'm fairly familiar with this but I'm certainly not a true expert, so I recommend reading the book.

@drbenvincent

Superb @tomicapretto !

Because of time constraints, and considering the focus of the notebook, I think I will not tackle the population mean (or similar) issue in this notebook. I will take it as a win that this notebook started with just the pymc model code, but now makes a reasonable attempt to discuss the mapping onto the mathematical and Wilkinson descriptions.

Nevertheless I'll continue to learn more about this and either return to give more depth to this notebook or come up with an alternative. But in the meantime it sounds like you may well be coming up with your own definitive Capretto resource blog post or docs :)

@tomicapretto

Superb @tomicapretto !

Because of time constraints, and considering the focus of the notebook, I think I will not tackle the population mean (or similar) issue in this notebook. I will take it as a win that this notebook started with just the pymc model code, but now makes a reasonable attempt to discuss the mapping onto the mathematical and Wilkinson descriptions.

Nevertheless I'll continue to learn more about this and either return to give more depth to this notebook or come up with an alternative. But in the meantime it sounds like you may well be coming up with your own definitive Capretto resource blog post or docs :)

Haha thanks :D



review-notebook-app bot commented Sep 9, 2024


tomicapretto commented on 2024-09-09T19:05:47Z
----------------------------------------------------------------

For extra clarity, in the second paragraph, you could also say that it allows you to see the _conditional_ posterior predictive distribution.


drbenvincent commented on 2024-09-13T09:03:05Z
----------------------------------------------------------------

Done :)



@drbenvincent

Thanks @tomicapretto. I've made those changes.

@OriolAbril merged commit c6e35b9 into pymc-devs:main on Sep 14, 2024
2 checks passed
@drbenvincent deleted the simpsons branch on September 14, 2024 20:14