
Commit b84c42e

Notebook: GLM-hierarchical updates (styling, fixing links, etc) (#427)
* add styling, fix links, etc
* adding myst file
* TW changes added
* adding myst file
1 parent 82c0e4e commit b84c42e

File tree

2 files changed

+288
-222
lines changed


examples/generalized_linear_models/GLM-hierarchical.ipynb

+237-192
Large diffs are not rendered by default.

myst_nbs/generalized_linear_models/GLM-hierarchical.myst.md

+51-30
@@ -6,27 +6,32 @@ jupytext:
     format_version: 0.13
     jupytext_version: 1.13.7
 kernelspec:
-  display_name: Python 3
+  display_name: Python 3 (ipykernel)
   language: python
   name: python3
 ---
 
+(notebook_name)=
 # GLM: Hierarchical Linear Regression
 
+:::{post} May 20, 2022
+:tags: generalized linear model, hierarchical model
+:category: intermediate
+:author: Danne Elbers, Thomas Wiecki, Chris Fonnesbeck
+:::
+
 +++
 
 (c) 2016 by Danne Elbers, Thomas Wiecki
 
-> This tutorial is adapted from a [blog post by Danne Elbers and Thomas Wiecki called "The Best Of Both Worlds: Hierarchical Linear Regression in PyMC"](http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/).
-
-Today's blog post is co-written by [Danne Elbers](http://www.linkedin.com/pub/danne-elbers/69/3a2/7ba) who is doing her masters thesis with me on computational psychiatry using Bayesian modeling. This post also borrows heavily from a [Notebook](http://nbviewer.ipython.org/github/fonnesbeck/multilevel_modeling/blob/master/multilevel_modeling.ipynb?create=1) by [Chris Fonnesbeck](http://biostat.mc.vanderbilt.edu/wiki/Main/ChrisFonnesbeck).
+This tutorial is adapted from a blog post by [Danne Elbers](http://www.linkedin.com/pub/danne-elbers/69/3a2/7ba) and Thomas Wiecki called ["The Best Of Both Worlds: Hierarchical Linear Regression in PyMC"](http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/). It also borrows heavily from the notebook {ref}`multilevel_modeling` by [Chris Fonnesbeck](http://biostat.mc.vanderbilt.edu/wiki/Main/ChrisFonnesbeck).
 
-The power of Bayesian modelling really clicked for me when I was first introduced to hierarchical modelling. In this blog post we will:
+In this tutorial we:
 
-* provide and intuitive explanation of hierarchical/multi-level Bayesian modeling;
+* provide an intuitive explanation of hierarchical/multi-level Bayesian modeling;
 * show how this type of model can easily be built and estimated in [PyMC](https://github.com/pymc-devs/pymc);
-* demonstrate the advantage of using hierarchical Bayesian modelling, as opposed to non-hierarchical Bayesian modelling by comparing the two
-* visualize the "shrinkage effect" (explained below)
+* demonstrate the advantage of hierarchical Bayesian modelling over non-hierarchical Bayesian modelling by comparing the two;
+* visualize the "shrinkage effect" (explained below);
 * highlight connections to the frequentist version of this model.
 
 Having multiple sets of related measurements comes up all the time. In mathematical psychology, for example, you test multiple subjects on the same task. We then want to estimate a computational/mathematical model that describes the behavior on the task by a set of parameters. We could thus fit a model to each subject individually, assuming they share no similarities; or pool all the data and estimate one model, assuming all subjects are identical. Hierarchical modeling allows the best of both worlds by modeling subjects' similarities while also allowing estimation of individual parameters. As an aside, software from our lab, [HDDM](http://ski.cog.brown.edu/hddm_docs/), allows hierarchical Bayesian estimation of a widely used decision-making model in psychology. In this tutorial, however, we will use a more classical example of [hierarchical linear regression](http://en.wikipedia.org/wiki/Hierarchical_linear_modeling) to predict radon levels in houses.
@@ -38,10 +43,11 @@ This is the 3rd blog post on the topic of Bayesian modeling in PyMC, see here fo
 
 ## The Dataset
 
-Gelman et al.'s (2007) radon dataset is a classic for hierarchical modeling. In this dataset the amount of the radioactive gas radon has been measured among different households in all counties of several states. Radon gas is known to be the highest cause of lung cancer in non-smokers. It is believed to be more strongly present in households containing a basement and to differ in amount present among types of soil.
-Here we'll investigate this differences and try to make predictions of radonlevels in different counties based on the county itself and the presence of a basement. In this example we'll look at Minnesota, a state that contains 85 counties in which different measurements are taken, ranging from 2 to 116 measurements per county.
+Gelman et al.'s (2007) radon dataset is a classic for hierarchical modeling. In this dataset the amount of the radioactive gas radon has been measured among different households in all counties of several states. Radon gas is known to be the leading cause of lung cancer in non-smokers. It is believed to be more strongly present in households containing a basement and to differ in amount among types of soil. Here we'll investigate these differences and try to predict radon levels in different counties based on the county itself and the presence of a basement. In this example we'll look at Minnesota, a state that contains 85 counties with varying numbers of measurements, ranging from 2 to 116 per county.
 
-![radon](https://upload.wikimedia.org/wikipedia/commons/b/b9/CNX_Chem_21_06_RadonExpos.png)
+<p>
+<div> <img src="https://upload.wikimedia.org/wikipedia/commons/b/b9/CNX_Chem_21_06_RadonExpos.png" alt="radon exposure" style="width: 500px;"/></div>
+</p>
 
 +++
 
@@ -88,7 +94,7 @@ Now you might say: "That's easy! I'll just pool all my data and estimate one big
 
 $$radon_{i, c} = \alpha + \beta*\text{floor}_{i, c} + \epsilon$$
 
-Where $i$ represents the measurement, $c$ the county and floor contains a 0 or 1 if the house has a basement or not, respectively. If you need a refresher on Linear Regressions in `PyMC`, check out my [previous blog post](http://twiecki.github.io/blog/2013/08/12/bayesian-glms-1/). Critically, we are only estimating *one* intercept and *one* slope for all measurements over all counties pooled together as illustrated in the graphic below ($\theta$ represents $(\alpha, \beta)$ in our case and $y_i$ are the measurements of the $i$th county).
+Where $i$ represents the measurement, $c$ the county, and floor contains a 0 or 1 if the house has a basement or not, respectively. If you need a refresher on linear regression in `PyMC`, check out the previous post, [The Inference Button: Bayesian GLMs made easy with PyMC](http://twiecki.github.io/blog/2013/08/12/bayesian-glms-1/). Critically, we are only estimating *one* intercept and *one* slope for all measurements over all counties pooled together, as illustrated in the graphic below ($\theta$ represents $(\alpha, \beta)$ in our case and $y_i$ are the measurements of the $i$th county).
 
 ![pooled](http://f.cl.ly/items/0R1W063h1h0W2M2C0S3M/Screen%20Shot%202013-10-10%20at%208.22.21%20AM.png)
 
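For intuition, the pooled model's point estimate can be sketched outside of PyMC with ordinary least squares. This is a minimal illustration on synthetic stand-in data (the `floor` indicator and radon values below are invented for illustration, not the actual radon dataset):

```python
import numpy as np

# Hypothetical stand-in for the radon data: log-radon tends to be lower
# on the first floor (floor=1) than in the basement (floor=0).
rng = np.random.default_rng(42)
floor = rng.integers(0, 2, size=200)  # 0 = basement, 1 = first floor
radon = 1.5 - 0.6 * floor + rng.normal(0, 0.4, size=200)

# One intercept (alpha) and one slope (beta) for ALL counties pooled
# together: solve radon ~ alpha + beta * floor by least squares.
X = np.column_stack([np.ones(len(floor)), floor])
(alpha, beta), *_ = np.linalg.lstsq(X, radon, rcond=None)

print(alpha, beta)  # close to the true 1.5 and -0.6
```

With enough data this recovers the single shared intercept and slope, but it completely ignores any county-to-county variation.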
@@ -138,8 +144,8 @@ coords = {
 with pm.Model(coords=coords) as unpooled_model:
 
     # Independent parameters for each county
-    county_idx = pm.Data("county_idx", county_idxs, dims="obs_id")
-    floor = pm.Data("floor", data.floor.values, dims="obs_id")
+    county_idx = pm.Data("county_idx", county_idxs, dims="obs_id", mutable=True)
+    floor = pm.Data("floor", data.floor.values, dims="obs_id", mutable=True)
 
     a = pm.Normal("a", 0, sigma=100, dims="county")
     b = pm.Normal("b", 0, sigma=100, dims="county")
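The unpooled idea — an independent intercept and slope per county, with no sharing of information — can be sketched in plain numpy as separate per-group least-squares fits. All data here are synthetic stand-ins, not the notebook's `data` frame:

```python
import numpy as np

rng = np.random.default_rng(0)
n_counties = 5
county = rng.integers(0, n_counties, size=300)
floor = rng.integers(0, 2, size=300)
# Each county gets its own true intercept; a common true slope of -0.6.
true_a = rng.normal(1.5, 0.3, size=n_counties)
radon = true_a[county] - 0.6 * floor + rng.normal(0, 0.4, size=300)

# Fit (a_c, b_c) separately for each county -- no pooling at all.
estimates = {}
for c in range(n_counties):
    mask = county == c
    X = np.column_stack([np.ones(mask.sum()), floor[mask]])
    coef, *_ = np.linalg.lstsq(X, radon[mask], rcond=None)
    estimates[c] = coef  # (intercept a_c, slope b_c)

print(estimates)
```

Counties with few measurements get noisy, sometimes wildly off estimates — exactly the failure mode the hierarchical model below addresses.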
@@ -163,19 +169,19 @@ with unpooled_model:
 ```
 
 ### Hierarchical Model
-Instead of creating models separatley, the hierarchical model creates group parameters that consider the countys not as completely different but as having an underlying similarity. These distributions are subsequently used to influence the distribution of each county's $\alpha$ and $\beta$.
+Instead of creating models separately, the hierarchical model creates group parameters that consider the counties not as completely different but as having an underlying similarity. These distributions are subsequently used to influence the distribution of each county's $\alpha$ and $\beta$.
 
 ```{code-cell} ipython3
 with pm.Model(coords=coords) as hierarchical_model:
-    county_idx = pm.Data("county_idx", county_idxs, dims="obs_id")
+    county_idx = pm.Data("county_idx", county_idxs, dims="obs_id", mutable=True)
     # Hyperpriors for group nodes
     mu_a = pm.Normal("mu_a", mu=0.0, sigma=100)
     sigma_a = pm.HalfNormal("sigma_a", 5.0)
     mu_b = pm.Normal("mu_b", mu=0.0, sigma=100)
     sigma_b = pm.HalfNormal("sigma_b", 5.0)
 
     # Intercept for each county, distributed around group mean mu_a
-    # Above we just set mu and sd to a fixed value while here we
+    # Above we set mu and sd to a fixed value while here we
     # plug in a common group distribution for all a and b (which are
     # vectors of length n_counties).
     a = pm.Normal("a", mu=mu_a, sigma=sigma_a, dims="county")
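The generative story encoded by `hierarchical_model` can also be simulated forward with plain numpy: county-level intercepts and slopes are drawn from shared group distributions, and measurements are drawn from the county each observation belongs to. The hyperparameter values below are illustrative guesses, not the notebook's posterior:

```python
import numpy as np

rng = np.random.default_rng(1)
n_counties, n_obs = 85, 300

# Group-level ("hyper") parameters, shared across all counties.
mu_a, sigma_a = 1.5, 0.3   # group distribution of intercepts
mu_b, sigma_b = -0.6, 0.1  # group distribution of slopes

# County-level parameters drawn from the group distributions --
# this is the "underlying similarity" the hierarchical model encodes.
a = rng.normal(mu_a, sigma_a, size=n_counties)
b = rng.normal(mu_b, sigma_b, size=n_counties)

# Observations: each measurement indexes into its county's (a, b),
# mirroring a[county_idx] + b[county_idx] * floor in the PyMC model.
county_idx = rng.integers(0, n_counties, size=n_obs)
floor = rng.integers(0, 2, size=n_obs)
radon_est = a[county_idx] + b[county_idx] * floor
radon = rng.normal(radon_est, 0.4)  # measurement noise (epsilon)

print(radon.shape)
```

Inference simply runs this story in reverse: from the observed `radon` values, PyMC estimates both the per-county parameters and the group distribution they come from.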
@@ -313,12 +319,12 @@ In the above plot we have the data points in black of three selected counties. T
 
 When looking at the county 'CASS' we see that the non-hierarchical estimation is strongly biased: as this county's data contains only households with a basement, the estimated regression produces the nonsensical result of a giant negative slope, meaning that we would expect negative radon levels in a house without a basement!
 
-Moreover, in the example county's 'CROW WING' and 'FREEBORN' the non-hierarchical model appears to react more strongly than the hierarchical model to the existence of outliers in the dataset ('CROW WING': no basement upper right. 'FREEBORN': basement upper left). Assuming that there should be a higher amount of radon gas measurable in households with basements opposed to those without, the county 'CROW WING''s non-hierachical model seems off. Having the group-distribution constrain the coefficients we get meaningful estimates in all cases as we apply what we learn from the group to the individuals and vice-versa.
+Moreover, in the example counties 'CROW WING' and 'FREEBORN' the non-hierarchical model appears to react more strongly than the hierarchical model to the existence of outliers in the dataset ('CROW WING': no basement, upper right; 'FREEBORN': basement, upper left). Assuming that more radon gas should be measurable in households with basements than in those without, the 'CROW WING' non-hierarchical model seems off. By having the group distribution constrain the coefficients, we get meaningful estimates in all cases, as we apply what we learn from the group to the individuals and vice versa.
 
 +++
 
 ## Shrinkage
-Shrinkage describes the process by which our estimates are "pulled" towards the group-mean as a result of the common group distribution -- county-coefficients very far away from the group mean have very low probability under the normality assumption, moving them closer to the group mean gives them higher probability. In the non-hierachical model every county is allowed to differ completely from the others by just using each county's data, resulting in a model more prone to outliers (as shown above).
+Shrinkage describes the process by which our estimates are "pulled" towards the group mean as a result of the common group distribution: county coefficients very far away from the group mean have very low probability under the normality assumption, so moving them closer to the group mean gives them higher probability. In the non-hierarchical model every county is allowed to differ completely from the others by using only its own data, resulting in a model more prone to outliers (as shown above).
 
 ```{code-cell} ipython3
 hier_a = hier_post["a"].mean("chain_draw")
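The pulling mechanism has a simple closed form in the normal-normal case with known variances: the shrunk county mean is a precision-weighted average of the county's own sample mean and the group mean, so counties with little data get pulled hardest. This is a minimal sketch of that formula; `sigma_y`, `mu_group`, and `sigma_group` are made-up values, not estimates from the notebook:

```python
import numpy as np

sigma_y = 0.8                      # assumed within-county measurement sd
mu_group, sigma_group = 1.3, 0.3   # assumed group-level mean and sd


def shrunk_mean(county_obs):
    """Partial-pooling estimate: precision-weighted average of the
    county's sample mean and the group mean."""
    n = len(county_obs)
    prec_data = n / sigma_y**2
    prec_group = 1 / sigma_group**2
    return (prec_data * np.mean(county_obs) + prec_group * mu_group) / (
        prec_data + prec_group
    )


small_county = np.array([3.0, 2.8])  # two extreme measurements
large_county = np.full(100, 3.0)     # 100 consistent measurements

# The small county is pulled strongly toward the group mean of 1.3;
# the large county's estimate barely moves from its sample mean of 3.0.
print(shrunk_mean(small_county))
print(shrunk_mean(large_county))
```

This is why the arrows in the shrinkage plot below are longest for sparsely measured, far-out counties.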
@@ -355,38 +361,53 @@ for i in range(len(hier_b)):
 ax.legend();
 ```
 
-In the shrinkage plot above we show the coefficients of each county's non-hierarchical posterior mean (blue) and the hierarchical posterior mean (red). To show the effect of shrinkage on a single coefficient-pair (alpha and beta) we connect the blue and red points belonging to the same county by an arrow. Some non-hierarchical posteriors are so far out that we couldn't display them in this plot (it makes the axes too wide). Interestingly, all hierarchical posteriors of the floor-measure seem to be around -0.6 indicating that having a basement in almost all county's is a clear indicator for heightened radon levels. The intercept (which we take for type of soil) appears to differ among countys. This information would have been difficult to find if we had only used the non-hierarchical model.
+In the shrinkage plot above we show the coefficients of each county's non-hierarchical posterior mean (blue) and the hierarchical posterior mean (red). To show the effect of shrinkage on a single coefficient pair (alpha and beta) we connect the blue and red points belonging to the same county by an arrow. Some non-hierarchical posteriors are so far out that we couldn't display them in this plot (it makes the axes too wide). Interestingly, all hierarchical posteriors of the floor measure seem to be around -0.6, indicating that in almost all counties having a basement is a clear indicator of higher radon levels. The intercept (which we take as a proxy for type of soil) appears to differ among counties. This information would have been difficult to find if we had only used the non-hierarchical model.
 
-Critically, many effects that look quite large and significant in the non-hiearchical model actually turn out to be much smaller when we take the group distribution into account (this point can also well be seen in plot `In[12]` in [Chris' NB](http://nbviewer.ipython.org/github/fonnesbeck/multilevel_modeling/blob/master/multilevel_modeling.ipynb)). Shrinkage can thus be viewed as a form of smart regularization that helps reduce false-positives!
+Critically, many effects that look quite large and significant in the non-hierarchical model actually turn out to be much smaller when we take the group distribution into account (this point can also be seen clearly in plot `In[12]` in Chris Fonnesbeck's notebook {ref}`multilevel_modeling`). Shrinkage can thus be viewed as a form of smart regularization that helps reduce false positives!
 
 +++
 
 ### Connections to Frequentist statistics
 
-This type of hierarchical, partial pooling model is known as a [random effects model](https://en.wikipedia.org/wiki/Random_effects_model) in frequentist terms. Interestingly, if we placed uniform priors on the group mean and variance in the above model, the resulting Bayesian model would be equivalent to a random effects model. One might imagine that the difference between a model with uniform or wide normal hyperpriors should not have a huge impact. However, [Gelman says](http://andrewgelman.com/2014/03/15/problematic-interpretations-confidence-intervals/) encourages use of weakly-informative priors (like we did above) over flat priors.
+This type of hierarchical, partial pooling model is known as a [random effects model](https://en.wikipedia.org/wiki/Random_effects_model) in frequentist terms. Interestingly, if we placed uniform priors on the group mean and variance in the above model, the resulting Bayesian model would be equivalent to a random effects model. One might imagine that the difference between a model with uniform or wide normal hyperpriors should not have a huge impact. However, [Gelman](http://andrewgelman.com/2014/03/15/problematic-interpretations-confidence-intervals/) encourages use of weakly-informative priors (as we did above) over flat priors.
 
 +++
 
 ## Summary
 
-In this post, co-authored by Danne Elbers, we showed how a multi-level hierarchical Bayesian model gives the best of both worlds when we have multiple sets of measurements we expect to have similarity. The naive approach either pools all data together and ignores the individual differences, or treats each set as completely separate leading to noisy estimates, as shown above. By assuming that each individual data set (each county in our case) is distributed according to a group distribution -- which we simultaneously estimate -- we benefit from increased statistical power and smart regularization via the shrinkage effect. Probabilistic Programming in [PyMC](https://github.com/pymc-devs/pymc) then makes Bayesian estimation of this model trivial.
+In this tutorial we showed how a multi-level hierarchical Bayesian model gives the best of both worlds when we have multiple sets of measurements we expect to be similar. The naive approach either pools all data together and ignores individual differences, or treats each set as completely separate, leading to noisy estimates, as shown above. By assuming that each individual data set (each county in our case) is distributed according to a group distribution -- which we simultaneously estimate -- we benefit from increased statistical power and smart regularization via the shrinkage effect. Probabilistic programming in [PyMC](https://github.com/pymc-devs/pymc) then makes Bayesian estimation of this model trivial.
 
-As a follow-up we could also include other states into our model. For this we could add yet another layer to the hierarchy where each state is pooled at the country level. Finally, readers of my blog will notice that we didn't use `glm()` here as it does not play nice with hierarchical models yet.
+As a follow-up we could also include other states in our model. For this we could add yet another layer to the hierarchy where each state is pooled at the country level. Finally, readers of Thomas Wiecki's blog will notice that `glm()` was not used here, as it does not yet play nicely with hierarchical models.
+
++++
+
+## Authors
+
+* Adapted from the 2016 blog post ["The Best Of Both Worlds: Hierarchical Linear Regression in PyMC"](http://twiecki.github.io/blog/2014/03/17/bayesian-glms-3/) by Danne Elbers and Thomas Wiecki
+* Moved from the pymc to the pymc-examples repo in December 2020 ([pymc-examples#8](https://github.com/pymc-devs/pymc-examples/pull/8))
+* Updated by [@CloudChaoszero](CloudChaoszero) in April 2021 ([pymc-examples#147](https://github.com/pymc-devs/pymc-examples/pull/147))
+* Updated Markdown and styling by @reshamas in September 2022 ([pymc-examples#427](https://github.com/pymc-devs/pymc-examples/pull/427))
+
++++
 
 ## References
-* [The underlying Notebook of this blog post](https://rawgithub.com/twiecki/WhileMyMCMCGentlySamples/master/content/downloads/notebooks/GLM_hierarchical.ipynb)
+* Thomas Wiecki: [The underlying notebook of this blog post](https://rawgithub.com/twiecki/WhileMyMCMCGentlySamples/master/content/downloads/notebooks/GLM_hierarchical.ipynb)
 * Blog post: [The Inference Button: Bayesian GLMs made easy with PyMC](http://twiecki.github.io/blog/2013/08/12/bayesian-glms-1/)
 * Blog post: [This world is far from Normal(ly distributed): Bayesian Robust Regression in PyMC](http://twiecki.github.io/blog/2013/08/27/bayesian-glms-2/)
 * [Chris Fonnesbeck's repo containing a more extensive analysis](https://github.com/fonnesbeck/multilevel_modeling/)
 * Blog post: [Shrinkage in multi-level hierarchical models](http://doingbayesiandataanalysis.blogspot.com/2012/11/shrinkage-in-multi-level-hierarchical.html) by John Kruschke
-* Gelman, A.; Carlin; Stern; and Rubin, D., 2007, "Replication data for: Bayesian Data Analysis, Second Edition",
+* Gelman, A.; Carlin, J.; Stern, H.; and Rubin, D. (2007). "Replication data for: Bayesian Data Analysis, Second Edition"
 * Gelman, A., & Hill, J. (2006). [Data Analysis Using Regression and Multilevel/Hierarchical Models (1st ed.). Cambridge University Press.](http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X)
 * Gelman, A. (2006). Multilevel (Hierarchical) modeling: what it can and cannot do. Technometrics, 48(3), 432–435.
-
-### Acknowledgements
-Thanks to [Imri Sofer](http://serre-lab.clps.brown.edu/person/imri-sofer/) for feedback and teaching us about the connections to random-effects models and [Dan Dillon](http://cdasr.mclean.harvard.edu/index.php/about-us/current-lab-members/14-faculty/62-daniel-dillon) for useful comments on an earlier draft.
+
++++
+
+## Watermark
 
 ```{code-cell} ipython3
 %load_ext watermark
-%watermark -n -u -v -iv -w
+%watermark -n -u -v -iv -w -p aesara,aeppl,xarray
 ```
+
+:::{include} ../page_footer.md
+:::
