
Commit 192e33e

add example, more links, more references to papers
1 parent 4e4bab9 commit 192e33e

3 files changed, +59 -4 lines changed


examples/causal_inference/difference_in_differences.ipynb

+21 -2
@@ -52,17 +52,27 @@
  "\n",
  "This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebook provides a high-level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both [The Effect](https://theeffectbook.net/) {cite:p}`huntington2021effect` and [Causal Inference: The Mixtape](https://mixtape.scunning.com) {cite:p}`cunningham2021causal` have chapters devoted to difference in differences.\n",
  "\n",
- "Difference in differences would be a good approach to take for causal inference if:\n",
+ "[Difference in differences](https://en.wikipedia.org/wiki/Difference_in_differences) would be a good approach to take for causal inference if:\n",
  "* you want to know the causal impact of a treatment/intervention\n",
  "* you have pre and post treatment measures\n",
  "* you have both a treatment and a control group\n",
- "* the treatment was _not_ allocated by randomisation.\n",
+ "* the treatment was _not_ allocated by randomisation, that is, you are in a [quasi-experimental](https://en.wikipedia.org/wiki/Quasi-experiment) setting.\n",
  "\n",
  "Otherwise there are likely better suited approaches you could use.\n",
  "\n",
  "Note that our desire to estimate the causal impact of a treatment involves [counterfactual thinking](https://en.wikipedia.org/wiki/Counterfactual_thinking). This is because we are asking \"What would the post-treatment outcome of the treatment group be _if_ treatment had not been administered?\" but we can never observe this."
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "6ec005f3-c443-4243-a4f5-c86252367fe8",
+ "metadata": {},
+ "source": [
+ "### Example\n",
+ "\n",
+ "A classic example is the study by {cite:t}`card1993minimum`, which examined the effect of increasing the minimum wage on employment in the fast food sector. This is a quasi-experimental setting because the intervention (an increase in the minimum wage) was not applied to geographical units (e.g. states) at random. The intervention was applied to New Jersey in April 1992. Had the authors measured pre- and post-intervention employment rates in New Jersey only, they would have failed to control for omitted variables that change over time (e.g. seasonal effects), which could provide alternative causal explanations for changes in employment rates. Selecting a control state (Pennsylvania) lets one use the change in employment there as an estimate of the counterfactual: what _would have happened_ in New Jersey _if_ it had not received the intervention."
+ ]
+ },
  {
  "cell_type": "markdown",
  "id": "54f5c8aa-2a4d-4b77-ba64-a0e9df729103",
@@ -1144,6 +1154,15 @@
  "So there we have it, we have a full posterior distribution over our estimated causal impact using the difference in differences approach."
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "bf284262-ef3f-4cc1-af07-f20bb3c69ce3",
+ "metadata": {},
+ "source": [
+ "## Summary\n",
+ "Real applications of the difference in differences approach require considerably more due diligence than shown here. Readers are encouraged to consult the textbooks listed in the introduction, as well as a useful review paper {cite:p}`wing2018designing`, which covers the important contextual issues in more detail. Additionally, {cite:t}`bertrand2004much` take a skeptical look at the approach and propose solutions to some of the problems they highlight."
+ ]
+ },
  {
  "cell_type": "markdown",
  "id": "b3b2ee6b-2581-4ee5-a305-b9712dd49f09",

examples/references.bib

+25
@@ -35,13 +35,29 @@ @book{berry1996statistics
  year = {1996},
  publisher = {Duxbury Press}
  }
+ @article{bertrand2004much,
+ title = {How much should we trust differences-in-differences estimates?},
+ author = {Bertrand, Marianne and Duflo, Esther and Mullainathan, Sendhil},
+ journal = {The Quarterly Journal of Economics},
+ volume = {119},
+ number = {1},
+ pages = {249--275},
+ year = {2004},
+ publisher = {MIT Press}
+ }
  @book{breen1996regression,
  title = {Regression models: Censored, sample selected, or truncated data},
  author = {Breen, Richard and others},
  volume = {111},
  year = {1996},
  publisher = {Sage}
  }
+ @misc{card1993minimum,
+ title = {Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania},
+ author = {Card, David and Krueger, Alan B},
+ year = {1993},
+ publisher = {National Bureau of Economic Research Cambridge, Mass., USA}
+ }
  @misc{carpenter2016hierarchical,
  title = {Hierarchical partial pooling for repeated binary trials},
  author = {Carpenter, Bob and Gabry, J and Goodrich, B},
@@ -493,6 +509,15 @@ @book{wilkinson2005grammar
  issn = {1431-8784},
  isbn = {978-0-387-24544-7}
  }
+ @article{wing2018designing,
+ title = {Designing difference in difference studies: best practices for public health policy research},
+ author = {Wing, Coady and Simon, Kosali and Bello-Gomez, Ricardo A},
+ journal = {Annual Review of Public Health},
+ volume = {39},
+ number = {1},
+ pages = {453--469},
+ year = {2018}
+ }
  @article{Yao_2018,
  doi = {10.1214/17-ba1091},
  url = {https://doi.org/10.1214\%2F17-ba1091},

myst_nbs/causal_inference/difference_in_differences.myst.md

+13 -2
@@ -40,18 +40,24 @@ az.style.use("arviz-darkgrid")

  This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebook provides a high-level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both [The Effect](https://theeffectbook.net/) {cite:p}`huntington2021effect` and [Causal Inference: The Mixtape](https://mixtape.scunning.com) {cite:p}`cunningham2021causal` have chapters devoted to difference in differences.

- Difference in differences would be a good approach to take for causal inference if:
+ [Difference in differences](https://en.wikipedia.org/wiki/Difference_in_differences) would be a good approach to take for causal inference if:
  * you want to know the causal impact of a treatment/intervention
  * you have pre and post treatment measures
  * you have both a treatment and a control group
- * the treatment was _not_ allocated by randomisation.
+ * the treatment was _not_ allocated by randomisation, that is, you are in a [quasi-experimental](https://en.wikipedia.org/wiki/Quasi-experiment) setting.

  Otherwise there are likely better suited approaches you could use.

  Note that our desire to estimate the causal impact of a treatment involves [counterfactual thinking](https://en.wikipedia.org/wiki/Counterfactual_thinking). This is because we are asking "What would the post-treatment outcome of the treatment group be _if_ treatment had not been administered?" but we can never observe this.

  +++

+ ### Example
+
+ A classic example is the study by {cite:t}`card1993minimum`, which examined the effect of increasing the minimum wage on employment in the fast food sector. This is a quasi-experimental setting because the intervention (an increase in the minimum wage) was not applied to geographical units (e.g. states) at random. The intervention was applied to New Jersey in April 1992. Had the authors measured pre- and post-intervention employment rates in New Jersey only, they would have failed to control for omitted variables that change over time (e.g. seasonal effects), which could provide alternative causal explanations for changes in employment rates. Selecting a control state (Pennsylvania) lets one use the change in employment there as an estimate of the counterfactual: what _would have happened_ in New Jersey _if_ it had not received the intervention.
+
+ +++
+
  ### Causal DAG

  The causal DAG for difference in differences is given below. It says:
@@ -423,6 +429,11 @@ So there we have it, we have a full posterior distribution over our estimated ca

  +++

+ ## Summary
+ Real applications of the difference in differences approach require considerably more due diligence than shown here. Readers are encouraged to consult the textbooks listed in the introduction, as well as a useful review paper {cite:p}`wing2018designing`, which covers the important contextual issues in more detail. Additionally, {cite:t}`bertrand2004much` take a skeptical look at the approach and propose solutions to some of the problems they highlight.
+
+ +++
+
  ## References

  :::{bibliography}
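
The reasoning in the "Example" paragraph added by this commit can be made concrete with a small numerical sketch. The code below is not part of the commit or the notebook. The employment figures are invented for illustration and are not taken from the Card and Krueger study, and the names `mean_employment` and `change_over_time` are hypothetical. It contrasts a naive before/after comparison in the treated state with a difference in differences estimate that uses the control state's change as a stand-in for the counterfactual.

```python
# Hypothetical mean employment per restaurant, for illustration only.
# These numbers are made up and are NOT taken from Card and Krueger's data.
mean_employment = {
    ("control", "pre"): 23.0,   # Pennsylvania, before the intervention
    ("control", "post"): 21.0,  # Pennsylvania, after the intervention
    ("treated", "pre"): 20.0,   # New Jersey, before the intervention
    ("treated", "post"): 21.0,  # New Jersey, after the intervention
}


def change_over_time(means, group):
    """Post-intervention mean minus pre-intervention mean for one group."""
    return means[(group, "post")] - means[(group, "pre")]


# Naive before/after comparison using the treated state only. This mixes the
# treatment effect with anything else that changed over time.
naive_estimate = change_over_time(mean_employment, "treated")

# Estimated counterfactual for the treated state: its pre-intervention level
# plus the change observed in the control state.
counterfactual_post = mean_employment[("treated", "pre")] + change_over_time(
    mean_employment, "control"
)

# Difference in differences: treated change minus control change.
did_estimate = change_over_time(mean_employment, "treated") - change_over_time(
    mean_employment, "control"
)

print(f"naive before/after estimate: {naive_estimate:+.1f}")
print(f"estimated counterfactual:    {counterfactual_post:.1f}")
print(f"difference in differences:   {did_estimate:+.1f}")
```

With these invented numbers the naive comparison attributes only the raw change in the treated state to the intervention, while the difference in differences estimate also subtracts the change seen in the control state. The estimate is only as good as the assumption that the control state's change matches what the treated state would have experienced without the intervention.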
