add example, more links, more references to papers

drbenvincent · drbenvincent · commit 192e33e4e032 · 2022-09-27T17:20:17.000+01:00
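
Note on the Example section added below: the estimate it motivates reduces to arithmetic on four group means. The following is a minimal sketch, not code from the notebook. The numbers are invented for illustration (they are not taken from Card and Krueger), and the `means` dictionary and `diff_in_diff` helper are hypothetical names. It assumes the treated group would have changed by the same amount as the control group had there been no intervention.

```python
# Hypothetical mean outcomes (e.g. employees per restaurant).
# Illustrative numbers only; not taken from any study.
means = {
    ("control", "pre"): 20.0,
    ("control", "post"): 18.0,
    ("treatment", "pre"): 17.0,
    ("treatment", "post"): 17.5,
}


def diff_in_diff(means):
    """Return (counterfactual, impact) from four group means.

    Assumes that, without treatment, the treatment group would have
    changed by the same amount as the control group.
    """
    control_change = means[("control", "post")] - means[("control", "pre")]
    treatment_change = means[("treatment", "post")] - means[("treatment", "pre")]
    # What the treated group's post-treatment mean would have been untreated
    counterfactual = means[("treatment", "pre")] + control_change
    # Difference of the two pre/post differences
    impact = treatment_change - control_change
    return counterfactual, impact


counterfactual, impact = diff_in_diff(means)
print(f"counterfactual post-treatment mean: {counterfactual}")
print(f"estimated causal impact: {impact}")
```

The impact equals the observed post-treatment mean of the treated group minus the counterfactual, which is the gap the notebook's Bayesian model places a posterior distribution over.
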
diff --git a/examples/causal_inference/difference_in_differences.ipynb b/examples/causal_inference/difference_in_differences.ipynb
@@ -52,17 +52,27 @@
     "\n",
     "This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebooks provides a high level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both [The Effect](https://theeffectbook.net/) {cite:p}`huntington2021effect` and [Causal Inference: The Mixtape](https://mixtape.scunning.com) {cite:p}`cunningham2021causal` have chapters devoted to difference in differences.\n",
     "\n",
-    "Difference in differences would be a good approach to take for causal inference if:\n",
+    "[Difference in differences](https://en.wikipedia.org/wiki/Difference_in_differences) would be a good approach to take for causal inference if:\n",
     "* you want to know the causal impact of a treatment/intervention\n",
     "* you have pre and post treatment measures\n",
     "* you have both a treatment and a control group\n",
-    "* the treatment was _not_ allocated by randomisation.\n",
+    "* the treatment was _not_ allocated by randomisation, that is, you are in a [quasi-experimental](https://en.wikipedia.org/wiki/Quasi-experiment) setting.\n",
     "\n",
     "Otherwise there are likely better suited approaches you could use.\n",
     "\n",
     "Note that our desire to estimate the causal impact of a treatment involves [counterfactual thinking](https://en.wikipedia.org/wiki/Counterfactual_thinking). This is because we are asking \"What would the post-treatment outcome of the treatment group be _if_ treatment had not been administered?\" but we can never observe this."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "6ec005f3-c443-4243-a4f5-c86252367fe8",
+   "metadata": {},
+   "source": [
+    "### Example\n",
+    "\n",
+    "A classic example is the study by {cite:t}`card1993minimum`, which examined the effect of increasing the minimum wage on employment in the fast food sector. This is a quasi-experimental setting because the intervention (an increase in the minimum wage) was not randomly assigned to geographical units (e.g. states). The intervention was applied to New Jersey in April 1992. Had the authors measured pre and post intervention employment rates in New Jersey only, they would have failed to control for omitted variables that change over time (e.g. seasonal effects), which could provide alternative causal explanations for changes in employment rates. Including a control state (Pennsylvania) makes it possible to use the change in employment in Pennsylvania as an estimate of the counterfactual: what _would have happened if_ New Jersey had not received the intervention."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "54f5c8aa-2a4d-4b77-ba64-a0e9df729103",
@@ -1144,6 +1154,15 @@
     "So there we have it, we have a full posterior distribution over our estimated causal impact using the difference in differences approach."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "bf284262-ef3f-4cc1-af07-f20bb3c69ce3",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "Real applications of the difference in differences approach require considerably more due diligence than is shown here. Readers are encouraged to consult the textbooks listed in the introduction, as well as a useful review paper {cite:p}`wing2018designing`, which covers the important contextual issues in more detail. Additionally, {cite:t}`bertrand2004much` take a skeptical look at the approach and propose solutions to some of the problems they highlight."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "b3b2ee6b-2581-4ee5-a305-b9712dd49f09",
diff --git a/examples/references.bib b/examples/references.bib
@@ -35,13 +35,29 @@ @book{berry1996statistics
   year          = {1996},
   publisher     = {Duxbury Press}
 }
+@article{bertrand2004much,
+  title         = {How much should we trust differences-in-differences estimates?},
+  author        = {Bertrand, Marianne and Duflo, Esther and Mullainathan, Sendhil},
+  journal       = {The Quarterly Journal of Economics},
+  volume        = {119},
+  number        = {1},
+  pages         = {249--275},
+  year          = {2004},
+  publisher     = {MIT Press}
+}
 @book{breen1996regression,
   title         = {Regression models: Censored, sample selected, or truncated data},
   author        = {Breen, Richard and others},
   volume        = {111},
   year          = {1996},
   publisher     = {Sage}
 }
+@misc{card1993minimum,
+  title         = {Minimum wages and employment: A case study of the fast food industry in New Jersey and Pennsylvania},
+  author        = {Card, David and Krueger, Alan B},
+  year          = {1993},
+  publisher     = {National Bureau of Economic Research Cambridge, Mass., USA}
+}
 @misc{carpenter2016hierarchical,
   title         = {Hierarchical partial pooling for repeated binary trials},
   author        = {Carpenter, Bob and Gabry, J and Goodrich, B},
@@ -493,6 +509,15 @@ @book{wilkinson2005grammar
   issn          = {1431-8784},
   isbn          = {978-0-387-24544-7}
 }
+@article{wing2018designing,
+  title         = {Designing difference in difference studies: best practices for public health policy research},
+  author        = {Wing, Coady and Simon, Kosali and Bello-Gomez, Ricardo A},
+  journal       = {Annual Review of Public Health},
+  volume        = {39},
+  number        = {1},
+  pages         = {453--469},
+  year          = {2018}
+}
 @article{Yao_2018,
   doi           = {10.1214/17-ba1091},
   url           = {https://doi.org/10.1214\%2F17-ba1091},
diff --git a/myst_nbs/causal_inference/difference_in_differences.myst.md b/myst_nbs/causal_inference/difference_in_differences.myst.md
@@ -40,18 +40,24 @@ az.style.use("arviz-darkgrid")
 
 This notebook provides a brief overview of the difference in differences approach to causal inference, and shows a working example of how to conduct this type of analysis under the Bayesian framework, using PyMC. While the notebooks provides a high level overview of the approach, I recommend consulting two excellent textbooks on causal inference. Both [The Effect](https://theeffectbook.net/) {cite:p}`huntington2021effect` and [Causal Inference: The Mixtape](https://mixtape.scunning.com) {cite:p}`cunningham2021causal` have chapters devoted to difference in differences.
 
-Difference in differences would be a good approach to take for causal inference if:
+[Difference in differences](https://en.wikipedia.org/wiki/Difference_in_differences) would be a good approach to take for causal inference if:
 * you want to know the causal impact of a treatment/intervention
 * you have pre and post treatment measures
 * you have both a treatment and a control group
-* the treatment was _not_ allocated by randomisation.
+* the treatment was _not_ allocated by randomisation, that is, you are in a [quasi-experimental](https://en.wikipedia.org/wiki/Quasi-experiment) setting.
 
 Otherwise there are likely better suited approaches you could use.
 
 Note that our desire to estimate the causal impact of a treatment involves [counterfactual thinking](https://en.wikipedia.org/wiki/Counterfactual_thinking). This is because we are asking "What would the post-treatment outcome of the treatment group be _if_ treatment had not been administered?" but we can never observe this.
 
 +++
 
+### Example
+
+A classic example is the study by {cite:t}`card1993minimum`, which examined the effect of increasing the minimum wage on employment in the fast food sector. This is a quasi-experimental setting because the intervention (an increase in the minimum wage) was not randomly assigned to geographical units (e.g. states). The intervention was applied to New Jersey in April 1992. Had the authors measured pre and post intervention employment rates in New Jersey only, they would have failed to control for omitted variables that change over time (e.g. seasonal effects), which could provide alternative causal explanations for changes in employment rates. Including a control state (Pennsylvania) makes it possible to use the change in employment in Pennsylvania as an estimate of the counterfactual: what _would have happened if_ New Jersey had not received the intervention.
+
++++
+
 ### Causal DAG
 
 The causal DAG for difference in differences is given below. It says:
@@ -423,6 +429,11 @@ So there we have it, we have a full posterior distribution over our estimated ca
 
 +++
 
+## Summary
+Real applications of the difference in differences approach require considerably more due diligence than is shown here. Readers are encouraged to consult the textbooks listed in the introduction, as well as a useful review paper {cite:p}`wing2018designing`, which covers the important contextual issues in more detail. Additionally, {cite:t}`bertrand2004much` take a skeptical look at the approach and propose solutions to some of the problems they highlight.
+
++++
+
 ## References
 
 :::{bibliography}