diff --git a/examples/case_studies/multilevel_modeling.ipynb b/examples/case_studies/multilevel_modeling.ipynb
index f25fc3a88..fc1ec5ef6 100644
--- a/examples/case_studies/multilevel_modeling.ipynb
+++ b/examples/case_studies/multilevel_modeling.ipynb
@@ -7,7 +7,7 @@
     "(multilevel_modeling)=\n",
     "# A Primer on Bayesian Methods for Multilevel Modeling\n",
     "\n",
-    ":::{post} 27 February, 2022\n",
+    ":::{post} 24 October, 2022\n",
     ":tags: hierarchical model, case study \n",
     ":category: intermediate\n",
     ":author: Chris Fonnesbeck, Colin Carroll, Alex Andorra, Oriol Abril, Farhan Reynaldo\n",
@@ -18,9 +18,19 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Hierarchical or multilevel modeling is a generalization of regression modeling. *Multilevel models* are regression models in which the constituent model parameters are given **probability models**. This implies that model parameters are allowed to **vary by group**. Observational units are often naturally **clustered**. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.\n",
+    "Hierarchical or multilevel modeling is a generalization of regression modeling.\n",
     "\n",
-    "A *hierarchical model* is a particular multilevel model where parameters are nested within one another. Some multilevel structures are not hierarchical -- e.g. \"country\" and \"year\" are not nested, but may represent separate, but overlapping, clusters of parameters. We will motivate this topic using an environmental epidemiology example."
+    "*Multilevel models* are regression models in which the constituent model parameters are given **probability models**. This implies that model parameters are allowed to **vary by group**.\n",
+    "\n",
+    "Observational units are often naturally **clustered**. Clustering induces dependence between observations, despite random sampling of clusters and random sampling within clusters.\n",
+    "\n",
+    "A *hierarchical model* is a particular multilevel model where parameters are nested within one another.\n",
+    "\n",
+    "Some multilevel structures are not hierarchical. \n",
+    "\n",
+    "* e.g. \"country\" and \"year\" are not nested, but may represent separate, but overlapping, clusters of parameters\n",
+    "\n",
+    "We will motivate this topic using an environmental epidemiology example."
    ]
   },
   {
@@ -47,7 +57,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Data organization"
+    "### Data organization"
    ]
   },
   {
@@ -66,12 +76,13 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Running on PyMC v4.0.0b3\n"
+      "Running on PyMC v4.3.0\n"
      ]
     }
    ],
    "source": [
     "import os\n",
+    "import warnings\n",
     "\n",
     "import aesara.tensor as at\n",
     "import arviz as az\n",
@@ -79,8 +90,11 @@
     "import numpy as np\n",
     "import pandas as pd\n",
     "import pymc as pm\n",
+    "import seaborn as sns\n",
     "import xarray as xr\n",
     "\n",
+    "warnings.filterwarnings(\"ignore\", module=\"scipy\")\n",
+    "\n",
     "print(f\"Running on PyMC v{pm.__version__}\")"
    ]
   },
@@ -94,13 +108,19 @@
     "az.style.use(\"arviz-darkgrid\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The original data exists as several independent datasets, which we will import, merge, and process here. First is the data on measurements from individual homes from across the United States. We will extract just the subset from Minnesota."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Import radon data\n",
     "try:\n",
     "    srrs2 = pd.read_csv(os.path.join(\"..\", \"data\", \"srrs2.dat\"))\n",
     "except FileNotFoundError:\n",
@@ -123,8 +143,12 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "try:\n",
+    "    cty = pd.read_csv(os.path.join(\"..\", \"data\", \"cty.dat\"))\n",
+    "except FileNotFoundError:\n",
+    "    cty = pd.read_csv(pm.get_data(\"cty.dat\"))\n",
+    "\n",
     "srrs_mn[\"fips\"] = srrs_mn.stfips * 1000 + srrs_mn.cntyfips\n",
-    "cty = pd.read_csv(pm.get_data(\"cty.dat\"))\n",
     "cty_mn = cty[cty.st == \"MN\"].copy()\n",
     "cty_mn[\"fips\"] = 1000 * cty_mn.stfips + cty_mn.ctfips"
    ]
@@ -149,251 +173,26 @@
     "n = len(srrs_mn)"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "
\n",
- |   | idnum | state | state2 | stfips | zip | region | typebldg | floor | room | basement | ... | stopdt | activity | pcterr | adjwt | dupflag | zipflag | cntyfips | county | fips | Uppm |
- |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
- | 0 | 5081 | MN | MN | 27 | 55735 | 5 | 1 | 1 | 3 | N | ... | 12288 | 2.2 | 9.7 | 1146.499190 | 1 | 0 | 1 | AITKIN | 27001 | 0.502054 |
- | 1 | 5082 | MN | MN | 27 | 55748 | 5 | 1 | 0 | 4 | Y | ... | 12088 | 2.2 | 14.5 | 471.366223 | 0 | 0 | 1 | AITKIN | 27001 | 0.502054 |
- | 2 | 5083 | MN | MN | 27 | 55748 | 5 | 1 | 0 | 4 | Y | ... | 21188 | 2.9 | 9.6 | 433.316718 | 0 | 0 | 1 | AITKIN | 27001 | 0.502054 |
- | 3 | 5084 | MN | MN | 27 | 56469 | 5 | 1 | 0 | 4 | Y | ... | 123187 | 1.0 | 24.3 | 461.623670 | 0 | 0 | 1 | AITKIN | 27001 | 0.502054 |
- | 4 | 5085 | MN | MN | 27 | 55011 | 3 | 1 | 0 | 4 | Y | ... | 13088 | 3.1 | 13.8 | 433.316718 | 0 | 0 | 3 | ANOKA | 27003 | 0.428565 |
- 5 rows × 27 columns
-       "