
Commit c142ef4

add myst notebook representation of examples (#283)
* add myst notebook representation of examples
* fix pre-commit hook to work on notebooks only
* fix incorrect mention of wiki
1 parent 281a426 commit c142ef4


163 files changed (+152047, -445 lines)


.jupytext.toml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+[formats]
+"examples/" = "ipynb"
+"myst_nbs/" = ".myst.md:myst"

.pre-commit-config.yaml

Lines changed: 14 additions & 2 deletions
@@ -1,6 +1,12 @@
 repos:
+  - repo: https://github.com/mwouts/jupytext
+    rev: v1.13.7
+    hooks:
+      - id: jupytext
+        files: ^examples/.+\.ipynb$
+        args: ["--sync"]
   - repo: https://github.com/psf/black
-    rev: 21.8b0
+    rev: 22.1.0
     hooks:
       - id: black-jupyter
   - repo: https://github.com/nbQA-dev/nbQA
@@ -48,7 +54,7 @@ repos:
         entry: '%load_ext watermark.*%watermark -n -u -v -iv -w'
         language: pygrep
         minimum_pre_commit_version: 2.8.0
-        name: Check notebooks have watermark (see Jupyter style guide from PyMC3 Wiki)
+        name: Check notebooks have watermark (see Jupyter style guide from PyMC docs)
         types: [jupyter]
       - id: add-tags
         entry: python scripts/add_tags.py
@@ -59,3 +65,9 @@ repos:
           - nbqa==1.1.1
           - beautifulsoup4==4.9.3
           - myst_parser==0.13.7
+  - repo: https://github.com/mwouts/jupytext
+    rev: v1.13.7
+    hooks:
+      - id: jupytext
+        files: ^examples/.+\.ipynb$
+        args: ["--sync"]

examples/case_studies/BEST.ipynb

Lines changed: 3 additions & 3 deletions
@@ -232,8 +232,8 @@
     "outputs": [],
     "source": [
      "with model:\n",
-     "    lambda_1 = group1_std ** -2\n",
-     "    lambda_2 = group2_std ** -2\n",
+     "    lambda_1 = group1_std**-2\n",
+     "    lambda_2 = group2_std**-2\n",
      "    group1 = pm.StudentT(\"drug\", nu=nu, mu=group1_mean, lam=lambda_1, observed=iq_drug)\n",
      "    group2 = pm.StudentT(\"placebo\", nu=nu, mu=group2_mean, lam=lambda_2, observed=iq_placebo)"
     ]
@@ -255,7 +255,7 @@
      "    diff_of_means = pm.Deterministic(\"difference of means\", group1_mean - group2_mean)\n",
      "    diff_of_stds = pm.Deterministic(\"difference of stds\", group1_std - group2_std)\n",
      "    effect_size = pm.Deterministic(\n",
-     "        \"effect size\", diff_of_means / np.sqrt((group1_std ** 2 + group2_std ** 2) / 2)\n",
+     "        \"effect size\", diff_of_means / np.sqrt((group1_std**2 + group2_std**2) / 2)\n",
      "    )"
     ]
    },

examples/case_studies/bayesian_ab_testing.ipynb

Lines changed: 1 addition & 1 deletion
@@ -813,7 +813,7 @@
    "id": "c871fb6e",
    "metadata": {},
    "source": [
-    "### Generalising to multi-variant tests "
+    "### Generalising to multi-variant tests"
    ]
   },
   {

examples/case_studies/binning.ipynb

Lines changed: 1 addition & 1 deletion
@@ -3267,7 +3267,7 @@
    "metadata": {},
    "source": [
     "## Authors\n",
-    "* Authored by [Eric Ma](https://github.com/ericmjl) and [Benjamin T. Vincent](https://github.com/drbenvincent) in September, 2021 ([pymc-examples#229](https://github.com/pymc-devs/pymc-examples/pull/229))\n"
+    "* Authored by [Eric Ma](https://github.com/ericmjl) and [Benjamin T. Vincent](https://github.com/drbenvincent) in September, 2021 ([pymc-examples#229](https://github.com/pymc-devs/pymc-examples/pull/229))"
    ]
   },
   {

examples/case_studies/blackbox_external_likelihood_numpy.ipynb

Lines changed: 1 addition & 1 deletion
@@ -118,7 +118,7 @@
    "\n",
    "def my_loglike(theta, x, data, sigma):\n",
    "    model = my_model(theta, x)\n",
-    "    return -(0.5 / sigma ** 2) * np.sum((data - model) ** 2)"
+    "    return -(0.5 / sigma**2) * np.sum((data - model) ** 2)"
   ]
  },
  {

examples/case_studies/conditional-autoregressive-model.ipynb

Lines changed: 2 additions & 2 deletions
@@ -1503,7 +1503,7 @@
    "source": [
     "`theano.scan` is much faster than using a Python for loop, but it is still quite slow. One approach for improving it is to use linear algebra. That is, we should try to find a way to use matrix multiplication instead of looping (if you have experience in using MATLAB, it is the same philosophy). In our case, we can totally do that.\n",
     "\n",
-    "For a similar problem, you can also have a look at [my port of Lee and Wagenmakers' book](https://github.com/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3). For example, in Chapter 19, the Stan code uses [a for loop to generate the likelihood function](https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/CaseStudies/NumberConcepts/NumberConcept_1_Stan.R#L28-L59), and I [generate the matrix outside and use matrix multiplication etc.](http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/NumberConceptDevelopment.ipynb#19.1-Knower-level-model-for-Give-N) to achieve the same purpose. "
+    "For a similar problem, you can also have a look at [my port of Lee and Wagenmakers' book](https://github.com/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3). For example, in Chapter 19, the Stan code uses [a for loop to generate the likelihood function](https://github.com/stan-dev/example-models/blob/master/Bayesian_Cognitive_Modeling/CaseStudies/NumberConcepts/NumberConcept_1_Stan.R#L28-L59), and I [generate the matrix outside and use matrix multiplication etc.](http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/NumberConceptDevelopment.ipynb#19.1-Knower-level-model-for-Give-N) to achieve the same purpose."
@@ -3286,7 +3286,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "As you can see above, the sparse representation returns the same estimates, while being much faster than any other implementation. "
+    "As you can see above, the sparse representation returns the same estimates, while being much faster than any other implementation."
    ]
   },
   {
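
The vectorization advice in the first hunk above generalizes beyond this model. As a minimal, self-contained sketch with toy data (not code from the notebook), replacing a per-row Python loop with a single matrix-vector product returns identical results:

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(200, 200))  # toy weight/adjacency matrix
x = rng.normal(size=200)

# Looped version: one dot product per row (slow in pure Python)
out_loop = np.array([W[i] @ x for i in range(W.shape[0])])

# Vectorized version: a single matrix-vector product
out_vec = W @ x

assert np.allclose(out_loop, out_vec)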

examples/case_studies/factor_analysis.ipynb

Lines changed: 1 addition & 1 deletion
@@ -608,7 +608,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "* This notebook was written by [chartl](https://github.com/chartl) on May 6, 2019 and updated by [Christopher Krapu](https://github.com/ckrapu) on April 4, 2021. "
+   "* This notebook was written by [chartl](https://github.com/chartl) on May 6, 2019 and updated by [Christopher Krapu](https://github.com/ckrapu) on April 4, 2021."
  ]
 },
 {

examples/case_studies/hierarchical_partial_pooling.ipynb

Lines changed: 2 additions & 2 deletions
@@ -43,7 +43,7 @@
    "\n",
    "The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to the different number of at bats for each player (*i.e.* information) will be automatically accounted for, by shrinking those estimates closer to the global mean.\n",
    "\n",
-   "For a far more in-depth discussion, please refer to the Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example.\n"
+   "For a far more in-depth discussion, please refer to the Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example."
   ]
  },
  {
@@ -588,7 +588,7 @@
    "\n",
    ":::{bibliography}\n",
    ":filter: docname in docnames\n",
-   ":::\n"
+   ":::"
   ]
  }
 ],
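
The shrinkage described in the first hunk comes from the model's hierarchical prior. A minimal PyMC3 sketch of that structure, assuming integer arrays at_bats and hits of per-player trials and successes (names and priors are illustrative, not the notebook's exact code):

import pymc3 as pm

with pm.Model() as baseball_model:
    # Population-level batting ability and pooling strength
    phi = pm.Uniform("phi", lower=0.0, upper=1.0)
    kappa = pm.Pareto("kappa", alpha=1.5, m=1.0)

    # Player-level abilities, partially pooled toward phi:
    # large kappa means strong pooling, small kappa means little pooling
    thetas = pm.Beta("thetas", alpha=phi * kappa, beta=(1.0 - phi) * kappa, shape=len(hits))

    # Observed hits out of at bats
    y = pm.Binomial("y", n=at_bats, p=thetas, observed=hits)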

examples/case_studies/item_response_nba.ipynb

Lines changed: 3 additions & 3 deletions
@@ -812,7 +812,7 @@
    "- y: the difference between the raw mean probability (from the data) and the posterior mean probability for each disadvantaged and committing player\n",
    "- x: as a function of the number of observations per disadvantaged and committing player.\n",
    "\n",
-   "These plots show, as expected, that the hierarchical structure of our model tends to shrink posterior estimates towards the global mean for players with a small number of observations. "
+   "These plots show, as expected, that the hierarchical structure of our model tends to shrink posterior estimates towards the global mean for players with a small number of observations."
   ]
  },
  {
@@ -1015,7 +1015,7 @@
   "source": [
    "### Discovering extra hierarchical structure\n",
    "\n",
-   "A natural question to ask is whether players skilled as disadvantaged players (i.e. players with high `θ`) are also likely to be skilled as committing players (i.e. with high `b`), and the other way around. So, the next two plots show the `θ` (resp. `b`) score for the top players with respect to `b` (resp. `θ`). "
+   "A natural question to ask is whether players skilled as disadvantaged players (i.e. players with high `θ`) are also likely to be skilled as committing players (i.e. with high `b`), and the other way around. So, the next two plots show the `θ` (resp. `b`) score for the top players with respect to `b` (resp. `θ`)."
   ]
  },
  {
@@ -1098,7 +1098,7 @@
   "metadata": {},
   "source": [
    "These plots suggest that scoring high in `θ` does not correlate with high or low scores in `b`. Moreover, with a little knowledge of NBA basketball, one can visually note that a higher score in `b` is expected from players playing center or forward rather than guards or point guards.\n",
-   "Given the last observation, we decide to plot a histogram of the occurrence of different positions for top disadvantaged (`θ`) and committing (`b`) players. Interestingly, we see below that the largest share of the best disadvantaged players are guards, while the largest share of the best committing players are centers (and at the same time a very small share are guards). "
+   "Given the last observation, we decide to plot a histogram of the occurrence of different positions for top disadvantaged (`θ`) and committing (`b`) players. Interestingly, we see below that the largest share of the best disadvantaged players are guards, while the largest share of the best committing players are centers (and at the same time a very small share are guards)."
   ]
  },
  {

examples/case_studies/log-gaussian-cox-process.ipynb

Lines changed: 7 additions & 7 deletions
@@ -29,7 +29,7 @@
    "* What would randomly sampled patterns with the same statistical properties look like?\n",
    "* Is there a statistical correlation between the *frequency* and *magnitude* of point events?\n",
    "\n",
-   "In this notebook, we'll use a grid-based approximation to the full LGCP with PyMC3 to fit a model and analyze its posterior summaries. We will also explore the usage of a marked Poisson process, an extension of this model to account for the distribution of *marks* associated with each data point.\n"
+   "In this notebook, we'll use a grid-based approximation to the full LGCP with PyMC3 to fit a model and analyze its posterior summaries. We will also explore the usage of a marked Poisson process, an extension of this model to account for the distribution of *marks* associated with each data point."
   ]
  },
  {
@@ -43,7 +43,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "Our observational data concerns 231 sea anemones whose sizes and locations on the French coast were recorded. This data was taken from the [`spatstat` spatial modeling package in R](https://github.com/spatstat/spatstat), which is designed to address models like the LGCP and its subsequent refinements. The original source of this data is the textbook *Spatial data analysis by example* by Upton and Fingleton (1985), and a longer description of the data can be found there. \n"
+   "Our observational data concerns 231 sea anemones whose sizes and locations on the French coast were recorded. This data was taken from the [`spatstat` spatial modeling package in R](https://github.com/spatstat/spatstat), which is designed to address models like the LGCP and its subsequent refinements. The original source of this data is the textbook *Spatial data analysis by example* by Upton and Fingleton (1985), and a longer description of the data can be found there."
   ]
  },
  {
@@ -234,7 +234,7 @@
    "\n",
    "# Rescaling the unit of area so that our parameter estimates\n",
    "# are easier to read\n",
-   "area_per_cell = resolution ** 2 / 100\n",
+   "area_per_cell = resolution**2 / 100\n",
    "\n",
    "cells_x = int(280 / resolution)\n",
    "cells_y = int(180 / resolution)\n",
@@ -311,7 +311,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "Our first step is to place prior distributions over the high-level parameters for the Gaussian process. This includes the length scale $\\rho$ for the covariance function and a constant mean $\\mu$ for the GP. "
+   "Our first step is to place prior distributions over the high-level parameters for the Gaussian process. This includes the length scale $\\rho$ for the covariance function and a constant mean $\\mu$ for the GP."
   ]
  },
  {
@@ -638,7 +638,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "While there is some heterogeneity in the patterns these surfaces show, we obtain a posterior mean surface with a very clearly defined spatial trend, with higher intensity in the upper right and lower intensity in the lower left.\n"
+   "While there is some heterogeneity in the patterns these surfaces show, we obtain a posterior mean surface with a very clearly defined spatial trend, with higher intensity in the upper right and lower intensity in the lower left."
   ]
  },
  {
@@ -787,7 +787,7 @@
    "Equivalently, $$z_i \\sim N(\\alpha + \\beta \\lambda_i, \\sigma_\\epsilon^2)$$\n",
    "where $\\sigma_\\epsilon^2 = Var(\\epsilon_i)$.\n",
    "\n",
-   "This equation states that the distribution of the marks is a linear function of the intensity field since we assume a normal likelihood for $\\epsilon$. It's essentially a simple linear regression of the marks on the intensity field; $\\alpha$ is the intercept and $\\beta$ is the slope. Then, standard priors are used for $\\epsilon, \\alpha, \\beta$. The point of this model is to figure out whether or not the growth of the anemones is correlated with their occurrence. If we find that $\\beta$ is negative, then that might hint that locations with more numerous anemones happen to also have smaller anemones and that competition for food may keep close neighbors small. "
+   "This equation states that the distribution of the marks is a linear function of the intensity field since we assume a normal likelihood for $\\epsilon$. It's essentially a simple linear regression of the marks on the intensity field; $\\alpha$ is the intercept and $\\beta$ is the slope. Then, standard priors are used for $\\epsilon, \\alpha, \\beta$. The point of this model is to figure out whether or not the growth of the anemones is correlated with their occurrence. If we find that $\\beta$ is negative, then that might hint that locations with more numerous anemones happen to also have smaller anemones and that competition for food may keep close neighbors small."
   ]
  },
  {
@@ -980,7 +980,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-   "* This notebook was written by [Christopher Krapu](https://github.com/ckrapu) on September 6, 2020 and updated on April 1, 2021. "
+   "* This notebook was written by [Christopher Krapu](https://github.com/ckrapu) on September 6, 2020 and updated on April 1, 2021."
   ]
  },
 {
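
The mark model in the sixth hunk, $z_i \sim N(\alpha + \beta \lambda_i, \sigma_\epsilon^2)$, translates directly into a small regression. A hedged PyMC3 sketch, where intensity_at_points and anemone_sizes are assumed arrays rather than objects defined in this commit:

import pymc3 as pm

with pm.Model() as mark_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)
    beta = pm.Normal("beta", mu=0.0, sigma=10.0)
    sigma_eps = pm.HalfNormal("sigma_eps", sigma=5.0)

    # Linear regression of the marks (sizes) on the latent intensity field
    pm.Normal(
        "marks",
        mu=alpha + beta * intensity_at_points,
        sigma=sigma_eps,
        observed=anemone_sizes,
    )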

examples/case_studies/mediation_analysis.ipynb

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@
   "Using definitions from Hayes (2018), we can define a few effects of interest:\n",
   "- **Direct effect:** is given by $c'$. Two cases that differ by one unit on $x$ but are equal on $m$ are estimated to differ by $c'$ units on $y$.\n",
   "- **Indirect effect:** is given by $a \\cdot b$. Two cases which differ by one unit of $x$ are estimated to differ by $a \\cdot b$ units on $y$ as a result of the effect of $x \\rightarrow m$ and $m \\rightarrow y$.\n",
-  "- **Total effect:** is $c = c' + a \\cdot b$, which is simply the sum of the direct and indirect effects. This could be understood as: two cases that differ by one unit on $x$ are estimated to differ by $c$ units on $y$ due to both the direct pathway $x \\rightarrow y$ and the indirect pathway $x \\rightarrow m \\rightarrow y$. The total effect could also be estimated by evaluating the alternative model $y_i \\sim \\mathrm{Normal}(i_{Y*} + c \\cdot x_i, \\sigma_{Y*})$. "
+  "- **Total effect:** is $c = c' + a \\cdot b$, which is simply the sum of the direct and indirect effects. This could be understood as: two cases that differ by one unit on $x$ are estimated to differ by $c$ units on $y$ due to both the direct pathway $x \\rightarrow y$ and the indirect pathway $x \\rightarrow m \\rightarrow y$. The total effect could also be estimated by evaluating the alternative model $y_i \\sim \\mathrm{Normal}(i_{Y*} + c \\cdot x_i, \\sigma_{Y*})$."
  ]
 },
 {
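
For context, the three effects above fall out of a two-equation model: $m_i \sim \mathrm{Normal}(i_M + a \cdot x_i, \sigma_M)$ and $y_i \sim \mathrm{Normal}(i_Y + c' \cdot x_i + b \cdot m_i, \sigma_Y)$. A minimal PyMC3 sketch, assuming 1-D data arrays x, m, y (priors are illustrative, not the notebook's exact choices):

import pymc3 as pm

with pm.Model() as mediation_model:
    # Mediator model: m ~ Normal(i_M + a * x, sigma_M)
    i_m = pm.Normal("i_m", mu=0.0, sigma=10.0)
    a = pm.Normal("a", mu=0.0, sigma=10.0)
    sigma_m = pm.HalfNormal("sigma_m", sigma=10.0)
    pm.Normal("m_obs", mu=i_m + a * x, sigma=sigma_m, observed=m)

    # Outcome model: y ~ Normal(i_Y + c' * x + b * m, sigma_Y)
    i_y = pm.Normal("i_y", mu=0.0, sigma=10.0)
    c_prime = pm.Normal("c_prime", mu=0.0, sigma=10.0)
    b = pm.Normal("b", mu=0.0, sigma=10.0)
    sigma_y = pm.HalfNormal("sigma_y", sigma=10.0)
    pm.Normal("y_obs", mu=i_y + c_prime * x + b * m, sigma=sigma_y, observed=y)

    # Derived effects, matching the definitions above
    pm.Deterministic("indirect effect", a * b)
    pm.Deterministic("total effect", c_prime + a * b)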
