
Commit a7435fb

Spellcheck all notebooks (#492)
1 parent 2c721e6 commit a7435fb

26 files changed, +36 -36 lines changed

examples/case_studies/BART_introduction.ipynb (+1 -1)

@@ -279,7 +279,7 @@
 "id": "8633b7b4",
 "metadata": {},
 "source": [
-"The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workins of BART."
+"The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workings of BART."
 ]
 },
 {

examples/case_studies/BART_introduction.myst.md (+1 -1)

@@ -117,7 +117,7 @@ The following figure shows two samples of $\mu$ from the posterior.
 plt.step(x_data, idata_coal.posterior["μ"].sel(chain=0, draw=[3, 10]).T);
 ```

-The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workins of BART.
+The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workings of BART.

 ```{code-cell} ipython3
 bart_trees = μ_.owner.op.all_trees

examples/case_studies/binning.ipynb (+1 -1)

@@ -758,7 +758,7 @@
 "id": "71c3cf64",
 "metadata": {},
 "source": [
-"Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overal posterior average with `.mean(dim=[\"draw\", \"chain\"])`."
+"Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overall posterior average with `.mean(dim=[\"draw\", \"chain\"])`."
 ]
 },
 {

examples/case_studies/binning.myst.md (+1 -1)

@@ -299,7 +299,7 @@ Recall that we used `mu = -2` and `sigma = 2` to generate the data.
 az.plot_posterior(trace1, var_names=["mu", "sigma"], ref_val=[true_mu, true_sigma]);
 ```

-Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overal posterior average with `.mean(dim=["draw", "chain"])`.
+Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overall posterior average with `.mean(dim=["draw", "chain"])`.

 ```{code-cell} ipython3
 :tags: []
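The `.mean(dim=["draw", "chain"])` call referenced in this hunk reduces over both sampling dimensions of the xarray posterior in one step. Below is a minimal sketch of that reduction using a small synthetic posterior in place of the notebook's `trace1`; the variable names are illustrative only.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the notebook's posterior: MCMC samples arrive
# back as a (chain, draw) matrix for each variable.
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {
        "mu": (("chain", "draw"), rng.normal(-2.0, 0.1, size=(4, 1000))),
        "sigma": (("chain", "draw"), rng.normal(2.0, 0.1, size=(4, 1000))),
    }
)

# Averaging over both the draw and chain dimensions collapses the 2D sample
# matrix into a single overall posterior mean per variable.
posterior_mean = posterior.mean(dim=["draw", "chain"])
print(float(posterior_mean["mu"]), float(posterior_mean["sigma"]))
```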

examples/case_studies/conditional-autoregressive-model.ipynb (+2 -2)

@@ -286,11 +286,11 @@
 " self.mode = 0.0\n",
 "\n",
 " def get_mu(self, x):\n",
-" def weigth_mu(w, a):\n",
+" def weight_mu(w, a):\n",
 " a1 = tt.cast(a, \"int32\")\n",
 " return tt.sum(w * x[a1]) / tt.sum(w)\n",
 "\n",
-" mu_w, _ = scan(fn=weigth_mu, sequences=[self.w, self.a])\n",
+" mu_w, _ = scan(fn=weight_mu, sequences=[self.w, self.a])\n",
 "\n",
 " return mu_w\n",
 "\n",

examples/case_studies/conditional-autoregressive-model.myst.md (+2 -2)

@@ -220,11 +220,11 @@ class CAR(distribution.Continuous):
 self.mode = 0.0

 def get_mu(self, x):
-def weigth_mu(w, a):
+def weight_mu(w, a):
 a1 = tt.cast(a, "int32")
 return tt.sum(w * x[a1]) / tt.sum(w)

-mu_w, _ = scan(fn=weigth_mu, sequences=[self.w, self.a])
+mu_w, _ = scan(fn=weight_mu, sequences=[self.w, self.a])

 return mu_w
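The `weight_mu` helper renamed in this hunk computes, for each area, the weighted average of its neighbours' values; the `scan` applies it row by row over the weight and adjacency arrays. A plain NumPy sketch of the same computation, using illustrative names and a ragged neighbour structure rather than the padded arrays the class actually scans over:

```python
import numpy as np


def weighted_neighbor_mean(x, w, a):
    """Illustrative NumPy analogue of the scanned weight_mu helper:
    for each area, the weighted mean of its neighbours' values.
    x: values for all areas; w: per-area neighbour weights;
    a: per-area neighbour indices (same ragged structure as w)."""
    return np.array(
        [np.sum(wi * x[np.asarray(ai, dtype=int)]) / np.sum(wi) for wi, ai in zip(w, a)]
    )


# Tiny example: area 0 neighbours areas 1 and 2, area 1 neighbours area 0, etc.
x = np.array([1.0, 2.0, 3.0])
w = [np.array([1.0, 1.0]), np.array([1.0]), np.array([1.0, 1.0])]
a = [np.array([1, 2]), np.array([0]), np.array([0, 1])]
print(weighted_neighbor_mean(x, w, a))  # [2.5 1.  1.5]
```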

examples/case_studies/hierarchical_partial_pooling.ipynb (+1 -1)

@@ -42,7 +42,7 @@
 "\n",
 "It may be possible to cluster groups of \"similar\" players, and estimate group averages, but using a hierarchical modeling approach is a natural way of sharing information that does not involve identifying *ad hoc* clusters.\n",
 "\n",
-"The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* informatino) will be automatically accounted for, by shrinking those estimates closer to the global mean.\n",
+"The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* information) will be automatically accounted for, by shrinking those estimates closer to the global mean.\n",
 "\n",
 "For far more in-depth discussion please refer to Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example."
 ]

examples/case_studies/hierarchical_partial_pooling.myst.md (+1 -1)

@@ -45,7 +45,7 @@ Of course, neither approach is realistic. Clearly, all players aren't equally sk

 It may be possible to cluster groups of "similar" players, and estimate group averages, but using a hierarchical modeling approach is a natural way of sharing information that does not involve identifying *ad hoc* clusters.

-The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* informatino) will be automatically accounted for, by shrinking those estimates closer to the global mean.
+The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* information) will be automatically accounted for, by shrinking those estimates closer to the global mean.

 For far more in-depth discussion please refer to Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example.

examples/case_studies/item_response_nba.ipynb (+2 -2)

@@ -248,7 +248,7 @@
 "output_type": "stream",
 "text": [
 "Number of observed plays: 46861\n",
-"Number of disadvanteged players: 770\n",
+"Number of disadvantaged players: 770\n",
 "Number of committing players: 789\n",
 "Global probability of a foul being called: 23.3%\n",
 "\n",

@@ -378,7 +378,7 @@
 "\n",
 "# Display of main dataframe with some statistics\n",
 "print(f\"Number of observed plays: {len(df)}\")\n",
-"print(f\"Number of disadvanteged players: {len(disadvantaged)}\")\n",
+"print(f\"Number of disadvantaged players: {len(disadvantaged)}\")\n",
 "print(f\"Number of committing players: {len(committing)}\")\n",
 "print(f\"Global probability of a foul being called: \" f\"{100*round(df.foul_called.mean(),3)}%\\n\\n\")\n",
 "df.head()"

examples/case_studies/item_response_nba.myst.md (+1 -1)

@@ -140,7 +140,7 @@ df.index.name = "play_id"

 # Display of main dataframe with some statistics
 print(f"Number of observed plays: {len(df)}")
-print(f"Number of disadvanteged players: {len(disadvantaged)}")
+print(f"Number of disadvantaged players: {len(disadvantaged)}")
 print(f"Number of committing players: {len(committing)}")
 print(f"Global probability of a foul being called: " f"{100*round(df.foul_called.mean(),3)}%\n\n")
 df.head()

examples/case_studies/multilevel_modeling.ipynb (+1 -1)

@@ -1026,7 +1026,7 @@
 "Neither of these models are satisfactory:\n",
 "\n",
 "* If we are trying to identify high-radon counties, pooling is useless -- because, by definition, the pooled model estimates radon at the state-level. In other words, pooling leads to maximal *underfitting*: the variation across counties is not taken into account and only the overall population is estimated.\n",
-"* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarites across counties) is not estimated. \n",
+"* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarities across counties) is not estimated. \n",
 "\n",
 "This issue is acute for small sample sizes, as seen above: in counties where we have few floor measurements, if radon levels are higher for those data points than for basement ones (Aitkin, Koochiching, Ramsey), the model will estimate that radon levels are higher in floors than basements for these counties. But we shouldn't trust this conclusion, because both scientific knowledge and the situation in other counties tell us that it is usually the reverse (basement radon > floor radon). So unless we have a lot of observations telling us otherwise for a given county, we should be skeptical and shrink our county-estimates to the state-estimates -- in other words, we should balance between cluster-level and population-level information, and the amount of shrinkage will depend on how extreme and how numerous the data in each cluster are. \n",
 "\n",

examples/case_studies/multilevel_modeling.myst.md (+1 -1)

@@ -341,7 +341,7 @@ for i, c in enumerate(sample_counties):
 Neither of these models are satisfactory:

 * If we are trying to identify high-radon counties, pooling is useless -- because, by definition, the pooled model estimates radon at the state-level. In other words, pooling leads to maximal *underfitting*: the variation across counties is not taken into account and only the overall population is estimated.
-* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarites across counties) is not estimated.
+* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarities across counties) is not estimated.

 This issue is acute for small sample sizes, as seen above: in counties where we have few floor measurements, if radon levels are higher for those data points than for basement ones (Aitkin, Koochiching, Ramsey), the model will estimate that radon levels are higher in floors than basements for these counties. But we shouldn't trust this conclusion, because both scientific knowledge and the situation in other counties tell us that it is usually the reverse (basement radon > floor radon). So unless we have a lot of observations telling us otherwise for a given county, we should be skeptical and shrink our county-estimates to the state-estimates -- in other words, we should balance between cluster-level and population-level information, and the amount of shrinkage will depend on how extreme and how numerous the data in each cluster are.

examples/case_studies/probabilistic_matrix_factorization.ipynb (+2 -2)

@@ -1058,7 +1058,7 @@
 "source": [
 "### Training Data vs. Test Data\n",
 "\n",
-"The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \\times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to idenfity which split is which, so we'll take a hash of the indices selected for testing and use that to save the data."
+"The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \\times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to identify which split is which, so we'll take a hash of the indices selected for testing and use that to save the data."
 ]
 },
 {

@@ -1512,7 +1512,7 @@
 "source": [
 "It appears we get convergence of $U$ and $V$ after about the default tuning. When testing for convergence, we also want to see convergence of the particular statistics we are looking for, since different characteristics of the posterior may converge at different rates. Let's also do a traceplot of the RSME. We'll compute RMSE for both the train and the test set, even though the convergence is indicated by RMSE on the training set alone. In addition, let's compute a running RMSE on the train/test sets to see how aggregate performance improves or decreases as we continue to sample.\n",
 "\n",
-"Notice here that we are sampling from 1 chain only, which makes the convergence statisitcs like $\\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space."
+"Notice here that we are sampling from 1 chain only, which makes the convergence statistics like $\\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space."
 ]
 },
 {

examples/case_studies/probabilistic_matrix_factorization.myst.md (+2 -2)

@@ -486,7 +486,7 @@ def rmse(test_data, predicted):

 ### Training Data vs. Test Data

-The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to idenfity which split is which, so we'll take a hash of the indices selected for testing and use that to save the data.
+The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to identify which split is which, so we'll take a hash of the indices selected for testing and use that to save the data.

 ```{code-cell} ipython3
 # Define a function for splitting train/test data.

@@ -668,7 +668,7 @@ pmf.traceplot()

 It appears we get convergence of $U$ and $V$ after about the default tuning. When testing for convergence, we also want to see convergence of the particular statistics we are looking for, since different characteristics of the posterior may converge at different rates. Let's also do a traceplot of the RSME. We'll compute RMSE for both the train and the test set, even though the convergence is indicated by RMSE on the training set alone. In addition, let's compute a running RMSE on the train/test sets to see how aggregate performance improves or decreases as we continue to sample.

-Notice here that we are sampling from 1 chain only, which makes the convergence statisitcs like $\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space.
+Notice here that we are sampling from 1 chain only, which makes the convergence statistics like $\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space.

 ```{code-cell} ipython3
 def _running_rmse(pmf_model, test_data, train_data, plot=True):
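The split described in the first hunk above keeps the full $N \times M$ matrix shape: a random sample of observed cells is copied into a test matrix, those same cells are blanked out with `nan` in the training copy, and a hash of the test indices identifies the split. A minimal sketch under those assumptions; this is a hypothetical helper, not the notebook's own splitting function.

```python
import hashlib

import numpy as np


def split_train_test_cells(data, test_frac=0.1, seed=None):
    """Hypothetical helper sketching the split described above: hold out a
    random sample of observed cells, blank them out in the training copy,
    and hash the held-out indices so the split can be identified later."""
    rng = np.random.default_rng(seed)
    train = data.astype(float)                        # float copy so nan can be stored
    observed = np.argwhere(~np.isnan(train))          # only observed cells are eligible
    n_test = int(len(observed) * test_frac)
    chosen = observed[rng.choice(len(observed), size=n_test, replace=False)]

    test = np.full_like(train, np.nan)
    for i, j in chosen:
        test[i, j] = train[i, j]                      # held-out cells keep their values...
        train[i, j] = np.nan                          # ...and become missing in the training set
    split_id = hashlib.sha1(chosen.tobytes()).hexdigest()[:8]
    return train, test, split_id
```

The returned `split_id` could then go into the file names used to write out the train/test pair, which is what lets a particular random split be identified and replicated, as the text describes.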

examples/gaussian_processes/MOGP-Coregion-Hadamard.ipynb (+3 -3)

@@ -178,13 +178,13 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"There are 262 pichers, in 182 game dates\n"
+"There are 262 pitchers, in 182 game dates\n"
 ]
 }
 ],
 "source": [
 "print(\n",
-" f\"There are {df['pitcher_name'].nunique()} pichers, in {df['game_date'].nunique()} game dates\"\n",
+" f\"There are {df['pitcher_name'].nunique()} pitchers, in {df['game_date'].nunique()} game dates\"\n",
 ")"
 ]
 },

@@ -699,7 +699,7 @@
 "adf = adf.sort_values([\"output_idx\", \"x\"])\n",
 "X = adf[\n",
 " [\"x\", \"output_idx\"]\n",
-"].values # Input data includes the index of game dates, and the index of picthers\n",
+"].values # Input data includes the index of game dates, and the index of pitchers\n",
 "Y = adf[\"avg_spin_rate\"].values # Output data includes the average spin rate of pitchers\n",
 "X.shape, Y.shape"
