
Spellcheck the repository #492


Merged
merged 1 commit into from Jan 9, 2023
2 changes: 1 addition & 1 deletion examples/case_studies/BART_introduction.ipynb
@@ -279,7 +279,7 @@
"id": "8633b7b4",
"metadata": {},
"source": [
-"The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workins of BART."
+"The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workings of BART."
]
},
{
2 changes: 1 addition & 1 deletion examples/case_studies/BART_introduction.myst.md
@@ -117,7 +117,7 @@ The following figure shows two samples of $\mu$ from the posterior.
plt.step(x_data, idata_coal.posterior["μ"].sel(chain=0, draw=[3, 10]).T);
```

-The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workins of BART.
+The next figure shows 3 trees. As we can see these are very simple function and definitely not very good approximators by themselves. Inspecting individuals trees is generally not necessary when working with BART, we are showing them just so we can gain further intuition on the inner workings of BART.

```{code-cell} ipython3
bart_trees = μ_.owner.op.all_trees
2 changes: 1 addition & 1 deletion examples/case_studies/binning.ipynb
@@ -758,7 +758,7 @@
"id": "71c3cf64",
"metadata": {},
"source": [
-"Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overal posterior average with `.mean(dim=[\"draw\", \"chain\"])`."
+"Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overall posterior average with `.mean(dim=[\"draw\", \"chain\"])`."
]
},
{
2 changes: 1 addition & 1 deletion examples/case_studies/binning.myst.md
@@ -299,7 +299,7 @@ Recall that we used `mu = -2` and `sigma = 2` to generate the data.
az.plot_posterior(trace1, var_names=["mu", "sigma"], ref_val=[true_mu, true_sigma]);
```

-Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overal posterior average with `.mean(dim=["draw", "chain"])`.
+Pretty good! And we can access the posterior mean estimates (stored as [xarray](http://xarray.pydata.org/en/stable/index.html) types) as below. The MCMC samples arrive back in a 2D matrix with one dimension for the MCMC chain (`chain`), and one for the sample number (`draw`). We can calculate the overall posterior average with `.mean(dim=["draw", "chain"])`.

```{code-cell} ipython3
:tags: []
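The `.mean(dim=["draw", "chain"])` reduction described in the binning diff above can be sketched in plain NumPy (a minimal stand-in, not from the notebook; the named `dim` arguments on the xarray object correspond to positional axes here):

```python
import numpy as np

# Toy stand-in for the posterior samples: 2 chains x 4 draws of one
# parameter; xarray's dim=["draw", "chain"] maps to axis=(0, 1) here.
samples = np.arange(8.0).reshape(2, 4)

# Collapsing both MCMC dimensions yields the overall posterior average
overall_mean = samples.mean(axis=(0, 1))
print(overall_mean)  # 3.5
```

With real xarray objects the named dimensions make this order-independent, which is why the notebook prefers `dim=` over positional axes.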
4 changes: 2 additions & 2 deletions examples/case_studies/conditional-autoregressive-model.ipynb
@@ -286,11 +286,11 @@
" self.mode = 0.0\n",
"\n",
" def get_mu(self, x):\n",
-" def weigth_mu(w, a):\n",
+" def weight_mu(w, a):\n",
" a1 = tt.cast(a, \"int32\")\n",
" return tt.sum(w * x[a1]) / tt.sum(w)\n",
"\n",
-" mu_w, _ = scan(fn=weigth_mu, sequences=[self.w, self.a])\n",
+" mu_w, _ = scan(fn=weight_mu, sequences=[self.w, self.a])\n",
"\n",
" return mu_w\n",
"\n",
@@ -220,11 +220,11 @@ class CAR(distribution.Continuous):
self.mode = 0.0
def get_mu(self, x):
-def weigth_mu(w, a):
+def weight_mu(w, a):
a1 = tt.cast(a, "int32")
return tt.sum(w * x[a1]) / tt.sum(w)
-mu_w, _ = scan(fn=weigth_mu, sequences=[self.w, self.a])
+mu_w, _ = scan(fn=weight_mu, sequences=[self.w, self.a])
return mu_w
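The `weight_mu` helper renamed in the CAR diff above computes a weighted average of neighbouring values; a minimal NumPy sketch of that step (toy data, with `np.sum` standing in for the Theano `tt.sum`/`scan` machinery):

```python
import numpy as np

def weight_mu(w, a, x):
    # Weighted mean of the neighbour values x[a] with adjacency weights w,
    # casting the neighbour indices to integers as the original does.
    a = np.asarray(a, dtype=np.int32)
    return np.sum(w * x[a]) / np.sum(w)

x = np.array([1.0, 2.0, 3.0, 4.0])  # values over all areas (toy data)
w = np.array([1.0, 1.0])            # weights for one area's neighbours
a = np.array([0.0, 2.0])            # neighbour indices, stored as floats
print(weight_mu(w, a, x))           # (1*1.0 + 1*3.0) / 2 = 2.0
```

In the notebook this runs inside a `scan` loop, once per area, over the rows of the adjacency structure `self.w` and `self.a`.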
2 changes: 1 addition & 1 deletion examples/case_studies/hierarchical_partial_pooling.ipynb
@@ -42,7 +42,7 @@
"\n",
"It may be possible to cluster groups of \"similar\" players, and estimate group averages, but using a hierarchical modeling approach is a natural way of sharing information that does not involve identifying *ad hoc* clusters.\n",
"\n",
-"The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* informatino) will be automatically accounted for, by shrinking those estimates closer to the global mean.\n",
+"The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* information) will be automatically accounted for, by shrinking those estimates closer to the global mean.\n",
"\n",
"For far more in-depth discussion please refer to Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example."
]
2 changes: 1 addition & 1 deletion examples/case_studies/hierarchical_partial_pooling.myst.md
@@ -45,7 +45,7 @@ Of course, neither approach is realistic. Clearly, all players aren't equally sk

It may be possible to cluster groups of "similar" players, and estimate group averages, but using a hierarchical modeling approach is a natural way of sharing information that does not involve identifying *ad hoc* clusters.

-The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* informatino) will be automatically accounted for, by shrinking those estimates closer to the global mean.
+The idea of hierarchical partial pooling is to model the global performance, and use that estimate to parameterize a population of players that accounts for differences among the players' performances. This tradeoff between global and individual performance will be automatically tuned by the model. Also, uncertainty due to different number of at bats for each player (*i.e.* information) will be automatically accounted for, by shrinking those estimates closer to the global mean.

For far more in-depth discussion please refer to Stan [tutorial](http://mc-stan.org/documentation/case-studies/pool-binary-trials.html) {cite:p}`carpenter2016hierarchical` on the subject. The model and parameter values were taken from that example.

4 changes: 2 additions & 2 deletions examples/case_studies/item_response_nba.ipynb
@@ -248,7 +248,7 @@
"output_type": "stream",
"text": [
"Number of observed plays: 46861\n",
-"Number of disadvanteged players: 770\n",
+"Number of disadvantaged players: 770\n",
"Number of committing players: 789\n",
"Global probability of a foul being called: 23.3%\n",
"\n",
@@ -378,7 +378,7 @@
"\n",
"# Display of main dataframe with some statistics\n",
"print(f\"Number of observed plays: {len(df)}\")\n",
-"print(f\"Number of disadvanteged players: {len(disadvantaged)}\")\n",
+"print(f\"Number of disadvantaged players: {len(disadvantaged)}\")\n",
"print(f\"Number of committing players: {len(committing)}\")\n",
"print(f\"Global probability of a foul being called: \" f\"{100*round(df.foul_called.mean(),3)}%\\n\\n\")\n",
"df.head()"
2 changes: 1 addition & 1 deletion examples/case_studies/item_response_nba.myst.md
@@ -140,7 +140,7 @@ df.index.name = "play_id"

# Display of main dataframe with some statistics
print(f"Number of observed plays: {len(df)}")
-print(f"Number of disadvanteged players: {len(disadvantaged)}")
+print(f"Number of disadvantaged players: {len(disadvantaged)}")
print(f"Number of committing players: {len(committing)}")
print(f"Global probability of a foul being called: " f"{100*round(df.foul_called.mean(),3)}%\n\n")
df.head()
2 changes: 1 addition & 1 deletion examples/case_studies/multilevel_modeling.ipynb
@@ -1026,7 +1026,7 @@
"Neither of these models are satisfactory:\n",
"\n",
"* If we are trying to identify high-radon counties, pooling is useless -- because, by definition, the pooled model estimates radon at the state-level. In other words, pooling leads to maximal *underfitting*: the variation across counties is not taken into account and only the overall population is estimated.\n",
-"* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarites across counties) is not estimated. \n",
+"* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarities across counties) is not estimated. \n",
"\n",
"This issue is acute for small sample sizes, as seen above: in counties where we have few floor measurements, if radon levels are higher for those data points than for basement ones (Aitkin, Koochiching, Ramsey), the model will estimate that radon levels are higher in floors than basements for these counties. But we shouldn't trust this conclusion, because both scientific knowledge and the situation in other counties tell us that it is usually the reverse (basement radon > floor radon). So unless we have a lot of observations telling us otherwise for a given county, we should be skeptical and shrink our county-estimates to the state-estimates -- in other words, we should balance between cluster-level and population-level information, and the amount of shrinkage will depend on how extreme and how numerous the data in each cluster are. \n",
"\n",
2 changes: 1 addition & 1 deletion examples/case_studies/multilevel_modeling.myst.md
@@ -341,7 +341,7 @@ for i, c in enumerate(sample_counties):
Neither of these models are satisfactory:

* If we are trying to identify high-radon counties, pooling is useless -- because, by definition, the pooled model estimates radon at the state-level. In other words, pooling leads to maximal *underfitting*: the variation across counties is not taken into account and only the overall population is estimated.
-* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarites across counties) is not estimated.
+* We do not trust extreme unpooled estimates produced by models using few observations. This leads to maximal *overfitting*: only the within-county variations are taken into account and the overall population (i.e the state-level, which tells us about similarities across counties) is not estimated.

This issue is acute for small sample sizes, as seen above: in counties where we have few floor measurements, if radon levels are higher for those data points than for basement ones (Aitkin, Koochiching, Ramsey), the model will estimate that radon levels are higher in floors than basements for these counties. But we shouldn't trust this conclusion, because both scientific knowledge and the situation in other counties tell us that it is usually the reverse (basement radon > floor radon). So unless we have a lot of observations telling us otherwise for a given county, we should be skeptical and shrink our county-estimates to the state-estimates -- in other words, we should balance between cluster-level and population-level information, and the amount of shrinkage will depend on how extreme and how numerous the data in each cluster are.

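The shrinkage behaviour described in the multilevel-modeling diff above can be illustrated with a toy precision-weighted estimate (all numbers here are assumed for illustration, not taken from the notebook):

```python
import numpy as np

county_y = np.array([1.2, 0.8, 1.5])  # toy log-radon measurements, one county
state_mean = 1.0                      # pooled state-level estimate (assumed)
sigma_y = 0.7                         # within-county observation sd (assumed)
tau = 0.3                             # between-county sd (assumed)

# Precision weight: more observations -> larger w -> less shrinkage
n = len(county_y)
w = (n / sigma_y**2) / (n / sigma_y**2 + 1 / tau**2)

# Partially pooled estimate: a compromise between the county's own mean
# and the state mean, landing between the two
partial_pooled = w * county_y.mean() + (1 - w) * state_mean
```

With only three observations the county estimate is pulled noticeably toward the state mean, which is exactly the skepticism about extreme small-sample estimates the text argues for.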
@@ -1058,7 +1058,7 @@
"source": [
"### Training Data vs. Test Data\n",
"\n",
-"The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \\times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to idenfity which split is which, so we'll take a hash of the indices selected for testing and use that to save the data."
+"The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \\times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to identify which split is which, so we'll take a hash of the indices selected for testing and use that to save the data."
]
},
{
@@ -1512,7 +1512,7 @@
"source": [
"It appears we get convergence of $U$ and $V$ after about the default tuning. When testing for convergence, we also want to see convergence of the particular statistics we are looking for, since different characteristics of the posterior may converge at different rates. Let's also do a traceplot of the RSME. We'll compute RMSE for both the train and the test set, even though the convergence is indicated by RMSE on the training set alone. In addition, let's compute a running RMSE on the train/test sets to see how aggregate performance improves or decreases as we continue to sample.\n",
"\n",
-"Notice here that we are sampling from 1 chain only, which makes the convergence statisitcs like $\\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space."
+"Notice here that we are sampling from 1 chain only, which makes the convergence statistics like $\\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space."
]
},
{
@@ -486,7 +486,7 @@ def rmse(test_data, predicted):

### Training Data vs. Test Data

-The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to idenfity which split is which, so we'll take a hash of the indices selected for testing and use that to save the data.
+The next thing we need to do is split our data into a training set and a test set. Matrix factorization techniques use [transductive learning](http://en.wikipedia.org/wiki/Transduction_%28machine_learning%29) rather than inductive learning. So we produce a test set by taking a random sample of the cells in the full $N \times M$ data matrix. The values selected as test samples are replaced with `nan` values in a copy of the original data matrix to produce the training set. Since we'll be producing random splits, let's also write out the train/test sets generated. This will allow us to replicate our results. We'd like to be able to identify which split is which, so we'll take a hash of the indices selected for testing and use that to save the data.

```{code-cell} ipython3
# Define a function for splitting train/test data.
@@ -668,7 +668,7 @@ pmf.traceplot()

It appears we get convergence of $U$ and $V$ after about the default tuning. When testing for convergence, we also want to see convergence of the particular statistics we are looking for, since different characteristics of the posterior may converge at different rates. Let's also do a traceplot of the RSME. We'll compute RMSE for both the train and the test set, even though the convergence is indicated by RMSE on the training set alone. In addition, let's compute a running RMSE on the train/test sets to see how aggregate performance improves or decreases as we continue to sample.

-Notice here that we are sampling from 1 chain only, which makes the convergence statisitcs like $\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space.
+Notice here that we are sampling from 1 chain only, which makes the convergence statistics like $\hat{R}$ impossible (we can still compute the split-rhat but the purpose is different). The reason of not sampling multiple chain is that PMF might not have unique solution. Thus without constraints, the solutions are at best symmetrical, at worse identical under any rotation, in any case subject to label switching. In fact if we sample from multiple chains we will see large $\hat{R}$ indicating the sampler is exploring different solutions in different part of parameter space.

```{code-cell} ipython3
def _running_rmse(pmf_model, test_data, train_data, plot=True):
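The transductive split described in the PMF diff above, masking random test cells with `nan` and hashing the test indices to identify the split, can be sketched as follows (matrix size, test fraction, and the `md5` choice are assumptions, not taken from the notebook):

```python
import hashlib

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical N x M ratings matrix
data = rng.integers(1, 6, size=(6, 4)).astype(float)

# Sample 20% of the cells as the test set (a transductive split:
# test cells come from inside the same matrix we train on)
n_cells = data.size
test_flat = rng.choice(n_cells, size=int(0.2 * n_cells), replace=False)
test_idx = np.unravel_index(test_flat, data.shape)

# Mask the test cells with nan in a copy to form the training matrix
train = data.copy()
train[test_idx] = np.nan

# Hash the sorted test indices so this particular split can be identified
# and written out for later replication
split_id = hashlib.md5(np.sort(test_flat).tobytes()).hexdigest()[:8]
```

Saving the train/test matrices under a name derived from `split_id` lets the same random split be reloaded exactly, which is the reproducibility point the text makes.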
6 changes: 3 additions & 3 deletions examples/gaussian_processes/MOGP-Coregion-Hadamard.ipynb
@@ -178,13 +178,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
-"There are 262 pichers, in 182 game dates\n"
+"There are 262 pitchers, in 182 game dates\n"
]
}
],
"source": [
"print(\n",
-" f\"There are {df['pitcher_name'].nunique()} pichers, in {df['game_date'].nunique()} game dates\"\n",
+" f\"There are {df['pitcher_name'].nunique()} pitchers, in {df['game_date'].nunique()} game dates\"\n",
")"
]
},
@@ -699,7 +699,7 @@
"adf = adf.sort_values([\"output_idx\", \"x\"])\n",
"X = adf[\n",
" [\"x\", \"output_idx\"]\n",
-"].values # Input data includes the index of game dates, and the index of picthers\n",
+"].values # Input data includes the index of game dates, and the index of pitchers\n",
"Y = adf[\"avg_spin_rate\"].values # Output data includes the average spin rate of pitchers\n",
"X.shape, Y.shape"
]
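The input layout described in the MOGP diff above, one column for the game-date index and one for the pitcher (output) index, can be sketched with toy values (all data here is made up for illustration):

```python
import numpy as np

x = np.array([0, 1, 2, 0, 1])           # game-date index per observation
output_idx = np.array([0, 0, 0, 1, 1])  # pitcher (output) index
X = np.column_stack([x, output_idx])    # shape (5, 2): the Hadamard-style input
Y = np.array([2200.0, 2250.0, 2210.0, 2400.0, 2390.0])  # toy avg spin rates
print(X.shape, Y.shape)  # (5, 2) (5,)
```

Packing the output index into a second column is what lets a coregionalized kernel treat all pitchers' observations as one stacked dataset.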