diff --git a/source/_config.yml b/source/_config.yml index 5651a12e..6bf00afd 100755 --- a/source/_config.yml +++ b/source/_config.yml @@ -49,7 +49,7 @@ html: extra_navbar: Powered by Jupyter Book # Will be displayed underneath the left navbar. extra_footer: "" # Will be displayed underneath the footer. google_analytics_id: "G-7XBFF4RSN2" # A GA id that can be used to track book views. - home_page_in_navbar: true # Whether to include your home page in the left Navigation Bar + home_page_in_navbar: false # Whether to include your home page in the left Navigation Bar baseurl: "" # The base URL where your book will be hosted. Used for creating image previews and social links. e.g.: https://mypage.com/mybook/ comments: hypothesis: false diff --git a/source/classification1.md b/source/classification1.md index a393f295..30b2d90b 100755 --- a/source/classification1.md +++ b/source/classification1.md @@ -144,7 +144,7 @@ In this case, the file containing the breast cancer data set is a `.csv` file with headers. We'll use the `read_csv` function with no additional arguments, and then inspect its contents: -```{index} read function; read\_csv +```{index} read function; read_csv ``` ```{code-cell} ipython3 @@ -183,7 +183,7 @@ total set of variables per image in this data set is: +++ -```{index} info +```{index} DataFrame; info ``` Below we use the `info` method to preview the data frame. This method can @@ -195,7 +195,7 @@ as well as their data types and the number of non-missing entries. cancer.info() ``` -```{index} unique +```{index} Series; unique ``` From the summary of the data above, we can see that `Class` is of type `object`. @@ -213,7 +213,7 @@ method. The `replace` method takes one argument: a dictionary that maps previous values to desired new values. We will verify the result using the `unique` method. -```{index} replace +```{index} Series; replace ``` ```{code-cell} ipython3 @@ -227,7 +227,7 @@ cancer["Class"].unique() ### Exploring the cancer data -```{index} groupby, count +```{index} DataFrame; groupby, Series;size ``` ```{code-cell} ipython3 @@ -239,9 +239,9 @@ glue("malignant_pct", "{:0.0f}".format(100*cancer["Class"].value_counts(normaliz ``` Before we start doing any modeling, let's explore our data set. Below we use -the `groupby` and `count` methods to find the number and percentage +the `groupby` and `size` methods to find the number and percentage of benign and malignant tumor observations in our data set. When paired with -`groupby`, `count` counts the number of observations for each value of the `Class` +`groupby`, `size` counts the number of observations for each value of the `Class` variable. Then we calculate the percentage in each group by dividing by the total number of observations and multiplying by 100. The total number of observations equals the number of rows in the data frame, @@ -256,7 +256,7 @@ tumor observations. 100 * cancer.groupby("Class").size() / cancer.shape[0] ``` -```{index} value_counts +```{index} Series; value_counts ``` The `pandas` package also has a more convenient specialized `value_counts` method for @@ -621,8 +621,6 @@ glue("fig:05-multiknn-1", perim_concav_with_new_point3) Scatter plot of concavity versus perimeter with new observation represented as a red diamond. 
::: -```{index} pandas.DataFrame; assign -``` ```{code-cell} ipython3 new_obs_Perimeter = 0 @@ -952,7 +950,7 @@ knn = KNeighborsClassifier(n_neighbors=5) knn ``` -```{index} scikit-learn; X & y +```{index} scikit-learn; fit, scikit-learn; predictors, scikit-learn; response ``` In order to fit the model on the breast cancer data, we need to call `fit` on @@ -1061,10 +1059,13 @@ predictors (colored by diagnosis) for both the unstandardized data we just loaded, and the standardized version of that same data. But first, we need to standardize the `unscaled_cancer` data set with `scikit-learn`. -```{index} pipeline, scikit-learn; make_column_transformer +```{index} see: Pipeline; scikit-learn +``` + +```{index} see: make_column_transformer; scikit-learn ``` -```{index} double: scikit-learn; pipeline +```{index} scikit-learn;Pipeline, scikit-learn; make_column_transformer ``` The `scikit-learn` framework provides a collection of *preprocessors* used to manipulate @@ -1090,13 +1091,13 @@ preprocessor = make_column_transformer( preprocessor ``` -```{index} scikit-learn; ColumnTransformer, scikit-learn; StandardScaler, scikit-learn; fit_transform +```{index} scikit-learn; make_column_transformer, scikit-learn; StandardScaler ``` -```{index} ColumnTransformer; StandardScaler +```{index} see: StandardScaler; scikit-learn ``` -```{index} scikit-learn; fit, scikit-learn; transform +```{index} scikit-learn; fit, scikit-learn; make_column_selector, scikit-learn; StandardScaler ``` You can see that the preprocessor includes a single standardization step @@ -1119,7 +1120,10 @@ preprocessor = make_column_transformer( preprocessor ``` -```{index} see: fit, transform, fit_transform; scikit-learn +```{index} see: fit ; scikit-learn +``` + +```{index} scikit-learn; transform ``` We are now ready to standardize the numerical predictor columns in the `unscaled_cancer` data frame. @@ -1409,6 +1413,9 @@ detection, there are many cases in which the "important" class to identify (presence of disease, malicious email) is much rarer than the "unimportant" class (no disease, normal email). +```{index} concat +``` + To better illustrate the problem, let's revisit the scaled breast cancer data, `cancer`; except now we will remove many of the observations of malignant tumors, simulating what the data would look like if the cancer was rare. We will do this by @@ -1603,7 +1610,7 @@ Imbalanced data with background color indicating the decision of the classifier +++ -```{index} oversampling, scikit-learn; sample +```{index} oversampling, DataFrame; sample ``` Despite the simplicity of the problem, solving it in a statistically sound manner is actually @@ -1747,6 +1754,9 @@ entries, one option is to simply remove those observations prior to building the K-nearest neighbors classifier. We can accomplish this by using the `dropna` method prior to working with the data. +```{index} missing data; dropna +``` + ```{code-cell} ipython3 no_missing_cancer = missing_cancer.dropna() no_missing_cancer @@ -1758,8 +1768,11 @@ possible approach is to *impute* the missing entries, i.e., fill in synthetic values based on the other observations in the data set. One reasonable choice is to perform *mean imputation*, where missing entries are filled in using the mean of the present entries in each variable. To perform mean imputation, we -use a `SimpleImputer` transformer with the default arguments, and wrap it in a -`ColumnTransformer` to indicate which columns need imputation. 
+use a `SimpleImputer` transformer with the default arguments, and use +`make_column_transformer` to indicate which columns need imputation. + +```{index} scikit-learn; SimpleImputer, missing data;mean imputation +``` ```{code-cell} ipython3 from sklearn.impute import SimpleImputer @@ -1792,7 +1805,7 @@ question you are answering. (08:puttingittogetherworkflow)= ## Putting it together in a `Pipeline` -```{index} scikit-learn; pipeline +```{index} scikit-learn; Pipeline ``` The `scikit-learn` package collection also provides the [`Pipeline`](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html?highlight=pipeline#sklearn.pipeline.Pipeline), diff --git a/source/classification2.md b/source/classification2.md index 649b5aa3..0b7cebec 100755 --- a/source/classification2.md +++ b/source/classification2.md @@ -121,6 +121,9 @@ $$\mathrm{accuracy} = \frac{\mathrm{number \; of \; correct \; predictions}}{\ Process for splitting the data and finding the prediction accuracy. ``` +```{index} confusion matrix +``` + Accuracy is a convenient, general-purpose way to summarize the performance of a classifier with a single number. But prediction accuracy by itself does not tell the whole story. In particular, accuracy alone only tells us how often the classifier @@ -165,6 +168,9 @@ disastrous error, since it may lead to a patient who requires treatment not rece Since we are particularly interested in identifying malignant cases, this classifier would likely be unacceptable even with an accuracy of 89%. +```{index} positive label, negative label, true positive, true negative, false positive, false negative +``` + Focusing more on one label than the other is common in classification problems. In such cases, we typically refer to the label we are more interested in identifying as the *positive* label, and the other as the @@ -178,6 +184,9 @@ classifier can make, corresponding to the four entries in the confusion matrix: - **True Negative:** A benign observation that was classified as benign (bottom right in {numref}`confusion-matrix-table`). - **False Negative:** A malignant observation that was classified as benign (top right in {numref}`confusion-matrix-table`). +```{index} precision, recall +``` + A perfect classifier would have zero false negatives and false positives (and therefore, 100% accuracy). However, classifiers in practice will almost always make some errors. So you should think about which kinds of error are most @@ -358,6 +367,12 @@ in `np.random.seed` will lead to different patterns of randomness, but as long a value your analysis results will be the same. In the remainder of the textbook, we will set the seed once at the beginning of each chapter. +```{index} RandomState +``` + +```{index} see: RandomState; seed +``` + ````{note} When you use `np.random.seed`, you are really setting the seed for the `numpy` package's *default random number generator*. Using the global default random @@ -516,7 +531,7 @@ glue("cancer_train_nrow", "{:d}".format(len(cancer_train))) glue("cancer_test_nrow", "{:d}".format(len(cancer_test))) ``` -```{index} info +```{index} DataFrame; info ``` We can see from the `info` method above that the training set contains {glue:text}`cancer_train_nrow` observations, @@ -525,7 +540,7 @@ a train / test split of 75% / 25%, as desired. Recall from {numref}`Chapter %s < that we use the `info` method to preview the number of rows, the variable names, their data types, and missing entries of a data frame. 
-```{index} groupby, count +```{index} Series; value_counts ``` We can use the `value_counts` method with the `normalize` argument set to `True` @@ -557,7 +572,7 @@ training and test data sets. +++ -```{index} pipeline, pipeline; make_column_transformer, pipeline; StandardScaler +```{index} scikit-learn; Pipeline, scikit-learn; make_column_transformer, scikit-learn; StandardScaler ``` Fortunately, `scikit-learn` helps us handle this properly as long as we wrap our @@ -603,7 +618,7 @@ knn_pipeline ### Predict the labels in the test set -```{index} pandas.concat +```{index} scikit-learn; predict ``` Now that we have a K-nearest neighbors classifier object, we can use it to @@ -622,7 +637,7 @@ cancer_test[["ID", "Class", "predicted"]] (eval-performance-clasfcn2)= ### Evaluate performance -```{index} scikit-learn; score +```{index} scikit-learn; score, scikit-learn; precision_score, scikit-learn; recall_score ``` Finally, we can assess our classifier's performance. First, we will examine accuracy. @@ -695,6 +710,9 @@ arguments: the actual labels first, then the predicted labels second. Note that `crosstab` orders its columns alphabetically, but the positive label is still `Malignant`, even if it is not in the top left corner as in the example confusion matrix earlier in this chapter. +```{index} crosstab +``` + ```{code-cell} ipython3 pd.crosstab( cancer_test["Class"], @@ -774,7 +792,7 @@ a recall of {glue:text}`cancer_rec_1`%. That sounds pretty good! Wait, *is* it good? Or do we need something higher? -```{index} accuracy; assessment +```{index} accuracy;assessment, precision;assessment, recall;assessment ``` In general, a *good* value for accuracy (as well as precision and recall, if applicable) @@ -1026,6 +1044,12 @@ cv_5_df = pd.DataFrame( cv_5_df ``` +```{index} see: sem;standard error +``` + +```{index} standard error, DataFrame;agg +``` + The validation scores we are interested in are contained in the `test_score` column. We can then aggregate the *mean* and *standard error* of the classifier's validation accuracy across the folds. @@ -1098,6 +1122,9 @@ cv_10_metrics["test_score"]["sem"] = cv_5_metrics["test_score"]["sem"] / np.sqrt cv_10_metrics ``` +```{index} cross-validation; folds +``` + In this case, using 10-fold instead of 5-fold cross validation did reduce the standard error very slightly. In fact, due to the randomness in how the data are split, sometimes you might even end up with a *higher* standard error when increasing the number of folds! @@ -1153,6 +1180,11 @@ functionality, named `GridSearchCV`, to automatically handle the details for us. Before we use `GridSearchCV`, we need to create a new pipeline with a `KNeighborsClassifier` that has the number of neighbors left unspecified. +```{index} see: make_pipeline; scikit-learn +``` +```{index} scikit-learn;make_pipeline +``` + ```{code-cell} ipython3 knn = KNeighborsClassifier() cancer_tune_pipe = make_pipeline(cancer_preprocessor, knn) @@ -1534,6 +1566,9 @@ us automatically. To make predictions and assess the estimated accuracy of the b `score` and `predict` methods of the fit `GridSearchCV` object. We can then pass those predictions to the `precision`, `recall`, and `crosstab` functions to assess the estimated precision and recall, and print a confusion matrix. 
+```{index} scikit-learn;predict, scikit-learn;score, scikit-learn;precision_score, scikit-learn;recall_score, crosstab +``` + ```{code-cell} ipython3 cancer_test["predicted"] = cancer_tune_grid.predict( cancer_test[["Smoothness", "Concavity"]] @@ -1637,7 +1672,7 @@ Overview of K-NN classification. +++ -```{index} scikit-learn, pipeline, cross-validation, K-nearest neighbors; classification, classification +```{index} scikit-learn;Pipeline, cross-validation, K-nearest neighbors; classification, classification ``` The overall workflow for performing K-nearest neighbors classification using `scikit-learn` is as follows: @@ -1755,19 +1790,7 @@ for i in range(len(ks)): cancer_tune_pipe = make_pipeline(cancer_preprocessor, KNeighborsClassifier()) param_grid = { "kneighborsclassifier__n_neighbors": range(1, 21), - } ## double check: in R textbook, it is tune_grid(..., grid=20), so I guess it matches RandomizedSearchCV - ## instead of GridSeachCV? - # param_grid_rand = { - # "kneighborsclassifier__n_neighbors": range(1, 100), - # } - # cancer_tune_grid = RandomizedSearchCV( - # estimator=cancer_tune_pipe, - # param_distributions=param_grid_rand, - # n_iter=20, - # cv=5, - # n_jobs=-1, - # return_train_score=True, - # ) + } cancer_tune_grid = GridSearchCV( estimator=cancer_tune_pipe, param_grid=param_grid, @@ -1980,7 +2003,10 @@ where to learn more about advanced predictor selection methods. +++ -### Forward selection in `scikit-learn` +### Forward selection in Python + +```{index} variable selection; implementation +``` We now turn to implementing forward selection in Python. First we will extract a smaller set of predictors to work with in this illustrative example—`Smoothness`, diff --git a/source/clustering.md b/source/clustering.md index 7dc7815a..11ce494f 100755 --- a/source/clustering.md +++ b/source/clustering.md @@ -308,7 +308,7 @@ have. clus = penguins_clustered[penguins_clustered["cluster"] == 0][["bill_length_standardized", "flipper_length_standardized"]] ``` -```{index} see: within-cluster sum-of-squared-distances; WSSD +```{index} see: within-cluster sum of squared distances; WSSD ``` ```{index} WSSD @@ -623,7 +623,7 @@ are changing, and the algorithm terminates. ### Random restarts -```{index} K-means; init argument +```{index} K-means; restart ``` Unlike the classification and regression models we studied in previous chapters, K-means can get "stuck" in a bad solution. @@ -792,7 +792,10 @@ Total WSSD for K clusters ranging from 1 to 9. ## K-means in Python -```{index} K-means; kmeans function, scikit-learn; KMeans +```{index} K-means, scikit-learn; KMeans +``` + +```{index} see: KMeans; scikit-learn ``` We can perform K-means in Python using a workflow similar to those @@ -807,6 +810,9 @@ To address this problem, we typically standardize our data before clustering, which ensures that each variable has a mean of 0 and standard deviation of 1. The `StandardScaler` function in `scikit-learn` can be used to do this. +```{index} scikit-learn; StandardScaler, scikit-learn;KMeans, standardization;K-means, K-means;standardization +``` + ```{code-cell} ipython3 from sklearn.preprocessing import StandardScaler from sklearn.compose import make_column_transformer @@ -826,6 +832,9 @@ To indicate that we are performing K-means clustering, we will create a `KMeans` model object. It takes at least one argument: the number of clusters `n_clusters`, which we set to 3. 
+```{index} KMeans;n_clusters +``` + ```{code-cell} ipython3 from sklearn.cluster import KMeans @@ -833,6 +842,9 @@ kmeans = KMeans(n_clusters=3) kmeans ``` +```{index} scikit-learn;make_pipeline, scikit-learn;Pipeline, scikit-learn;fit +``` + To actually run the K-means clustering, we combine the preprocessor and model object in a `Pipeline`, and use the `fit` function. Note that the K-means algorithm uses a random initialization of assignments, but since we set @@ -846,7 +858,7 @@ penguin_clust.fit(penguins) penguin_clust ``` -```{index} K-means; inertia_, K-means; cluster_centers_, K-means; labels_, K-means; predict +```{index} KMeans; labels_, KMeans; inertia_ ``` The fit `KMeans` object—which is the second item in the @@ -874,6 +886,9 @@ adding the `:N` suffix ensures that `altair` will treat the `cluster` variable as a nominal/categorical variable, and hence use a discrete color map for the visualization. +```{index} altair; :N +``` + ```{code-cell} ipython3 cluster_plot=alt.Chart(penguins).mark_circle().encode( x=alt.X("flipper_length_mm").title("Flipper Length").scale(zero=False), @@ -895,10 +910,10 @@ glue("cluster_plot", cluster_plot, display=True) The data colored by the cluster assignments returned by K-means. ::: -```{index} WSSD; total, K-means; inertia_ +```{index} WSSD; total, KMeans; inertia_ ``` -```{index} see: WSSD; K-means inertia +```{index} see: WSSD; KMeans ``` As mentioned above, @@ -920,6 +935,9 @@ where we repeat an operation multiple times and return a list with the result. Here is an examples of a list comprehension that stores the numbers 0-2 in a list: +```{index} list comprehension +``` + ```{code-cell} ipython3 [n for n in range(3)] ``` @@ -992,9 +1010,6 @@ glue("elbow_plot", elbow_plot, display=True) A plot showing the total WSSD versus the number of clusters. ::: -```{index} K-means; init argument -``` - It looks like three clusters is the right choice for this data, since that is where the "elbow" of the line is the most distinct. In the plot, @@ -1008,6 +1023,9 @@ This is because K-means can get "stuck" in a bad solution due to an unlucky initialization of the initial center positions as we mentioned earlier in the chapter. +```{index} KMeans; n_init +``` + ```{note} It is rare that the implementation of K-means from `scikit-learn` gets stuck in a bad solution, because `scikit-learn` tries to choose diff --git a/source/inference.md b/source/inference.md index dfb36c07..44136c9c 100755 --- a/source/inference.md +++ b/source/inference.md @@ -168,7 +168,7 @@ We can find the proportion of listings for each room type by using the `value_counts` function with the `normalize` parameter as we did in previous chapters. -```{index} pandas.DataFrame; df[], count, len +```{index} DataFrame; [], DataFrame; value_counts ``` ```{code-cell} ipython3 @@ -187,13 +187,13 @@ value, {glue:text}`population_proportion`, is the population parameter. Remember parameter value is usually unknown in real data analysis problems, as it is typically not possible to make measurements for an entire population. -```{index} pandas.DataFrame; sample +```{index} DataFrame; sample, seed;numpy.random.seed ``` Instead, perhaps we can approximate it with a small subset of data! To investigate this idea, let's try randomly selecting 40 listings (*i.e.,* taking a random sample of size 40 from our population), and computing the proportion for that sample. -We will use the `sample` method of the `pandas.DataFrame` +We will use the `sample` method of the `DataFrame` object to take the sample. 
The argument `n` of `sample` is the size of the sample to take and since we are starting to use randomness here, we are also setting the random seed via numpy to make the results reproducible. @@ -213,6 +213,9 @@ airbnb.sample(n=40)["room_type"].value_counts(normalize=True) glue("sample_1_proportion", "{:.3f}".format(airbnb.sample(n=40, random_state=155)["room_type"].value_counts(normalize=True)["Entire home/apt"])) ``` +```{index} DataFrame; value_counts +``` + Here we see that the proportion of entire home/apartment listings in this random sample is {glue:text}`sample_1_proportion`. Wow—that's close to our true population value! But remember, we computed the proportion using a random sample of size 40. @@ -245,7 +248,7 @@ commonly refer to as $n$) from a population is called a **sampling distribution**. The sampling distribution will help us see how much we would expect our sample proportions from this population to vary for samples of size 40. -```{index} pandas.DataFrame; sample +```{index} DataFrame; sample ``` We again use the `sample` to take samples of size 40 from our @@ -281,6 +284,9 @@ to compute the number of qualified observations in each sample; finally compute Both the first and last few entries of the resulting data frame are printed below to show that we end up with 20,000 point estimates, one for each of the 20,000 samples. +```{index} DataFrame;groupby, DataFrame;reset_index +``` + ```{code-cell} ipython3 ( samples @@ -473,7 +479,7 @@ The price per night of all Airbnb rentals in Vancouver, BC is \${glue:text}`population_mean`, on average. This value is our population parameter since we are calculating it using the population data. -```{index} pandas.DataFrame; sample +```{index} DataFrame; sample ``` Now suppose we did not have access to the population data (which is usually the @@ -492,6 +498,9 @@ We can create a histogram to visualize the distribution of observations in the sample ({numref}`fig:11-example-means-sample-hist`), and calculate the mean of our sample. +```{index} altair;mark_bar +``` + ```{code-cell} ipython3 :tags: [remove-output] @@ -978,7 +987,7 @@ mean of the sample is \${glue:text}`estimate_mean`. Remember, in practice, we usually only have this one sample from the population. So this sample and estimate are the only data we can work with. -```{index} bootstrap; in Python, scikit-learn; resample (bootstrap) +```{index} bootstrap; in Python, DataFrame; sample (bootstrap) ``` We now perform steps 1–5 listed above to generate a single bootstrap @@ -1097,6 +1106,9 @@ generate a bootstrap distribution of these point estimates. The bootstrap distribution ({numref}`fig:11-bootstrapping5`) suggests how we might expect our point estimate to behave if we take multiple samples. +```{index} DataFrame;reset_index, DataFrame;rename, DataFrame;groupby, Series;mean +``` + ```{code-cell} ipython3 boot20000_means = ( boot20000 @@ -1240,7 +1252,10 @@ Quantiles are expressed in proportions rather than percentages, so the 2.5th and 97.5th percentiles would be the 0.025 and 0.975 quantiles, respectively. 
-```{index} numpy; percentile, pandas.DataFrame; df[] +```{index} DataFrame; [], DataFrame;quantile +``` + +```{index} percentile ``` ```{code-cell} ipython3 diff --git a/source/intro.md b/source/intro.md index 576deba0..606f3d27 100755 --- a/source/intro.md +++ b/source/intro.md @@ -264,7 +264,7 @@ Non-Official & Non-Aboriginal languages,American Sign Language,2685,3020,1145,21 Non-Official & Non-Aboriginal languages,Amharic,22465,12785,200,33670 ``` -```{index} function, argument, read function; read\_csv +```{index} function, argument, read function; read_csv ``` To load this data into Python so that we can do things with it (e.g., perform @@ -437,7 +437,13 @@ can_lang ## Creating subsets of data frames with `[]` & `loc[]` -```{index} pandas.DataFrame; [], pandas.DataFrame; loc[] +```{index} see: []; DataFrame +``` + +```{index} see: loc[]; DataFrame +``` + +```{index} DataFrame; [], DataFrame; loc[], selecting columns ``` Now that we've loaded our data into Python, we can start wrangling the data to @@ -469,7 +475,7 @@ high-level categories of languages, which include "Aboriginal languages", our question we want to filter our data set so we restrict our attention to only those languages in the "Aboriginal languages" category. -```{index} pandas.DataFrame; [], filter, logical statement, logical statement; equivalency operator, string +```{index} DataFrame; [], filtering rows, logical statement, logical operator; equivalency (==), string ``` We can use the `[]` operation to obtain the subset of rows with desired values @@ -515,7 +521,7 @@ can_lang[can_lang["category"] == "Aboriginal languages"] ### Using `[]` to select columns -```{index} pandas.DataFrame; [], select; +```{index} DataFrame; [], selecting columns ``` We can also use the `[]` operation to select columns from a data frame. @@ -545,7 +551,7 @@ can_lang[["language", "mother_tongue"]] ### Using `loc[]` to filter rows and select columns -```{index} pandas.DataFrame; loc[] +```{index} DataFrame; loc[], selecting columns ``` The `[]` operation is only used when you want to filter rows *or* select columns; @@ -606,7 +612,7 @@ So it looks like the `loc[]` operation gave us the result we wanted! ## Using `sort_values` and `head` to select rows by ordered values -```{index} pandas.DataFrame; sort_values, pandas.DataFrame; head +```{index} DataFrame; sort_values, DataFrame; head ``` We have used the `[]` and `loc[]` operations on a data frame to obtain a table @@ -652,7 +658,7 @@ ten_lang (ch1-adding-modifying)= ## Adding and modifying columns -```{index} assign +```{index} adding columns, modifying columns ``` Recall that our data analysis question referred to the *count* of Canadians @@ -700,9 +706,6 @@ as a mother tongue by between 0.008% and 0.18% of the Canadian population. ## Combining analysis steps with chaining and multiline expressions -```{index} chaining methods -``` - It took us 3 steps to find the ten Aboriginal languages most often reported in 2016 as mother tongues in Canada. Starting from the `can_lang` data frame, we: @@ -771,6 +774,9 @@ what the rest of the expression is. We could, of course, put all of the code on one line of code, but splitting it across multiple lines helps a lot with code readability. +```{index} chaining +``` + We still have to handle the issue that each line of code---i.e., each step in the analysis---introduces a new temporary object. To address this issue, we can *chain* multiple operations together without assigning intermediate objects. 
The key idea of chaining is that the *output* of @@ -866,7 +872,9 @@ First, we need to import the `altair` package. ```{code-cell} ipython3 import altair as alt +``` +```{index} altair; mark_bar, altair; encoding channel ``` +++ @@ -916,7 +924,7 @@ Bar plot of the ten Aboriginal languages most often reported by Canadian residen +++ -```{index} see: .; chaining methods +```{index} see: .; chaining ``` ### Formatting `altair` charts @@ -935,7 +943,7 @@ Canadian Residents)" would be much more informative. To make the code easier to read, we're spreading it out over multiple lines just as we did in the previous section with pandas. -```{index} plot; labels, plot; axis labels +```{index} plot; labels, plot; axis labels, altair; alt.X, altair; alt.Y, altair; title ``` Adding additional labels to our visualizations that we create in `altair` is diff --git a/source/jupyter.md b/source/jupyter.md index 6f14c442..85110c29 100755 --- a/source/jupyter.md +++ b/source/jupyter.md @@ -410,15 +410,18 @@ notebook. ## Exploring data files +```{index} separator +``` + It is essential to preview data files before you try to read them into Python to see -whether or not there are column names, what the delimiters are, and if there are +whether or not there are column names, what the separators are, and if there are lines you need to skip. In Jupyter, you preview data files stored as plain text files (e.g., comma- and tab-separated files) in their plain text format ({numref}`open-data-w-editor-2`) by right-clicking on the file's name in the Jupyter file explorer, selecting **Open with**, and then selecting **Editor** ({numref}`open-data-w-editor-1`). Suppose you do not specify to open the data file with an editor. In that case, Jupyter will render a nice table -for you, and you will not be able to see the column delimiters, and therefore +for you, and you will not be able to see the column separators, and therefore you will not know which function to use, nor which arguments to use and values to specify for them. diff --git a/source/preface-text.md b/source/preface-text.md index 78148f79..39c506d2 100755 --- a/source/preface-text.md +++ b/source/preface-text.md @@ -15,11 +15,9 @@ kernelspec: # Preface -```{index} data science, auditable, reproducible +```{index} data science; definition, auditable, reproducible ``` - - This textbook aims to be an approachable introduction to the world of data science. In this book, we define **data science** as the process of generating insight from data through **reproducible** and **auditable** processes. diff --git a/source/reading.md b/source/reading.md index 61c9d53c..442e5921 100755 --- a/source/reading.md +++ b/source/reading.md @@ -88,9 +88,6 @@ with respect to your *working directory* (i.e., "where you are currently") on th On the other hand, an absolute path indicates where the file is with respect to the computer's filesystem base (or *root*) folder, regardless of where you are working. -```{index} Happiness Report -``` - Suppose our computer's filesystem looks like the picture in {numref}`Filesystem`. We are working in a file titled `worksheet_02.ipynb`, and our current working directory is `worksheet_02`; @@ -126,6 +123,15 @@ happy_data = pd.read_csv("data/happiness_report.csv") Note that there is no forward slash at the beginning of a relative path; if we accidentally typed `"/data/happiness_report.csv"`, Python would look for a folder named `data` in the root folder of the computer—but that doesn't exist! 
+```{index} path; previous, path; current +``` + +```{index} see: ..; path +``` + +```{index} see: .; path +``` + Aside from specifying places to go in a path using folder names (like `data` and `worksheet_02`), we can also specify two additional special places: the *current directory* and the *previous directory*. We indicate the current working directory with a single dot `.`, and the previous directory with two dots `..`. So for instance, if we wanted to reach the `bike_share.csv` file from the `worksheet_02` folder, we could @@ -177,7 +183,7 @@ to where the resource is located on the remote machine. (readcsv)= ### `read_csv` to read in comma-separated values files -```{index} csv, reading; separator, read function; read\_csv +```{index} csv, reading; separator, read function; read_csv ``` Now that we have learned about *where* data could be, we will learn about *how* @@ -277,7 +283,7 @@ canlang_data = pd.read_csv("data/can_lang_meta-data.csv") ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 6 ``` -```{index} Error +```{index} ParserError ``` ```{index} read function; skiprows argument @@ -330,7 +336,7 @@ Non-Official & Non-Aboriginal languages Amharic 22465 12785 200 33670 ```{index} see: tab-separated values; tsv ``` -```{index} tsv, read function; read_tsv +```{index} tsv ``` To read in `.tsv` (**t**ab **s**eparated **v**alues) files, we can set the `sep` argument @@ -362,7 +368,7 @@ arguments depending on the file format, our resulting data frame ### Using the `header` argument to handle missing column names -```{index} read function; header, reading; separator +```{index} read function; header argument, reading; separator ``` The `can_lang_no_names.tsv` file contains a slightly different version @@ -401,7 +407,7 @@ canlang_data = pd.read_csv( canlang_data ``` -```{index} pandas.DataFrame; rename, pandas +```{index} DataFrame; rename, pandas ``` It is best to rename your columns manually in this scenario. The current column names @@ -528,6 +534,10 @@ X?a??4VT?,D?Jq ```{index} read function; read_excel ``` +```{index} Excel spreadsheet; reading +``` + + This type of file representation allows Excel files to store additional things that you cannot store in a `.csv` file, such as fonts, text formatting, graphics, multiple sheets and more. And despite looking odd in a plain text @@ -614,12 +624,15 @@ usually stored and accessed locally on one computer from a file with a `.db` extension (or sometimes a `.sqlite` extension). Similar to Excel files, these are not plain text files and cannot be read in a plain text editor. -```{index} database; connect, ibis, ibis; ibis +```{index} database; connection, ibis; connect ``` ```{index} see: ibis; database ``` +```{index} see: database; ibis +``` + The first thing you need to do to read data into Python from a database is to connect to the database. 
For an SQLite database, we will do that using the `connect` function from the @@ -642,7 +655,7 @@ import ibis conn = ibis.sqlite.connect("data/can_lang.db") ``` -```{index} database; tables; list_tables +```{index} database; table, ibis; list_tables, ibis; sqlite ``` Often relational databases have many tables; thus, in order to retrieve @@ -656,7 +669,7 @@ tables = conn.list_tables() tables ``` -```{index} database; table, ibis; table +```{index} table, ibis; table ``` The `list_tables` function returned only one name---`"can_lang"`---which tells us @@ -672,7 +685,10 @@ canlang_table = conn.table("can_lang") canlang_table ``` -```{index} database; count, ibis; count +```{index} ibis; count +``` + +```{index} see: count; ibis ``` Although it looks like we might have obtained the whole data frame from the database, we didn't! @@ -687,7 +703,7 @@ In `ibis`, we can do that using the `count` function from the table object. canlang_table.count() ``` -```{index} execute, ibis; execute +```{index} ibis; execute ``` Wait a second...this isn't the number of rows in the database. In fact, we haven't actually sent our @@ -708,7 +724,9 @@ the *actual* text of the SQL query that `ibis` sends to the database, you can us instead of `execute`. But note that you have to pass the result of `compile` to the `str` function to turn it into a human-readable string first. -```{index} compile, ibis; compile +```{index} see: compile;ibis +``` +```{index} ibis; compile, str ``` ```{code-cell} ipython3 @@ -725,7 +743,7 @@ The `ibis` package provides lots of `pandas`-like tools for working with databas For example, we can look at the first few rows of the table by using the `head` function, followed by `execute` to retrieve the response. -```{index} database; head, ibis; +```{index} ibis; head ``` ```{code-cell} ipython3 @@ -742,7 +760,7 @@ the `language` and `mother_tongue` columns. We can use the `[]` operation with a logical statement to obtain only certain rows. Below we filter the data to include only Aboriginal languages. -```{index} database; filter, ibis; +```{index} database; filter rows, ibis; [] ``` ```{code-cell} ipython3 @@ -755,7 +773,7 @@ We didn't call `execute` because we are not ready to bring the data into Python We can still use the database to do some work to obtain *only* the small amount of data we want to work with locally in Python. Let's add the second part of our SQL query: selecting only the `language` and `mother_tongue` columns. -```{index} database; select, ibis; +```{index} database; select columns ``` ```{code-cell} ipython3 @@ -777,7 +795,7 @@ that we need for analysis; we do eventually need to call `execute`. For example, `ibis` does not provide the `tail` function to look at the last rows in a database, even though `pandas` does. -```{index} pandas.DataFrame; tail +```{index} DataFrame; tail ``` ```{code-cell} ipython3 @@ -821,6 +839,9 @@ Note that the `host` (`fakeserver.stat.ubc.ca`), `user` (`user0001`), and `password` (`abc123`) below are *not real*; you will not actually be able to connect to a database using this information. +```{index} ibis; postgres, ibis; connect +``` + ```python conn = ibis.postgres.connect( database = "can_mov_db", @@ -836,6 +857,9 @@ that connecting to and working with a Postgres database is identical to connecting to and working with an SQLite database. 
For example, we can again use `list_tables` to find out what tables are in the `can_mov_db` database: +```{index} ibis; list_tables +``` + ```python conn.list_tables() ``` @@ -848,6 +872,9 @@ We see that there are 10 tables in this database. Let's first look at the `"ratings"` table to find the lowest rating that exists in the `can_mov_db` database. +```{index} ibis; table +``` + ```python ratings_table = conn.table("ratings") ratings_table @@ -860,7 +887,7 @@ AlchemyTable: ratings num_votes int64 ``` -```{index} ibis; select +```{index} ibis; [] ``` To find the lowest rating that exists in the data base, we first need to @@ -882,7 +909,7 @@ Selection[r0] average_rating: r0.average_rating ``` -```{index} database; order_by, ibis; head, ibis; ibis +```{index} database; ordering, ibis; order_by, ibis; head ``` Next we use the `order_by` function from `ibis` order the table by `average_rating`, @@ -929,7 +956,7 @@ Databases are beneficial in a large-scale setting: ## Writing data from Python to a `.csv` file -```{index} write function; to_csv, pandas.DataFrame; to_csv +```{index} write function; to_csv, DataFrame; to_csv ``` At the middle and end of a data analysis, we often want to write a data frame @@ -1309,6 +1336,9 @@ argument—the URL of the page to scrape—and will return a list of data frames corresponding to all the tables it finds at that URL. We can see below that `read_html` found 17 tables on the Wikipedia page for Canada. +```{index} read function; read_html +``` + ```python canada_wiki_tables = pd.read_html("https://en.wikipedia.org/wiki/Canada") len(canada_wiki_tables) @@ -1356,16 +1386,16 @@ hope that it gives you enough of a basic idea that you can learn how to use another API if needed. In particular, in this book we will show you the basics of how to use the `requests` package in Python to access data from the NASA "Astronomy Picture of the Day" API (a great source of desktop backgrounds, by the way—take a look at the stunning -picture of the Rho-Ophiuchi cloud complex in {numref}`fig:NASA-API-Rho-Ophiuchi` from July 13, 2023!). +picture of the Rho-Ophiuchi cloud complex {cite:p}`rhoophiuchi` in {numref}`fig:NASA-API-Rho-Ophiuchi` from July 13, 2023!). -```{index} API; requests, NASA, API; token; key +```{index} requests, NASA, API; token ``` ```{figure} img/reading/NASA-API-Rho-Ophiuchi.png :name: fig:NASA-API-Rho-Ophiuchi :width: 400px -The James Webb Space Telescope's NIRCam image of the Rho Ophiuchi molecular cloud complex {cite:p}`rhoophiuchi`. +The James Webb Space Telescope's NIRCam image of the Rho Ophiuchi molecular cloud complex. ``` +++ @@ -1411,6 +1441,9 @@ That should be more than enough for our purposes in this section. #### Accessing the NASA API +```{index} API; HTTP, API; query parameters, API; endpoint +``` + The NASA API is what is known as an *HTTP API*: this is a particularly common kind of API, where you can obtain data simply by accessing a particular URL as if it were a regular website. To make a query to the NASA @@ -1459,6 +1492,12 @@ disks.","hdurl":"https://apod.nasa.gov/apod/image/2307/STScI-01_RhoOph.png", Rho Ophiuchi","url":"https://apod.nasa.gov/apod/image/2307/STScI-01_RhoOph1024.png"} ``` +```{index} see: JavaScript Object Notation; JSON +``` + +```{index} JSON, requests; get, requests; json +``` + Neat! There is definitely some data there, but it's a bit hard to see what it all is. As it turns out, this is a common format for data called *JSON* (JavaScript Object Notation). 
We won't encounter this kind of data much in this book, diff --git a/source/regression1.md b/source/regression1.md index d7b23af3..028b21a4 100755 --- a/source/regression1.md +++ b/source/regression1.md @@ -81,6 +81,9 @@ numerical, and so predicting them given past data is considered a regression pro ```{index} classification; comparison to regression ``` +```{index} regression; comparison to classification +``` + Just like in the classification setting, there are many possible methods that we can use to predict numerical response variables. In this chapter we will focus on the **K-nearest neighbors** algorithm {cite:p}`knnfix,knncover`, and in the next chapter @@ -136,6 +139,9 @@ is fair, or perhaps how to set the price of a new listing. We begin the analysis by loading and examining the data, as well as setting the seed value. +```{index} seed;numpy.random.seed +``` + ```{code-cell} ipython3 import altair as alt import numpy as np @@ -214,7 +220,7 @@ predict the former. ## K-nearest neighbors regression -```{index} K-nearest neighbors; regression +```{index} K-nearest neighbors, K-nearest neighbors; regression ``` Much like in the case of classification, @@ -227,7 +233,7 @@ how well it predicts house sale price. This subsample is taken to allow us to illustrate the mechanics of K-NN regression with a few data points; later in this chapter we will use all the data. -```{index} pandas.DataFrame; sample +```{index} DataFrame; sample ``` To take a small random sample of size 30, we'll use the @@ -281,7 +287,7 @@ Scatter plot of price (USD) versus house size (square feet) with vertical line i +++ -```{index} pandas.DataFrame; assign, pandas.DataFrame; head, pandas.DataFrame; sort_values, abs +```{index} DataFrame; abs, DataFrame; nsmallest ``` We will employ the same intuition from {numref}`Chapters %s ` and {numref}`%s `, and use the @@ -291,9 +297,6 @@ For the example shown in {numref}`fig:07-small-eda-regr`, we find and label the 5 nearest neighbors to our observation of a house that is 2,000 square feet. -```{index} nsmallest -``` - ```{code-cell} ipython3 small_sacramento["dist"] = (2000 - small_sacramento["sqft"]).abs() nearest_neighbors = small_sacramento.nsmallest(5, "dist") @@ -303,7 +306,6 @@ nearest_neighbors ```{code-cell} ipython3 :tags: [remove-cell] - nn_plot = small_plot + rule # plot horizontal lines which is perpendicular to x=2000 @@ -389,7 +391,7 @@ about what the data must look like for it to work. ## Training, evaluating, and tuning the model -```{index} training data, test data +```{index} training set, test set ``` As usual, we must start by putting some test data away in a lock box @@ -538,7 +540,7 @@ training or testing data. But many people just use RMSE for both, and rely on context to denote which data the root mean squared error is being calculated on. 
``` -```{index} scikit-learn, scikit-learn; pipeline, scikit-learn; make_pipeline, scikit-learn; make_column_transformer +```{index} scikit-learn, scikit-learn; Pipeline, scikit-learn; make_pipeline, scikit-learn; make_column_transformer ``` Now that we know how we can assess how well our model predicts a numerical diff --git a/source/regression2.md b/source/regression2.md index f7245fdf..0ed649e1 100755 --- a/source/regression2.md +++ b/source/regression2.md @@ -150,6 +150,9 @@ Scatter plot of sale price versus size with line of best fit for subset of the S ```{index} straight line; equation ``` +```{index} see: line; straight line +``` + The equation for the straight line is: $$\text{house sale price} = \beta_0 + \beta_1 \cdot (\text{house size}),$$ @@ -348,7 +351,7 @@ Below we illustrate how we can use the usual `scikit-learn` workflow to predict price given house size. We use a simple linear regression approach on the full Sacramento real estate data set. -```{index} scikit-learn; random_state +```{index} seed; numpy.random.seed ``` As usual, we start by loading packages, setting the seed, loading data, and @@ -731,6 +734,13 @@ mlm.fit( ``` Finally, we make predictions on the test data set to assess the quality of our model. +```{index} scikit-learn;predict, scikit-learn;mean_squared_error +``` +```{index} see: mean_squared_error;scikit-learn +``` +```{index} see: predict;scikit-learn +``` + ```{code-cell} ipython3 sacramento_test["predicted"] = mlm.predict(sacramento_test[["sqft","beds"]]) @@ -1059,10 +1069,7 @@ Scatter plot of the full data, with outlier highlighted in red. ### Multicollinearity -```{index} colinear -``` - -```{index} see: multicolinear; colinear +```{index} multicollinearity ``` The second, and much more subtle, issue can occur when performing multivariable diff --git a/source/setup.md b/source/setup.md index a540198d..e9bf4b31 100755 --- a/source/setup.md +++ b/source/setup.md @@ -66,7 +66,7 @@ exactly right! To keep things simple, we instead recommend that you install [Docker](https://docker.com). Docker lets you run your Jupyter notebooks inside a pre-built *container* that comes with precisely the right versions of all software packages needed run the worksheets that come with this book. -```{index} Docker +```{index} Docker, container ``` ```{note} @@ -85,6 +85,8 @@ installed on your computer—or even if you haven't installed Python at all! visit [the online Docker documentation](https://docs.docker.com/desktop/install/windows-install/), and download the `Docker Desktop Installer.exe` file. Double-click the file to open the installer and follow the instructions on the installation wizard, choosing **WSL-2** instead of **Hyper-V** when prompted. +```{index} Docker;installation +``` ```{note} Occasionally, when you first run Docker on Windows, you will encounter an error message. Some common errors you may see: @@ -99,6 +101,8 @@ Occasionally, when you first run Docker on Windows, you will encounter an error to help you with this, as editing the BIOS can be dangerous. Detailed instructions for doing this are beyond the scope of this book. ``` +```{index} Docker;image, Docker;tag +``` **Running JupyterLab** Run Docker Desktop. Once it is running, you need to download and run the Docker *image* that we have made available for the worksheets (an *image* is like a "snapshot" of a computer with all the right packages pre-installed). 
You only need to do this step one time; the image will remain diff --git a/source/version-control.md b/source/version-control.md index a553efdf..4a3b3d78 100755 --- a/source/version-control.md +++ b/source/version-control.md @@ -149,6 +149,9 @@ want to use one for your project. ```{index} repository, repository;local, repository;remote ``` +```{index} see: repository; version control +``` + Typically, when we put a data analysis project under version control, we create two copies of the repository ({numref}`vc1-no-changes`). One copy we use as our primary workspace where we create, edit, and delete files. @@ -197,8 +200,6 @@ one for each commit: `Created README.md` and `Added analysis draft`. ```{index} hash ``` - - The hash is a string of characters consisting of about 40 letters and numbers. The purpose of the hash is to serve as a unique identifier for the commit, and is used by Git to index project history. Although hashes are quite long—imagine @@ -233,11 +234,14 @@ name: vc2-changes Local repository with changes to files. ``` -```{index} git;add, staging area +```{index} git;add, staging area, git;commit +``` + +```{index} see: staging area; git ``` Once you reach a point that you want Git to keep a record -of the current version of your work, you need to commit +of the current version of your work, you need to **commit** (i.e., snapshot) your changes. A prerequisite to this is telling Git which files should be included in that snapshot. We call this step **adding** the files to the **staging area**. @@ -256,8 +260,6 @@ name: vc-ba2-add Adding modified files to the staging area in the local repository. ``` - - Once the files we wish to commit have been added to the staging area, we can then commit those files to the repository history ({numref}`vc-ba3-commit`). When we do this, we are required to include a helpful *commit message* to tell @@ -282,8 +284,6 @@ Committing the modified files in the staging area to the local repository histor ```{index} git;push ``` - - Once you have made one or more commits that you want to share with your collaborators, you need to **push** (i.e., send) those commits back to GitHub ({numref}`vc5-push`). This updates the history in the remote repository (i.e., GitHub) to match what you have in your @@ -330,15 +330,11 @@ name: vc7-pull Pulling changes from the remote GitHub repository to synchronize your local repository. ``` - - ## Working with remote repositories using GitHub ```{index} repository;remote, GitHub, git;clone ``` - - Now that you have been introduced to some of the key general concepts and workflows of Git version control, we will walk through the practical steps. There are several different ways to start using version control @@ -368,10 +364,9 @@ name: new-repository-01 New repositories on GitHub can be created by clicking on "New Repository" from the + menu. ``` -```{index} repository;public +```{index} repository;public, repository;private ``` - Repositories can be set up with a variety of configurations, including a name, optional description, and the inclusion (or not) of several template files. One of the most important configuration items to choose is the visibility to the outside world, @@ -394,8 +389,6 @@ name: new-repository-02 Repository configuration for a project that is public and initialized with a README.md template file. ``` - - A newly created public repository with a `README.md` template file should look something like what is shown in {numref}`new-repository-03`. 
@@ -406,8 +399,6 @@ name: new-repository-03 Respository configuration for a project that is public and initialized with a README.md template file. ``` - - +++ ### Editing files on GitHub with the pen tool @@ -437,8 +428,6 @@ The text box where edits can be made after clicking on the pen tool. ```{index} GitHub; commit ``` - - After you are done with your edits, they can be "saved" by *committing* your changes. When you *commit a file* in a repository, the version control system takes a snapshot of what the file looks like. As you continue working on the @@ -470,8 +459,6 @@ Saving changes using the pen tool requires committing those changes, and an asso ```{index} GitHub; add file ``` - - The "Add file" menu can be used to create new plain text files and upload files from your computer. To create a new plain text file, click the "Add file" drop-down menu and select the "Create new file" option @@ -487,8 +474,6 @@ New plain text files can be created directly on GitHub. ```{index} markdown ``` - - A page will open with a small text box for the file name to be entered, and a larger text box where the desired file content text can be entered. Note the two tabs, "Edit new file" and "Preview". Toggling between them lets you enter and @@ -573,8 +558,6 @@ to learn how to use Jupyter before reading this chapter. ```{index} GitHub; personal access token ``` - - To send and retrieve work between your local repository and the remote repository on GitHub, you will frequently need to authenticate with GitHub @@ -641,18 +624,11 @@ name: generate-pat-03 Display of the newly generated personal access token. ``` - - ### Cloning a repository using Jupyter - - ```{index} git;clone ``` - - *Cloning* a remote repository from GitHub to create a local repository results in a copy that knows where it was obtained from so that it knows where to send/receive @@ -758,8 +734,6 @@ Adding `eda.ipynb` makes it visible in the staging area. ```{index} git;commit ``` - - To snapshot the changes with an associated commit message, you must put a message in the text box at the bottom of the Git pane and click on the blue "Commit" button ({numref}`git-commit-01`). @@ -779,12 +753,10 @@ name: git-commit-01 A commit message must be added into the Jupyter Git extension commit text box before the blue Commit button can be used to record the commit. ``` - After "committing" the file(s), you will see there are 0 "Staged" files. You are now ready to push your changes to the remote repository on GitHub ({numref}`git-commit-03`). - ```{figure} img/version-control/git_commit_03.png --- name: git-commit-03 @@ -792,15 +764,11 @@ name: git-commit-03 After recording a commit, the staging area should be empty. ``` - - ### Pushing the commits to GitHub ```{index} git;push ``` - - To send the committed changes back to the remote repository on GitHub, you need to *push* them. To do this, click on the cloud icon with the up arrow on the Jupyter Git tab @@ -813,7 +781,6 @@ name: git-push-01 The Jupyter Git extension "push" button (circled in red). ``` - You will then be prompted to enter your GitHub username and the personal access token that you generated earlier (not your account password!). Click @@ -826,7 +793,6 @@ name: git-push-02 Enter your Git credentials to authorize the push to the remote repository. ``` - If the files were successfully pushed to the project repository on GitHub, you will be shown a success message ({numref}`git-push-03`). Click "Dismiss" to continue working in Jupyter. 
@@ -838,7 +804,6 @@ name: git-push-03 The prompt that the push was successful. ``` - If you visit the remote repository on GitHub, you will see that the changes now exist there too ({numref}`git-push-04`)! @@ -850,7 +815,6 @@ name: git-push-04 The GitHub web interface shows a preview of the commit message, and the time of the most recently pushed commit for each file. ``` - ## Collaboration ### Giving collaborators access to your project @@ -858,8 +822,6 @@ The GitHub web interface shows a preview of the commit message, and the time of ```{index} GitHub; collaborator access ``` - - As mentioned earlier, GitHub allows you to control who has access to your project. The default of both public and private projects are that only the person who created the GitHub repository has permissions to create, edit and @@ -988,7 +950,6 @@ name: merge-conflict-01 Error message that indicates that there are changes on the remote repository that you do not have locally. ``` - Usually, getting out of this situation is not too troublesome. First you need to pull the changes that exist on GitHub that you do not yet have in the local repository. Usually when this happens, Git can automatically merge the changes @@ -1010,15 +971,11 @@ same line of the same file and that Git will not be able to automatically merge the changes. ``` - - ### Handling merge conflicts ```{index} git;merge conflict ``` - - To fix the merge conflict, you need to open the offending file in a plain text editor and look for special marks that Git puts in the file to tell you where the merge conflict occurred ({numref}`merge-conflict-04`). diff --git a/source/viz.md b/source/viz.md index 1f56039d..35867bbe 100755 --- a/source/viz.md +++ b/source/viz.md @@ -241,6 +241,9 @@ The `ppm` column holds the value of CO$_{\text{2}}$ in parts per million that was measured on each date, and is type `float64`; this is the usual type for decimal numbers. +```{index} dates and times +``` + ```{note} `read_csv` was able to parse the `date_measured` column into the `datetime` vector type because it was entered @@ -267,7 +270,7 @@ and the CO$_{\text{2}}$ concentration as the `y` coordinate. We create a chart with the `alt.Chart()` function. There are a few basic aspects of a plot that we need to specify: -```{index} altair; graphical mark, altair; encoding channel +```{index} altair; graphical mark, altair; encoding channel, altair; mark_point ``` - The name of the **data frame** to visualize. @@ -649,9 +652,6 @@ glue("can_lang_plot", can_lang_plot, display=False) Scatter plot of number of Canadians reporting a language as their mother tongue vs the primary language at home ::: -```{index} escape character -``` - To make an initial improvement in the interpretability of {numref}`can_lang_plot`, we should replace the default axis @@ -663,6 +663,9 @@ where each string in the list will correspond to a new line of text. We can also increase the font size to further improve readability. +```{index} altair; multiline labels +``` + ```{code-cell} ipython3 can_lang_plot_labels = alt.Chart(can_lang).mark_circle().encode( x=alt.X("most_at_home").title( @@ -687,8 +690,6 @@ Scatter plot of number of Canadians reporting a language as their mother tongue ::: - - ```{code-cell} ipython3 :tags: ["remove-cell"] import numpy as np @@ -717,7 +718,7 @@ in the magnitude of these two numbers! 
We can confirm that the two points in the upper right-hand corner correspond to Canada's two official languages by filtering the data: -```{index} pandas.DataFrame; loc[] +```{index} DataFrame; loc[] ``` ```{code-cell} ipython3 @@ -785,6 +786,9 @@ To fix these issue, we can limit the number of ticks and gridlines to only include the seven major ones, and change the number formatting to include a suffix which makes the labels shorter. +```{index} altair; tick count, altair; tick formatting +``` + ```{code-cell} ipython3 can_lang_plot_log_revised = alt.Chart(can_lang).mark_circle().encode( x=alt.X("most_at_home") @@ -844,7 +848,7 @@ using `_` so that it is easier to read; this does not affect how Python interprets the number and is just added for readability. -```{index} pandas.DataFrame; assign, pandas.DataFrame; [[]] +```{index} DataFrame; column assignment, DataFrame; [] ``` ```{code-cell} ipython3 @@ -898,21 +902,21 @@ To fully answer the question, we need to use {numref}`can_lang_plot_percent` to assess a few key characteristics of the data: -```{index} relationship; positive negative none +```{index} relationship; positive, relationship; negative, relationship; none ``` - **Direction:** if the y variable tends to increase when the x variable increases, then y has a **positive** relationship with x. If y tends to decrease when x increases, then y has a **negative** relationship with x. If y does not meaningfully increase or decrease as x increases, then y has **little or no** relationship with x. -```{index} relationship; strong weak +```{index} relationship; strong, relationship; weak ``` - **Strength:** if the y variable *reliably* increases, decreases, or stays flat as x increases, then the relationship is **strong**. Otherwise, the relationship is **weak**. Intuitively, the relationship is strong when the scatter points are close together and look more like a "line" or "curve" than a "cloud." -```{index} relationship; linear nonlinear +```{index} relationship; linear, relationship; nonlinear ``` - **Shape:** if you can draw a straight line roughly through the data points, the relationship is **linear**. Otherwise, it is **nonlinear**. @@ -985,6 +989,9 @@ and specify that we want it on the top of the chart. This automatically changes the legend items to be laid out horizontally instead of vertically, but we could also keep the vertical layout by specifying `direction="vertical"` inside `alt.Legend`. +```{index} altair; alt.Legend +``` + ```{code-cell} ipython3 can_lang_plot_legend = alt.Chart(can_lang).mark_circle().encode( x=alt.X("most_at_home_percent") @@ -1014,6 +1021,9 @@ glue("can_lang_plot_legend", can_lang_plot_legend.properties(height=320, width=4 Scatter plot of percentage of Canadians reporting a language as their mother tongue vs the primary language at home colored by language category with the legend edited. ::: +```{index} color palette, color blindness simulator +``` + In {numref}`can_lang_plot_legend`, the points are colored with the default `altair` color scheme, which is called `"tableau10"`. This is an appropriate choice for most situations and is also easy to read for people with reduced color vision. In general, the color schemes that are used by default in Altair are adapted to the type of data that is displayed and selected to be easy to interpret both for people with good and reduced color vision. 
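A chart can also be exported to a static image if you want to inspect its colors outside of Jupyter, for example in an image viewer or an accessibility checking tool. The following is a minimal sketch, assuming the `can_lang_plot_legend` chart created above is available and that the `vl-convert-python` package (which Altair relies on for static image export) is installed; figure export is covered in more detail later in the chapter.

```python
# Minimal sketch: export the chart so it can be inspected outside of Jupyter.
# Assumes `can_lang_plot_legend` exists and vl-convert-python is installed.
can_lang_plot_legend.save("can_lang_plot_legend.png")  # raster (pixel-based) image
can_lang_plot_legend.save("can_lang_plot_legend.svg")  # vector image
```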
@@ -1021,9 +1031,6 @@ If you are unsure about a certain color combination, you can use this [color blindness simulator](https://www.color-blindness.com/coblis-color-blindness-simulator/) to check if your visualizations are color-blind friendly. -```{index} color palette; color blindness simulator -``` - All the available color schemes and information on how to create your own can be viewed [in the Altair documentation](https://altair-viz.github.io/user_guide/customization.html#customizing-colors). To change the color scheme of our chart, we can add the `scheme` argument in the `scale` of the `color` encoding. @@ -1048,7 +1055,7 @@ can_lang_plot_theme = alt.Chart(can_lang).mark_point(filled=True).encode( y=alt.Y("mother_tongue_percent") .scale(type="log") .axis(tickCount=7) - .title("Mother tongue (percentage of Canadian residents)"), + .title(["Mother tongue", "(percentage of Canadian residents)"]), color=alt.Color("category") .legend(orient="top") .title("") @@ -1081,6 +1088,9 @@ via the `Tooltip` encoding channel, so that text labels for each point show up once we hover over it with the mouse pointer. Here we also add the exact values of the variables on the x and y-axis to the tooltip. +```{index} altair; alt.Tooltip +``` + ```{code-cell} ipython3 can_lang_plot_tooltip = alt.Chart(can_lang).mark_point(filled=True).encode( x=alt.X("most_at_home_percent") @@ -1090,7 +1100,7 @@ can_lang_plot_tooltip = alt.Chart(can_lang).mark_point(filled=True).encode( y=alt.Y("mother_tongue_percent") .scale(type="log") .axis(tickCount=7) - .title("Mother tongue (percentage of Canadian residents)"), + .title(["Mother tongue", "(percentage of Canadian residents)"]), color=alt.Color("category") .legend(orient="top") .title("") @@ -1218,7 +1228,7 @@ as `sort_values` followed by `head`, but are slightly more efficient because the In general, it is good to use more specialized functions when they are available! ``` -```{index} pandas.DataFrame; nlargest; nsmallest +```{index} DataFrame; nlargest, DataFrame; nsmallest ``` ```{code-cell} ipython3 @@ -1338,7 +1348,10 @@ morley_df = pd.read_csv("data/morley.csv") morley_df ``` -```{index} distribution, altair; histogram +```{index} distribution, altair; histogram, altair; count +``` + +```{index} see: count; altair ``` In this experimental data, @@ -1416,7 +1429,7 @@ Histogram of Michelson's speed of light data. #### Adding layers to an `altair` chart -```{index} altair; +; mark_rule +```{index} altair; +, altair; mark_rule, altair; layers ``` {numref}`morley_hist` is a great start. @@ -1696,6 +1709,9 @@ When you create a histogram in `altair`, it tries to choose a reasonable number We can change the number of bins by using the `maxbins` parameter inside the `bin` method. +```{index} altair; maxbins +``` + ```{code-cell} ipython3 morley_hist_maxbins = alt.Chart(morley_df).mark_bar().encode( x=alt.X("RelativeError").bin(maxbins=30), @@ -1950,7 +1966,7 @@ bad, while raster images eventually start to look "pixelated." 
```{index} PDF ``` -```{index} see: portable document dormat; PDF +```{index} see: portable document format; PDF ``` ```{note} diff --git a/source/wrangling.md b/source/wrangling.md index 2c400af3..4cd6d36e 100755 --- a/source/wrangling.md +++ b/source/wrangling.md @@ -72,7 +72,10 @@ This knowledge will be helpful in effectively utilizing these objects in our dat ```{index} data frame; definition ``` -```{index} pandas.DataFrame +```{index} see: data frame; DataFrame +``` + +```{index} DataFrame ``` A data frame is a table-like structure for storing data in Python. Data frames are @@ -109,7 +112,7 @@ A data frame storing data regarding the population of various regions in Canada. ### What is a series? -```{index} pandas.Series +```{index} Series ``` In Python, `pandas` **series** are are objects that can contain one or more elements (like a list). @@ -117,10 +120,8 @@ They are a single column, are ordered, can be indexed, and can contain any data The `pandas` package uses `Series` objects to represent the columns in a data frame. `Series` can contain a mix of data types, but it is good practice to only include a single type in a series because all observations of one variable should be the same type. -Python -has several different basic data types, as shown in -{numref}`tab:datatype-table`. -You can create a `pandas` series using the +Python has several different basic data types, as shown in +{numref}`tab:datatype-table`. You can create a `pandas` series using the `pd.Series()` function. For example, to create the series `region` as shown in {numref}`fig:02-series`, you can write the following. @@ -140,39 +141,29 @@ region Example of a `pandas` series whose type is string. ``` - -```{code-cell} ipython3 -:tags: [remove-cell] - -# The following table was taken from DSCI511 Lecture 1, credit to Arman Seyed-Ahmadi, MDS 2021 -``` - -```{index} data types, string, integer, floating point number, boolean, list, set, dictionary, tuple, none -``` - -```{index} see: str; string +```{index} data types; string (str), data types; integer (int), data types; floating point number (float), data types; boolean (bool), data types; NoneType (none) ``` -```{index} see: int; integer +```{index} see: str; data types ``` -```{index} see: float; floating point number +```{index} see: int; data types ``` -```{index} see: bool; boolean +```{index} see: float; data types ``` -```{index} see: NoneType; none +```{index} see: bool; data types ``` -```{index} see: dict; dictionary +```{index} see: NoneType; data types ``` ```{table} Basic data types in Python :name: tab:datatype-table | Data type | Abbreviation | Description | Example | | :-------------------- | :----------- | :-------------------------------------------- | :----------------------------------------- | -| integer | `int` | positive/negative/zero whole numbers | `42` | +| integer | `int` | positive/negative/zero whole numbers | `42` | | floating point number | `float` | real number in decimal form | `3.14159` | | boolean | `bool` | true or false | `True` | | string | `str` | text | `"Hello World"` | @@ -249,6 +240,12 @@ to both `DataFrames` and `Series` as "data frames" in the text. There are other types that represent data structures in Python. We summarize the most common ones in {numref}`tab:datastruc-table`. 
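As a concrete complement to {numref}`tab:datastruc-table`, the sketch below shows how each of these structures is written as a literal in Python; the variable names and values are arbitrary illustrations, not taken from the census data files used in this chapter.

```python
# Arbitrary, illustrative values only.
cities = ["Toronto", "Montréal", "Vancouver"]  # list: ordered, allows duplicates
languages = {"English", "French"}              # set: unordered, unique elements
population = {"Toronto": 5_928_040}            # dict: maps keys to values
observation = ("Toronto", "English")           # tuple: ordered and immutable
```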
+```{index} data structures; list, data structures; set, data structures; dictionary (dict), data structures; tuple +``` + +```{index} see: dict; data structures +``` + ```{table} Basic data structures in Python :name: tab:datastruc-table | Data Structure | Description | @@ -378,7 +375,7 @@ represented as individual columns to make the data tidy. ### Tidying up: going from wide to long using `melt` -```{index} pandas.DataFrame; melt +```{index} DataFrame; melt ``` One task that is commonly performed to get data into a tidy format @@ -548,7 +545,7 @@ been met: (pivot-wider)= ### Tidying up: going from long to wide using `pivot` -```{index} pandas.DataFrame; pivot +```{index} DataFrame; pivot ``` Suppose we have observations spread across multiple rows rather than in a single @@ -654,6 +651,9 @@ lang_home_tidy.columns = [ lang_home_tidy ``` +```{index} DataFrame; reset_index +``` + In the first step, note that we added a call to `reset_index`. When `pivot` is called with multiple column names passed to the `index`, those entries become the "name" of each row that would be used when you filter rows with `[]` or `loc` rather than just simple numbers. This @@ -665,6 +665,9 @@ The second operation we applied is to rename the columns. When we perform the `p operation, it keeps the original column name `"count"` and adds the `"type"` as a second column name. Having two names for a column can be confusing! So we rename giving each column only one name. +```{index} DataFrame; info +``` + We can print out some useful information about our data frame using the `info` function. In the first row it tells us the `type` of `lang_home_tidy` (it is a `pandas` `DataFrame`). The second row tells us how many rows there are: 1070, and to index those rows, you can use numbers between @@ -697,16 +700,19 @@ more columns, and we would see the data set "widen." +++ (str-split)= -### Tidying up: using `str.split` to deal with multiple delimiters +### Tidying up: using `str.split` to deal with multiple separators + +```{index} Series; str.split, separator +``` -```{index} pandas.Series; str.split, delimiter +```{index} see: delimiter; separator ``` Data are also not considered tidy when multiple values are stored in the same cell. The data set we show below is even messier than the ones we dealt with above: the `Toronto`, `Montréal`, `Vancouver`, `Calgary` and `Edmonton` columns contain the number of Canadians reporting their primary language at home and -work in one column separated by the delimiter (`/`). The column names are the +work in one column separated by the separator (`/`). The column names are the values of a variable, *and* each value does not have its own cell! To turn this messy data into tidy data, we'll have to fix these issues. @@ -786,7 +792,7 @@ tidy_lang.info() Object columns in `pandas` data frames are columns of strings or columns with mixed types. In the previous example in {numref}`pivot-wider`, the `most_at_home` and `most_at_work` variables were `int64` (integer), which is a type of numeric data. -This change is due to the delimiter (`/`) when we read in this messy data set. +This change is due to the separator (`/`) when we read in this messy data set. Python read these columns in as string types, and by default, `str.split` will return columns with the `object` data type. @@ -828,6 +834,12 @@ This section will highlight more advanced usage of the `[]` function, including an in-depth treatment of the variety of logical statements one can use in the `[]` to select subsets of rows. 
+```{index} DataFrame; [], logical statement +``` + +```{index} see: logical statement; logical operator +``` + +++ ### Extracting columns by name @@ -867,6 +879,13 @@ tidy_lang["language"] ### Extracting rows that have a certain value with `==` + +```{index} logical operator; equivalency (==) +``` + +```{index} see: ==; logical operator +``` + Suppose we are only interested in the subset of rows in `tidy_lang` corresponding to the official languages of Canada (English and French). We can extract these rows by using the *equivalency operator* (`==`) @@ -886,6 +905,12 @@ official_langs ### Extracting rows that do not have a certain value with `!=` +```{index} logical operator; inequivalency (!=) +``` + +```{index} see: !=; logical operator +``` + What if we want all the other language categories in the data set *except* for those in the `"Official languages"` category? We can accomplish this with the `!=` operator, which means "not equal to". So if we want to find all the rows @@ -900,6 +925,12 @@ tidy_lang[tidy_lang["category"] != "Official languages"] (filter-and)= ### Extracting rows satisfying multiple conditions using `&` +```{index} logical operator; and (&) +``` + +```{index} see: &; logical operator +``` + Suppose now we want to look at only the rows for the French language in Montréal. To do this, we need to filter the data set @@ -921,6 +952,12 @@ tidy_lang[ ### Extracting rows satisfying at least one condition using `|` +```{index} logical operator; or (|) +``` + +```{index} see: |; logical operator +``` + Suppose we were interested in only those rows corresponding to cities in Alberta in the `official_langs` data set (Edmonton and Calgary). We can't use `&` as we did above because `region` @@ -940,6 +977,12 @@ official_langs[ ### Extracting rows with values in a list using `isin` +```{index} logical operator; containment (isin) +``` + +```{index} see: isin; logical operator +``` + Next, suppose we want to see the populations of our five cities. Let's read in the `region_data.csv` file that comes from the 2016 Canadian census, @@ -987,6 +1030,21 @@ pd.Series(["Vancouver", "Toronto"]).isin(pd.Series(["Toronto", "Vancouver"])) ### Extracting rows above or below a threshold using `>` and `<` +```{index} logical operator; greater than (> and >=), logical operator; less than (< and <=) +``` + +```{index} see: >; logical operator +``` + +```{index} see: >=; logical operator +``` + +```{index} see: <; logical operator +``` + +```{index} see: <=; logical operator +``` + ```{code-cell} ipython3 :tags: [remove-cell] @@ -1017,6 +1075,9 @@ than French in Montréal according to the 2016 Canadian census. ### Extracting rows using `query` +```{index} logical statement; query +``` + You can also extract rows above, below, equal or not-equal to a threshold using the `query` method. For example the following gives us the same result as when we used `official_langs[official_langs["most_at_home"] > 2669195]`. @@ -1032,7 +1093,7 @@ to make long chains of filtering operations a bit easier to read. (loc-iloc)= ## Using `loc[]` to filter rows and select columns -```{index} pandas.DataFrame; loc[] +```{index} DataFrame; loc[] ``` The `[]` operation is only used when you want to either filter rows **or** select columns; @@ -1111,7 +1172,7 @@ corresponding to the column names that start with the desired characters. 
tidy_lang.loc[:, tidy_lang.columns.str.startswith("most")] ``` -```{index} pandas.Series; str.contains +```{index} Series; str.contains ``` We could also have chosen the columns containing an underscore `_` by using the @@ -1123,7 +1184,7 @@ tidy_lang.loc[:, tidy_lang.columns.str.contains("_")] ``` ## Using `iloc[]` to extract rows and columns by position -```{index} pandas.DataFrame; iloc[], column range +```{index} DataFrame; iloc[], column range ``` Another approach for selecting rows and columns is to use `iloc[]`, which provides the ability to index with the position rather than the label of the columns. @@ -1158,7 +1219,7 @@ accidentally put in the wrong integer index! If you did not correctly remember that the `language` column was index `1`, and used `2` instead, your code might end up having a bug that is quite hard to track down. -```{index} pandas.Series; str.startswith +```{index} Series; str.startswith ``` +++ {"tags": []} @@ -1203,6 +1264,9 @@ region_lang = pd.read_csv("data/region_lang.csv") region_lang ``` +```{index} Series; min, Series; max +``` + We use `.min` to calculate the minimum and `.max` to calculate maximum number of Canadians reporting a particular language as their primary language at home, @@ -1230,6 +1294,9 @@ total number of people in the survey, we could use the `sum` summary statistic m region_lang["most_at_home"].sum() ``` +```{index} Series; sum, Series; mean, Series; median, Series; std, summary statistic +``` + Other handy summary statistics include the `mean`, `median` and `std` for computing the mean, median, and standard deviation of observations, respectively. We can also compute multiple statistics at once using `agg` to "aggregate" results. @@ -1273,6 +1340,12 @@ summary statistics that you can compute with `pandas`. +++ +++ +```{index} see: NaN; missing data +``` + +```{index} missing data +``` + ```{note} In `pandas`, the value `NaN` is often used to denote missing data. @@ -1329,7 +1402,7 @@ region_lang.loc[:, "mother_tongue":"lang_known"].agg(["mean", "std"]) +++ -```{index} pandas.DataFrame; groupby +```{index} DataFrame; groupby ``` What happens if we want to know how languages vary by region? In this case, we need a new tool that lets us group rows by region. This can be achieved @@ -1434,6 +1507,9 @@ region_lang.groupby("region")[["most_at_home", "most_at_work", "lang_known"]].ma To see how many observations there are in each group, we can use `value_counts`. +```{index} DataFrame; value_counts +``` + ```{code-cell} ipython3 :tags: ["output_scroll"] region_lang.value_counts("region") @@ -1476,11 +1552,14 @@ we can see that this would be the columns from `mother_tongue` to `lang_known`. region_lang ``` -```{index} pandas.DataFrame; apply, pandas.DataFrame; loc[] +```{index} DataFrame; apply, DataFrame; loc[] ``` We can simply call the `.astype` function to apply it across the desired range of columns. 
+```{index} DataFrame; astype, Series; astype +``` + ```{code-cell} ipython3 region_lang_nums = region_lang.loc[:, "mother_tongue":"lang_known"].astype("int32") region_lang_nums.info() @@ -1530,7 +1609,7 @@ you can use the more general [`apply`](https://pandas.pydata.org/docs/reference/ ## Modifying and adding columns -```{index} pandas.DataFrame; [] +```{index} DataFrame; [], column assignment, assign ``` When we compute summary statistics or apply functions, @@ -1666,6 +1745,10 @@ See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stab :tags: [remove-input] english_lang ``` + +```{index} SettingWithCopyWarning +``` + Wait a moment...what is that warning message? It seems to suggest that something went wrong, but if we inspect the `english_lang` data frame above, it looks like the city populations were added just fine! As it turns out, this is caused by the earlier filtering we did from `region_lang` to @@ -1680,6 +1763,9 @@ For the rest of the book, we will silence that warning to help with readability. pd.options.mode.chained_assignment = None ``` +```{index} DataFrame; merge +``` + ```{note} Inserting the data column `[4098927, 5928040, ...]` manually as we did above is generally very error-prone and is not recommended. We do it here to demonstrate another usage of `assign` and regular column assignment. @@ -1714,6 +1800,9 @@ english_lang ## Using `merge` to combine data frames +```{index} DataFrame; merge +``` + Let's return to the situation right before we added the city populations of Toronto, Montréal, Vancouver, Calgary, and Edmonton to the `english_lang` data frame. Before adding the new column, we had filtered `region_lang` to create the `english_lang` data frame containing only English speakers in the five cities