diff --git a/examples/case_studies/LKJ.ipynb b/examples/howto/LKJ.ipynb
similarity index 99%
rename from examples/case_studies/LKJ.ipynb
rename to examples/howto/LKJ.ipynb
index 1c34dcb5b..e17a17300 100644
--- a/examples/case_studies/LKJ.ipynb
+++ b/examples/howto/LKJ.ipynb
@@ -161,7 +161,7 @@
     "\n",
     "The LKJ distribution provides a prior on the correlation matrix, $\\mathbf{C} = \\textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\\Sigma$. Since inverting $\\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\Sigma$, $\\Sigma = \\mathbf{L} \\mathbf{L}^{\\top}$, where $\\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\\mathbf{x} - \\mu)^{\\top} \\Sigma^{-1} (\\mathbf{x} - \\mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.\n",
     "\n",
-    "PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
+    "PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
     "\n",
     "In this example, we model the standard deviations with $\\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\\mathbf{C} \\sim \\textrm{LKJ}(\\eta = 2)$."
    ]
@@ -308,7 +308,7 @@
     "id": "QOCi1RKvr2Ph"
    },
    "source": [
-    "We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:"
+    "We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:"
    ]
   },
   {
diff --git a/examples/case_studies/LKJ.myst.md b/examples/howto/LKJ.myst.md
similarity index 92%
rename from examples/case_studies/LKJ.myst.md
rename to examples/howto/LKJ.myst.md
index 774907a62..2fc5d8dbe 100644
--- a/examples/case_studies/LKJ.myst.md
+++ b/examples/howto/LKJ.myst.md
@@ -101,7 +101,7 @@ $$f(\mathbf{x}\ |\ \mu, \Sigma^{-1}) = (2 \pi)^{-\frac{k}{2}} |\Sigma|^{-\frac{1
 
 The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decomposition allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.
 
-PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
+PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparameter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
 
 In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.
 
@@ -175,7 +175,7 @@ with model:
 
 +++ {"id": "QOCi1RKvr2Ph"}
 
-We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:
+We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:
 
 ```{code-cell} ipython3
 ---
diff --git a/examples/case_studies/longitudinal_models.ipynb b/examples/time_series/longitudinal_models.ipynb
similarity index 100%
rename from examples/case_studies/longitudinal_models.ipynb
rename to examples/time_series/longitudinal_models.ipynb
diff --git a/examples/case_studies/longitudinal_models.myst.md b/examples/time_series/longitudinal_models.myst.md
similarity index 100%
rename from examples/case_studies/longitudinal_models.myst.md
rename to examples/time_series/longitudinal_models.myst.md
diff --git a/sphinxext/thumbnail_extractor.py b/sphinxext/thumbnail_extractor.py
index f84ab8f90..d752139ef 100644
--- a/sphinxext/thumbnail_extractor.py
+++ b/sphinxext/thumbnail_extractor.py
@@ -105,17 +105,17 @@
     "introductory": "Introductory",
     "fundamentals": "Library Fundamentals",
     "howto": "How to",
-    "generalized_linear_models": "(Generalized) Linear and Hierarchical Linear Models",
+    "generalized_linear_models": "Generalized Linear Models",
     "case_studies": "Case Studies",
     "causal_inference": "Causal Inference",
-    "diagnostics_and_criticism": "Diagnostics and Model Criticism",
     "gaussian_processes": "Gaussian Processes",
+    "time_series": "Time Series",
+    "spatial": "Spatial Analysis",
+    "diagnostics_and_criticism": "Diagnostics and Model Criticism",
     "bart": "Bayesian Additive Regression Trees",
     "mixture_models": "Mixture Models",
     "survival_analysis": "Survival Analysis",
-    "time_series": "Time Series",
-    "spatial": "Spatial Analysis",
-    "ode_models": "Inference in ODE models",
+    "ode_models": "ODE models",
     "samplers": "MCMC",
     "variational_inference": "Variational Inference",
 }
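For readers skimming this diff, the back-substitution claim in the renamed LKJ notebook is easy to verify numerically. Below is a minimal sketch, not taken from the notebook: the covariance matrix and residual vector are made-up values for illustration. It shows that a triangular solve of $\mathbf{L} z = \mathbf{x} - \mu$ reproduces the quadratic form computed with an explicit inverse:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical 2x2 covariance and residual vector, for illustration only.
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
resid = np.array([0.3, -1.2])  # x - mu

# Direct route: explicitly invert Sigma (less stable for ill-conditioned matrices).
q_direct = resid @ np.linalg.inv(Sigma) @ resid

# Cholesky route: Sigma = L L^T, so Sigma^{-1} = L^{-T} L^{-1}, and the quadratic
# form equals z^T z where z = L^{-1} (x - mu) comes from a triangular solve
# (the back-substitution the notebook refers to).
L = np.linalg.cholesky(Sigma)
z = solve_triangular(L, resid, lower=True)
q_chol = z @ z

np.testing.assert_allclose(q_direct, q_chol)  # both routes agree
```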
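And a compact sketch of the kind of model the LKJ notebook describes: $\textrm{Exponential}(1.0)$ priors on the standard deviations, an $\textrm{LKJ}(\eta = 2)$ prior on the correlations via `pymc.LKJCholeskyCov`, NUTS sampling, and an ArviZ summary. The synthetic data, dimension `n=2`, and variable names here are illustrative assumptions, not the notebook's actual code:

```python
import arviz as az
import numpy as np
import pymc as pm

# Illustrative synthetic 2-D observations; the notebook generates its own data.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.5], [0.5, 2.0]], size=200)

with pm.Model():
    # n: dimension of x; eta=2 pulls mass away from extreme correlations;
    # sd_dist: Exponential(1.0) priors on the component standard deviations.
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol_cov", n=2, eta=2.0, sd_dist=pm.Exponential.dist(1.0)
    )
    mu = pm.Normal("mu", 0.0, 10.0, shape=2)
    pm.MvNormal("obs", mu=mu, chol=chol, observed=data)
    trace = pm.sample()  # NUTS is PyMC's default sampler for continuous models

# "chol_cov_corr" is the correlation matrix deterministic stored in the trace
az.summary(trace, var_names=["mu", "chol_cov_corr"])
```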