.. _gp_guide:

******************
Gaussian Processes
******************

GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

   f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

   \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
     \begin{bmatrix} k(x, x')   &  k(x, x_*)    \\
                     k(x_*, x)  &  k(x_*, x_*') \\ \end{bmatrix}
   \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)` as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

   f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
     k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.

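As a concrete illustration of these formulas, the following NumPy sketch computes
the conditional mean and covariance directly, assuming a zero mean function and an
exponentiated quadratic kernel; the data, the helper function :code:`k`, and the
hyperparameter values are placeholders chosen only for the example::

    import numpy as np

    def k(a, b, ls=1.0):
        # exponentiated quadratic covariance between two sets of 1-D points
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

    x = np.linspace(0, 5, 20)        # observed input locations
    x_star = np.linspace(5, 7, 10)   # new input locations
    f = np.sin(x)                    # hypothetical observed function values

    K = k(x, x) + 1e-8 * np.eye(len(x))  # add jitter for numerical stability
    K_s = k(x, x_star)
    K_ss = k(x_star, x_star)

    # conditional mean and covariance of f(x_*) given f(x), with m(x) = 0
    mu_star = K_s.T @ np.linalg.solve(K, f)
    cov_star = K_ss - K_s.T @ np.linalg.solve(K, K_s)
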
.. note::

   For more information on GPs, check out the book `Gaussian Processes for
   Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
   Williams, or `this introduction <http://www.inference.org.uk/mackay/gpB.pdf>`_
   by D. Mackay.

PyMC is a great environment for working with fully Bayesian Gaussian Process
models. GPs in PyMC have a clear syntax and are highly composable; many
predefined covariance functions (or kernels) and mean functions are included,
along with several GP implementations. GPs are treated as distributions that
can be used within larger or hierarchical models, not just as standalone
regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not yet
given their inputs. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. The reason for this design
is so that covariance functions can be constructed that are combinations of
other covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    ls = [2, 5]  # the lengthscales
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

Here the :code:`ls` (lengthscale) parameter is two dimensional, allowing the
second and third dimensions to have different lengthscales. The reason we have
to specify :code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, is that :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.

Covariance functions in PyMC closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones, for example:

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a
  covariance function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)

After the covariance function is defined, it is a callable that is evaluated
with :code:`cov_func(x, x)` (or :code:`mean_func(x)`). Since PyMC is built on
top of PyTensor, it is relatively easy to define and experiment with
non-standard covariance and mean functions. For more information check out the
:ref:`tutorial <GP-MeansAndCovs>` on covariance functions.
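
For instance, a combined covariance function can be built from the predefined
kernels and evaluated on a set of inputs; the particular kernels, hyperparameter
values, and input grid below are placeholders chosen only for illustration::

    import numpy as np
    import pymc as pm

    X = np.linspace(0, 10, 50)[:, None]

    # a scaled Matern 3/2 kernel plus a small white-noise term
    eta = 2.0
    cov_func = (eta**2 * pm.gp.cov.Matern32(input_dim=1, ls=1.5)
                + pm.gp.cov.WhiteNoise(1e-6))

    K = cov_func(X, X).eval()              # full covariance matrix as a NumPy array
    K_cross = cov_func(X, X[:10]).eval()   # cross-covariance between X and a subset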

GP Implementations
==================

PyMC includes several GP implementations, including marginal and latent
variable models and also some fast approximations. Their usage all follows a
similar pattern: First, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on
the GP object to actually construct the PyMC random variable that represents
the function prior.

Using :code:`gp.Latent` for the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.

.. note::

   The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
   Instead they have a :code:`marginal_likelihood` method that is used similarly,
   but has additional required arguments, such as the observed data or noise,
   depending on the implementation. See the notebooks for examples. The
   :code:`conditional` method works similarly.

Calling the `prior` method will create a PyMC random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over.
The second argument is the inputs to the function that the prior is over,
:code:`X`. The inputs are usually known and present in the data, but they can
also be PyMC random variables. If the inputs are a PyTensor tensor or a
PyMC random variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
:code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. To construct the conditional distribution we write::

    f_star = gp.conditional("f_star", X_star)

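Putting the pieces together, a minimal sketch of how :code:`gp.Latent` might be
used end to end looks like the following; the data, the prior on the lengthscale,
and the Bernoulli likelihood are placeholder choices for illustration, not a
prescribed recipe::

    import numpy as np
    import pymc as pm

    # hypothetical data: 30 one-dimensional inputs with binary outcomes
    X = np.linspace(0, 10, 30)[:, None]
    y = np.random.binomial(1, 0.5, size=30)
    X_star = np.linspace(10, 12, 15)[:, None]  # new inputs for prediction

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2.0, beta=1.0)
        cov_func = pm.gp.cov.ExpQuad(input_dim=1, ls=ls)
        gp = pm.gp.Latent(cov_func=cov_func)

        # latent function values at the observed inputs
        f = gp.prior("f", X=X)

        # the latent GP is used as the logit of a Bernoulli likelihood
        pm.Bernoulli("obs", logit_p=f, observed=y)

        idata = pm.sample()

    with model:
        # predictive distribution of the latent function at the new inputs
        f_star = gp.conditional("f_star", X_star)
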
.. _additive_gp:

Additive GPs
============

The GP implementation in PyMC is constructed so that it is easy to define
additive GPs and sample from individual GP components. We can write::

    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
    gp3 = gp1 + gp2

The GP objects have to be of the same type; :code:`gp.Marginal` cannot
be added to :code:`gp.Latent`.

Consider two independent GP distributed functions, :math:`f_1(x) \sim
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

.. math::

   \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
   \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
     m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
     \begin{bmatrix}
       K_1       & K_1^*    & 0         & 0        & K_1                 & K_1^*             \\
       K_1^{*^T} & K_1^{**} & 0         & 0        & K_1^*               & K_1^{**}          \\
       0         & 0        & K_2       & K_2^*    & K_2                 & K_2^{*}           \\
       0         & 0        & K_2^{*^T} & K_2^{**} & K_2^{*}             & K_2^{**}          \\
       K_1       & K_1^{*}  & K_2       & K_2^{*}  & K_1 + K_2           & K_1^{*} + K_2^{*} \\
       K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**}
     \end{bmatrix}
   \right) \,.

Using the joint distribution to obtain the conditional distribution of :math:`f_1^*`
with the contribution due to :math:`f_1 + f_2` factored out, we get

.. math::

   f_1^* \mid f_1 + f_2 \sim \text{N}\left(
     m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
     K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.

These equations show how to break down GP models into individual components to see how each
contributes to the data. For more information, check out `David Duvenaud's PhD
thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.

The GP objects in PyMC keep track of these marginals automatically. The
following code sketch shows how to define the conditional distribution of
:math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for
other implementations. The first block fits the GP prior. We denote
:math:`f_1 + f_2` as just :math:`f` for brevity::

    with pm.Model() as model:
        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
        gp2 = pm.gp.Marginal(mean_func2, cov_func2)

        # gp represents f1 + f2.
        gp = gp1 + gp2

        f = gp.marginal_likelihood("f", X, y, sigma)

        idata = pm.sample(1000)

To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
also need to include the additional arguments, :code:`X`, :code:`y`, and
:code:`sigma`::

    with model:
        # conditional distributions of f1 and f2
        f1_star = gp1.conditional("f1_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})
        f2_star = gp2.conditional("f2_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})

        # conditional of f1 + f2, `given` not required
        f_star = gp.conditional("f_star", X_star)

This second block produces the conditional distributions. Notice that extra
arguments are required for the conditionals of :math:`f_1` and :math:`f_2`, but
not :math:`f`. This is because those arguments are cached when
:code:`.marginal_likelihood` is called on :code:`gp`.

.. note::
   When constructing conditionals, the additional arguments :code:`X`, :code:`y`,
   :code:`sigma` and :code:`gp` must be provided as a dict called :code:`given`!

Since the marginal likelihood methods of :code:`gp1` and :code:`gp2` weren't called,
their conditionals need to be provided with the required inputs. In the same
fashion as the prior, :code:`f_star`, :code:`f1_star` and :code:`f2_star` are random
variables that can now be used like any other random variable in PyMC.
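
For example, posterior samples of the individual components at the new inputs can
then be drawn with :code:`pm.sample_posterior_predictive`; the sketch below simply
continues the hypothetical model defined above::

    with model:
        # draw samples of each conditional using the posterior from pm.sample
        pred = pm.sample_posterior_predictive(
            idata, var_names=["f1_star", "f2_star", "f_star"]
        )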

Check the :ref:`notebooks <gaussian_processes>`
for detailed demonstrations of the usage of GP functionality in PyMC.