
Commit bee65b4

update gaussian_processes.rst (#6693)
* moved and cleaned up GP guide
* added GP guide to core_notebooks index
* implement oriol's suggestions
* implement oriol's suggestions
* Update docs/source/learn/core_notebooks/Gaussian_Processes.rst
  Co-authored-by: Oriol Abril-Pla <[email protected]>
* Update docs/source/learn/core_notebooks/Gaussian_Processes.rst
  Co-authored-by: Oriol Abril-Pla <[email protected]>
---------
Co-authored-by: Oriol Abril-Pla <[email protected]>
1 parent 8f43799 commit bee65b4

File tree

2 files changed: +266 -0
docs/source/learn/core_notebooks/Gaussian_Processes.rst

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
.. _gp_guide:


******************
Gaussian Processes
******************


GP Basics
=========

Sometimes an unknown parameter or variable in a model is not a scalar value or
a fixed-length vector, but a *function*. A Gaussian process (GP) can be used
as a prior probability distribution whose support is over the space of
continuous functions. A GP prior on the function :math:`f(x)` is usually written,

.. math::

   f(x) \sim \mathcal{GP}(m(x), \, k(x, x')) \,.

The function values are modeled as a draw from a multivariate normal
distribution that is parameterized by the mean function, :math:`m(x)`, and the
covariance function, :math:`k(x, x')`. Gaussian processes are a convenient
choice as priors over functions due to the marginalization and conditioning
properties of the multivariate normal distribution. Usually, the marginal
distribution over :math:`f(x)` is evaluated during the inference step. The
conditional distribution is then used for predicting the function values
:math:`f(x_*)` at new points, :math:`x_*`.

The joint distribution of :math:`f(x)` and :math:`f(x_*)` is multivariate
normal,

.. math::

   \begin{bmatrix} f(x) \\ f(x_*) \\ \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m(x) \\ m(x_*) \\ \end{bmatrix} \,,
     \begin{bmatrix} k(x, x')  & k(x, x_*)    \\
                     k(x_*, x) & k(x_*, x_*') \\ \end{bmatrix}
   \right) \,.

Starting from the joint distribution, one obtains the marginal distribution
of :math:`f(x)` as :math:`\text{N}(m(x),\, k(x, x'))`. The conditional
distribution is

.. math::

   f(x_*) \mid f(x) \sim \text{N}\left( k(x_*, x) k(x, x)^{-1} [f(x) - m(x)] + m(x_*) ,\,
     k(x_*, x_*) - k(x_*, x) k(x, x)^{-1} k(x, x_*) \right) \,.

.. note::

   For more information on GPs, check out the book `Gaussian Processes for
   Machine Learning <http://www.gaussianprocess.org/gpml/>`_ by Rasmussen &
   Williams, or `this introduction <http://www.inference.org.uk/mackay/gpB.pdf>`_
   by D. Mackay.

PyMC is a great environment for working with fully Bayesian Gaussian process
models. GPs in PyMC have a clear syntax and are highly composable. Many
predefined covariance functions (or kernels) and mean functions are included,
along with several GP implementations. GPs are treated as distributions that
can be used within larger or hierarchical models, not just as standalone
regression models.

Mean and covariance functions
=============================

Those who have used the GPy or GPflow Python packages will find the syntax for
constructing mean and covariance functions somewhat familiar. When first
instantiated, the mean and covariance functions are parameterized, but not
given their inputs yet. The covariance functions must additionally be provided
with the number of dimensions of the input matrix, and a list that indexes
which of those dimensions they are to operate on. This design makes it
possible to construct covariance functions that are combinations of other
covariance functions.

For example, to construct an exponentiated quadratic covariance function that
operates on the second and third columns of a three-column matrix representing
three predictor variables::

    ls = [2, 5]  # the lengthscales
    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=ls, active_dims=[1, 2])

Here the :code:`ls`, or lengthscale, parameter is two dimensional, allowing the
second and third dimensions to have different lengthscales. We have to specify
:code:`input_dim`, the total number of columns of :code:`X`, and
:code:`active_dims`, which of those columns or dimensions the covariance
function will act on, because :code:`cov_func` hasn't actually seen the
input data yet. The :code:`active_dims` argument is optional, and defaults to
all columns of the matrix of inputs.
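
For instance, a kernel acting on all three input columns, relying on the
default :code:`active_dims` (the lengthscale values here are only
illustrative)::

    # One lengthscale per column; active_dims defaults to all three columns.
    cov_func_all = pm.gp.cov.ExpQuad(input_dim=3, ls=[2, 5, 1])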

Covariance functions in PyMC closely follow the algebraic rules for kernels,
which allows users to combine covariance functions into new ones, for example:

- The sum of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) + pm.gp.cov.ExpQuad(...)

- The product of two covariance functions is also a covariance function::

    cov_func = pm.gp.cov.ExpQuad(...) * pm.gp.cov.Periodic(...)

- The product (or sum) of a covariance function with a scalar is a
  covariance function::

    cov_func = eta**2 * pm.gp.cov.Matern32(...)
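
Putting these rules together, a short sketch of a composite kernel (the
hyperparameter priors and the period value are purely illustrative)::

    import pymc as pm

    with pm.Model():
        eta = pm.HalfNormal("eta", sigma=2.0)
        ls_long = pm.Gamma("ls_long", alpha=2.0, beta=1.0)
        ls_per = pm.Gamma("ls_per", alpha=2.0, beta=1.0)

        # Slowly varying trend times a periodic component, plus a rougher
        # Matern term for short-range variation.
        cov_trend = pm.gp.cov.ExpQuad(input_dim=1, ls=ls_long)
        cov_seasonal = pm.gp.cov.Periodic(input_dim=1, period=12, ls=ls_per)
        cov_short = pm.gp.cov.Matern32(input_dim=1, ls=0.5)

        cov_func = eta**2 * cov_trend * cov_seasonal + cov_short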

Once defined, a covariance function is evaluated by calling :code:`cov_func(x, x)`
(and a mean function by calling :code:`mean_func(x)`). Since PyMC is built on
top of PyTensor, it is relatively easy to define and experiment with
non-standard covariance and mean functions. For more information check out
the :ref:`tutorial <GP-MeansAndCovs>` on covariance functions.
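
As a rough sketch of what evaluation looks like (the array shapes and values
are hypothetical)::

    import numpy as np
    import pymc as pm

    X = np.random.default_rng(0).normal(size=(20, 3))  # 20 rows, 3 predictors

    cov_func = pm.gp.cov.ExpQuad(input_dim=3, ls=[2, 5], active_dims=[1, 2])
    mean_func = pm.gp.mean.Constant(c=1.0)

    # Calling the functions returns PyTensor tensors; .eval() materializes
    # them as a (20, 20) covariance matrix and a length-20 mean vector.
    K = cov_func(X, X).eval()
    m = mean_func(X).eval()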

GP Implementations
==================

PyMC includes several GP implementations, including marginal and latent
variable models and also some fast approximations. They all follow a similar
usage pattern: First, a GP is instantiated with a mean function and a
covariance function. Then, GP objects can be added together, allowing for
function characteristics to be carefully modeled and separated. Finally, one
of the `prior`, `marginal_likelihood`, or `conditional` methods is called on the
GP object to actually construct the PyMC random variable that represents the
function prior.

Using :code:`gp.Latent` as the example, the syntax to first specify the GP
is::

    gp = pm.gp.Latent(mean_func, cov_func)

The first argument is the mean function and the second is the covariance
function. We've made the GP object, but we haven't made clear which function
it is to be a prior for, what the inputs are, or what parameters it will be
conditioned on.

.. note::

   The :code:`gp.Marginal` class and similar don't have a :code:`prior` method.
   Instead they have a :code:`marginal_likelihood` method that is used similarly,
   but has additional required arguments, such as the observed data, noise,
   or other, depending on the implementation. See the notebooks for examples.
   The :code:`conditional` method works similarly.
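
As a rough illustration of that note (assuming observed inputs :code:`X`,
outputs :code:`y`, and an illustrative noise prior), a :code:`gp.Marginal`
model might be specified as::

    with pm.Model() as marginal_model:
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        gp = pm.gp.Marginal(cov_func=pm.gp.cov.ExpQuad(input_dim=1, ls=1.0))
        # The observed data enter through marginal_likelihood rather than prior.
        y_ = gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)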

Calling the `prior` method will create a PyMC random variable that represents
the latent function :math:`f(x) = \mathbf{f}`::

    f = gp.prior("f", X)

:code:`f` is a random variable that can be used within a PyMC model like any
other type of random variable. The first argument is the name of the random
variable representing the function we are placing the prior over.
The second argument is the inputs to the function that the prior is over,
:code:`X`. The inputs are usually known and present in the data, but they can
also be PyMC random variables. If the inputs are a PyTensor tensor or a
PyMC random variable, the :code:`shape` needs to be given.

Usually at this point, inference is performed on the model. The
:code:`conditional` method creates the conditional, or predictive,
distribution over the latent function at arbitrary :math:`x_*` input points,
:math:`f(x_*)`. To construct the conditional distribution we write::

    f_star = gp.conditional("f_star", X_star)
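
Putting the pieces together, a minimal end-to-end sketch using made-up data,
illustrative priors, and a Gaussian likelihood::

    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(1)
    X = np.linspace(0, 10, 50)[:, None]        # inputs must be a column matrix
    y = np.sin(X).ravel() + 0.3 * rng.normal(size=50)
    X_star = np.linspace(0, 12, 100)[:, None]  # new points for prediction

    with pm.Model() as model:
        ls = pm.Gamma("ls", alpha=2.0, beta=1.0)
        cov_func = pm.gp.cov.ExpQuad(input_dim=1, ls=ls)
        gp = pm.gp.Latent(cov_func=cov_func)

        f = gp.prior("f", X=X)                 # latent function values at X
        sigma = pm.HalfNormal("sigma", sigma=1.0)
        pm.Normal("y_obs", mu=f, sigma=sigma, observed=y)

        idata = pm.sample(1000, tune=1000)

    with model:
        f_star = gp.conditional("f_star", X_star)
        preds = pm.sample_posterior_predictive(idata, var_names=["f_star"])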

.. _additive_gp:

Additive GPs
============

The GP implementation in PyMC is constructed so that it is easy to define
additive GPs and sample from individual GP components. We can write::

    gp1 = pm.gp.Marginal(mean_func1, cov_func1)
    gp2 = pm.gp.Marginal(mean_func2, cov_func2)
    gp3 = gp1 + gp2

The GP objects have to have the same type; a :code:`gp.Marginal` cannot
be added to a :code:`gp.Latent`.

Consider two independent GP distributed functions, :math:`f_1(x) \sim
\mathcal{GP}\left(m_1(x),\, k_1(x, x')\right)` and :math:`f_2(x) \sim
\mathcal{GP}\left( m_2(x),\, k_2(x, x')\right)`. The joint distribution of
:math:`f_1,\, f_1^*,\, f_2,\, f_2^*,\, f_1 + f_2` and :math:`f_1^* + f_2^*` is

.. math::

   \begin{bmatrix} f_1 \\ f_1^* \\ f_2 \\ f_2^*
                \\ f_1 + f_2 \\ f_1^* + f_2^* \end{bmatrix} \sim
   \text{N}\left(
     \begin{bmatrix} m_1 \\ m_1^* \\ m_2 \\ m_2^* \\
                     m_1 + m_2 \\ m_1^* + m_2^* \\ \end{bmatrix} \,,\,
     \begin{bmatrix}
       K_1       & K_1^*    & 0         & 0        & K_1                 & K_1^*             \\
       K_1^{*^T} & K_1^{**} & 0         & 0        & K_1^*               & K_1^{**}          \\
       0         & 0        & K_2       & K_2^*    & K_2                 & K_2^{*}           \\
       0         & 0        & K_2^{*^T} & K_2^{**} & K_2^{*}             & K_2^{**}          \\
       K_1       & K_1^{*}  & K_2       & K_2^{*}  & K_1 + K_2           & K_1^{*} + K_2^{*} \\
       K_1^{*^T} & K_1^{**} & K_2^{*^T} & K_2^{**} & K_1^{*^T}+K_2^{*^T} & K_1^{**}+K_2^{**}
     \end{bmatrix}
   \right) \,.

Using the joint distribution, the conditional distribution of :math:`f_1^*`
given the observed sum :math:`f_1 + f_2` is

.. math::

   f_1^* \mid f_1 + f_2 \sim \text{N}\left(
     m_1^* + K_1^{*^T}(K_1 + K_2)^{-1}\left[f_1 + f_2 - m_1 - m_2\right] \,,\,
     K_1^{**} - K_1^{*^T}(K_1 + K_2)^{-1}K_1^* \right) \,.


These equations show how to break down GP models into individual components to see how each
contributes to the data. For more information, check out `David Duvenaud's PhD
thesis <https://www.cs.toronto.edu/~duvenaud/thesis.pdf>`_.

The GP objects in PyMC keep track of these marginals automatically. The
following code sketch shows how to define the conditional distribution of
:math:`f_2^*`. We use `gp.Marginal` in the example, but the same works for
other implementations. The first block fits the GP prior. We denote
:math:`f_1 + f_2` as just :math:`f` for brevity::

    with pm.Model() as model:
        gp1 = pm.gp.Marginal(mean_func1, cov_func1)
        gp2 = pm.gp.Marginal(mean_func2, cov_func2)

        # gp represents f1 + f2.
        gp = gp1 + gp2

        f = gp.marginal_likelihood("f", X, y, sigma)

        idata = pm.sample(1000)

To construct the conditional distribution of :code:`gp1` or :code:`gp2`, we
also need to include the additional arguments, :code:`X`, :code:`y`, and
:code:`sigma`::

    with model:
        # conditional distributions of f1 and f2
        f1_star = gp1.conditional("f1_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})
        f2_star = gp2.conditional("f2_star", X_star,
                                  given={"X": X, "y": y, "sigma": sigma, "gp": gp})

        # conditional of f1 + f2, `given` not required
        f_star = gp.conditional("f_star", X_star)

This second block produces the conditional distributions. Notice that extra
arguments are required for the conditionals of :math:`f_1` and :math:`f_2`, but
not for :math:`f`. This is because those arguments are cached when
:code:`.marginal_likelihood` is called on :code:`gp`.

.. note::

   When constructing conditionals, the additional arguments :code:`X`, :code:`y`,
   :code:`sigma` and :code:`gp` must be provided as a dict called `given`!

Since the marginal likelihood methods of :code:`gp1` and :code:`gp2` weren't called,
their conditionals need to be provided with the required inputs. In the same
fashion as the prior, :code:`f_star`, :code:`f1_star` and :code:`f2_star` are random
variables that can now be used like any other random variable in PyMC.
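
For example, to draw posterior samples of the component functions at the new
inputs, one could continue the sketch above with::

    with model:
        pred = pm.sample_posterior_predictive(
            idata, var_names=["f1_star", "f2_star", "f_star"]
        )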

Check the :ref:`notebooks <gaussian_processes>`
for detailed demonstrations of the usage of GP functionality in PyMC.

docs/source/learn/core_notebooks/index.md

Lines changed: 1 addition & 0 deletions
@@ -10,4 +10,5 @@ model_comparison
 posterior_predictive
 dimensionality
 pymc_pytensor
+Gaussian_Processes
 :::
