Skip to content

Commit 6778109

Browse files
committed
Address comments from review
1 parent ba8d56d commit 6778109

File tree

2 files changed

+315
-314
lines changed

2 files changed

+315
-314
lines changed

examples/howto/blackbox_external_likelihood_numpy.ipynb

+282-281
Large diffs are not rendered by default.

examples/howto/blackbox_external_likelihood_numpy.myst.md

+33-33
Original file line numberDiff line numberDiff line change
@@ -5,9 +5,9 @@ jupytext:
55
format_name: myst
66
format_version: 0.13
77
kernelspec:
8-
display_name: Python 3 (ipykernel)
8+
display_name: pymc-examples
99
language: python
10-
name: python3
10+
name: pymc-examples
1111
---
1212

1313
(blackbox_external_likelihood_numpy)=
@@ -25,8 +25,6 @@ There is a {ref}`related example <wrapping_jax_function>` that discusses how to
2525

2626
```{code-cell} ipython3
2727
import arviz as az
28-
import IPython
29-
import matplotlib
3028
import matplotlib.pyplot as plt
3129
import numpy as np
3230
import pymc as pm
@@ -45,9 +43,9 @@ az.style.use("arviz-darkgrid")
4543
```
4644

4745
## Introduction
48-
PyMC is a great tool for doing Bayesian inference and parameter estimation. It has a load of {doc}`in-built probability distributions <pymc:api/distributions>` that you can use to set up priors and likelihood functions for your particular model. You can even create your own {class}`Custom Distribution <pymc.CustomDist>`.
46+
PyMC is a great tool for doing Bayesian inference and parameter estimation. It has many {doc}`in-built probability distributions <pymc:api/distributions>` that you can use to set up priors and likelihood functions for your particular model. You can even create your own {class}`Custom Distribution <pymc.CustomDist>` with a custom logp defined by PyTensor operations or automatically inferred from the generative graph.
4947

50-
However, this is not necessarily that simple if you have a model function, or probability distribution, that, for example, relies on an external code that you have little/no control over (and may even be, for example, wrapped `C` code rather than Python). This can be problematic when you need to pass parameters as PyMC distributions to these external functions; your external function probably wants you to pass it floating point numbers rather than PyMC distributions!
48+
Despite all these "batteries included", you may still find yourself dealing with a model function or probability distribution that relies on complex external code that you cannot avoid but to use. This code is unlikely to work with the kind of abstract PyTensor variables that PyMC uses: {ref}`pymc:pymc_pytensor`.
5149

5250
```python
5351
import pymc as pm
@@ -66,7 +64,7 @@ Another issue is that if you want to be able to use the gradient-based step samp
6664

6765
Defining a model/likelihood that PyMC can use and that calls your "black box" function is possible, but it relies on creating a custom PyTensor Op. This is, hopefully, a clear description of how to do this, including one way of writing a gradient function that could be generally applicable.
6866

69-
In the examples below, we create a very simple model and log-likelihood function in numpy.
67+
In the examples below, we create a very simple lineral model and log-likelihood function in numpy.
7068

7169
```{code-cell} ipython3
7270
def my_model(m, c, x):
@@ -105,19 +103,17 @@ with pm.Model():
105103
trace = pm.sample(1000)
106104
```
107105

108-
But, this will give an error like:
106+
But, this will likely give an error when the black-box function does not accept PyTensor tensor objects as inputs.
109107

110-
```
111-
ValueError: setting an array element with a sequence.
112-
```
113-
114-
This is because `m` and `c` are PyTensor tensor-type objects.
108+
So, what we actually need to do is create a {ref}`PyTensor Op <pytensor:creating_an_op>`. This will be a new class that wraps our log-likelihood function while obeying the PyTensor API contract. We will do this below, initially without defining a {func}`grad` for the Op.
115109

116-
So, what we actually need to do is create a {ref}`PyTensor Op <pytensor:creating_an_op>`. This will be a new class that wraps our log-likelihood function (or just our model function, if that is all that is required) into something that can take in PyTensor tensor objects, but internally can cast them as floating point values that can be passed to our log-likelihood function. We will do this below, initially without defining a {func}`grad` for the Op.
110+
:::{tip}
111+
Depending on your application you may only need to wrap a custom log-likelihood or a subset of the whole model (such as a function that computes an infinite series summation using an advanced library like mpmath), which can then be chained with other PyMC distributions and PyTensor operations to define your whole model. There is a trade-off here, usually the more you leave out of a black-box the more you may benefit from PyTensor rewrites and optimizations. We suggest you always try to define the whole model in PyMC and PyTensor, and only use black-boxes where strictly necessary.
112+
:::
117113

118114
+++
119115

120-
## PyTensor Op without grad
116+
## PyTensor Op without gradients
121117

122118
+++
123119

@@ -128,7 +124,7 @@ So, what we actually need to do is create a {ref}`PyTensor Op <pytensor:creating
128124
129125
130126
class LogLike(Op):
131-
def make_node(self, m, c, sigma, x, data):
127+
def make_node(self, m, c, sigma, x, data) -> Apply:
132128
# Convert inputs to tensor variables
133129
m = pt.as_tensor(m)
134130
c = pt.as_tensor(c)
@@ -143,10 +139,10 @@ class LogLike(Op):
143139
# outputs = [pt.vector()]
144140
outputs = [data.type()]
145141
146-
# Apply is an object that combins inputs, outputs and an Op (self)
142+
# Apply is an object that combines inputs, outputs and an Op (self)
147143
return Apply(self, inputs, outputs)
148144
149-
def perform(self, node, inputs, outputs):
145+
def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
150146
# This is the method that compute numerical output
151147
# given numerical inputs. Everything here is numpy arrays
152148
m, c, sigma, x, data = inputs # this will contain my variables
@@ -155,6 +151,8 @@ class LogLike(Op):
155151
loglike_eval = my_loglike(m, c, sigma, x, data)
156152
157153
# Save the result in the outputs list provided by PyTensor
154+
# There is one list per output, each containing another list
155+
# pre-populated with a `None` where the result should be saved.
158156
outputs[0][0] = np.asarray(loglike_eval)
159157
```
160158

@@ -221,7 +219,7 @@ with pm.Model() as no_grad_model:
221219
m = pm.Uniform("m", lower=-10.0, upper=10.0, initval=mtrue)
222220
c = pm.Uniform("c", lower=-10.0, upper=10.0, initval=ctrue)
223221
224-
# use a Potential to "call" the Op and include it in the logp computation
222+
# use a CustomDist with a custom logp function
225223
likelihood = pm.CustomDist(
226224
"likelihood", m, c, sigma, x, observed=data, logp=custom_dist_loglike
227225
)
@@ -243,7 +241,7 @@ We sholud get exactly the same values as when we tested manually!
243241
no_grad_model.compile_logp(vars=[likelihood], sum=False)(ip)
244242
```
245243

246-
We can also double check that PyTensor will error out if we try to extract the model `dlogp`
244+
We can also double-check that PyMC will error out if we try to extract the model gradients with respect to the logp (which we call `dlogp`)
247245

248246
```{code-cell} ipython3
249247
try:
@@ -263,13 +261,17 @@ with no_grad_model:
263261
az.plot_trace(idata_no_grad, lines=[("m", {}, mtrue), ("c", {}, ctrue)]);
264262
```
265263

266-
## PyTensor Op with grad
264+
## PyTensor Op with gradients
267265

268266
What if we wanted to use NUTS or HMC? If we knew the analytical derivatives of the model/likelihood function then we could add a {func}`grad` to the Op using existing PyTensor operations.
269267

270268
But, what if we don't know the analytical form. If our model/likelihood, is implemented in a framework that provides automatic differentiation (just like PyTensor does), it's possible to reuse their functionality. This {ref}`related example <wrapping_jax_function> shows how to do this when working with JAX functions.
271269

272-
But, if our model/likelihood truly is a "black box" then we can just use the good-old-fashioned [finite difference](https://en.wikipedia.org/wiki/Finite_difference) to find the gradients - this can be slow, especially if there are a large number of variables, or the model takes a long time to evaluate. We use the handy SciPy {func}`~scipy.optimize.approx_fprime` function to achieve this.
270+
If our model/likelihood truly is a "black box" then we can try to use approximation methods like [finite difference](https://en.wikipedia.org/wiki/Finite_difference) to find the gradients. We illustrate this approach with the handy SciPy {func}`~scipy.optimize.approx_fprime` function to achieve this.
271+
272+
:::{caution}
273+
Finite differences are rarely recommended as a way to compute gradients. They can be too slow or unstable for practical uses. We suggest you use them only as a last resort.
274+
:::
273275

274276
+++
275277

@@ -311,14 +313,14 @@ finite_differences_loglike(mtrue, ctrue, sigma, x, data)
311313

312314
So, now we can just redefine our Op with a `grad()` method, right?
313315

314-
It's not quite so simple! The `grad()` method itself requires that its inputs are PyTensor tensor variables, whereas our `gradients` function above, like our `my_loglike` function, wants a list of floating point values. So, we need to define another Op that calculates the gradients. Below, I define a new version of the `LogLike` Op, called `LogLikeWithGrad` this time, that has a `grad()` method. This is followed by anothor Op called `LogLikeGrad` that, when called with a vector of PyTensor tensor variables, returns another vector of values that are the gradients (i.e., the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)) of our log-likelihood function at those values. Note that the `grad()` method itself does not return the gradients directly, but instead returns the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)-vector product (you can hopefully just copy what I've done and not worry about what this means too much!).
316+
It's not quite so simple! The `grad()` method itself requires that its inputs are PyTensor tensor variables, whereas our `gradients` function above, like our `my_loglike` function, wants a list of floating point values. So, we need to define another Op that calculates the gradients. Below, I define a new version of the `LogLike` Op, called `LogLikeWithGrad` this time, that has a `grad()` method. This is followed by anothor Op called `LogLikeGrad` that, when called with a vector of PyTensor tensor variables, returns another vector of values that are the gradients (i.e., the [Jacobian](https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant)) of our log-likelihood function at those values. Note that the `grad()` method itself does not return the gradients directly, but instead returns the [Jacobian-vector product](https://en.wikipedia.org/wiki/Pushforward_(differential)).
315317

316318
```{code-cell} ipython3
317319
# define a pytensor Op for our likelihood function
318320
319321
320322
class LogLikeWithGrad(Op):
321-
def make_node(self, m, c, sigma, x, data):
323+
def make_node(self, m, c, sigma, x, data) -> Apply:
322324
# Same as before
323325
m = pt.as_tensor(m)
324326
c = pt.as_tensor(c)
@@ -330,13 +332,15 @@ class LogLikeWithGrad(Op):
330332
outputs = [data.type()]
331333
return Apply(self, inputs, outputs)
332334
333-
def perform(self, node, inputs, outputs):
335+
def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
334336
# Same as before
335337
m, c, sigma, x, data = inputs # this will contain my variables
336338
loglike_eval = my_loglike(m, c, sigma, x, data)
337339
outputs[0][0] = np.asarray(loglike_eval)
338340
339-
def grad(self, inputs, g):
341+
def grad(
342+
self, inputs: list[pt.TensorVariable], g: list[pt.TensorVariable]
343+
) -> list[pt.TensorVariable]:
340344
# NEW!
341345
# the method that calculates the gradients - it actually returns the vector-Jacobian product
342346
m, c, sigma, x, data = inputs
@@ -361,7 +365,7 @@ class LogLikeWithGrad(Op):
361365
362366
363367
class LogLikeGrad(Op):
364-
def make_node(self, m, c, sigma, x, data):
368+
def make_node(self, m, c, sigma, x, data) -> Apply:
365369
m = pt.as_tensor(m)
366370
c = pt.as_tensor(c)
367371
sigma = pt.as_tensor(sigma)
@@ -375,7 +379,7 @@ class LogLikeGrad(Op):
375379
376380
return Apply(self, inputs, outputs)
377381
378-
def perform(self, node, inputs, outputs):
382+
def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
379383
m, c, sigma, x, data = inputs
380384
381385
# calculate gradients
@@ -448,7 +452,7 @@ with pm.Model() as grad_model:
448452
m = pm.Uniform("m", lower=-10.0, upper=10.0)
449453
c = pm.Uniform("c", lower=-10.0, upper=10.0)
450454
451-
# use a Potential to "call" the Op and include it in the logp computation
455+
# use a CustomDist with a custom logp function
452456
likelihood = pm.CustomDist(
453457
"likelihood", m, c, sigma, x, observed=data, logp=custom_dist_loglike
454458
)
@@ -582,7 +586,3 @@ az.plot_trace(idata_potential, lines=[("m", {}, mtrue), ("c", {}, ctrue)]);
582586

583587
:::{include} ../page_footer.md
584588
:::
585-
586-
```{code-cell} ipython3
587-
588-
```

0 commit comments

Comments
 (0)