
Commit 688272a

update v4 (#339)
1 parent 568edd5 · commit 688272a

File tree

2 files changed: +269 −86 lines


examples/samplers/SMC-ABC_Lotka-Volterra_example.ipynb (+233, −70)
Large diffs are not rendered by default.

myst_nbs/samplers/SMC-ABC_Lotka-Volterra_example.myst.md (+36, −16)
@@ -6,16 +6,25 @@ jupytext:
     format_version: 0.13
     jupytext_version: 1.13.7
 kernelspec:
-  display_name: Python (PyMC3 Dev)
+  display_name: Python 3.9.7 ('base')
   language: python
-  name: pymc3-dev
+  name: python3
 ---
 
+(ABC_introduction)=
+# Approximate Bayesian Computation
+:::{post} May 31, 2022
+:tags: SMC, ABC
+:category: beginner, explanation
+:::
+
 ```{code-cell} ipython3
 import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
-import pymc3 as pm
+import pymc as pm
+
+print(f"Running on PyMC v{pm.__version__}")
 ```
 
 ```{code-cell} ipython3

@@ -69,8 +78,8 @@ data = np.random.normal(loc=0, scale=1, size=1000)
 Clearly, under normal circumstances, using a Gaussian likelihood will do the job very well. But that would defeat the purpose of this example; the notebook would end here and everything would be very boring. So, instead, we are going to define a simulator. A very straightforward simulator for normal data is a pseudo-random number generator; in real life our simulator will most likely be something fancier.
 
 ```{code-cell} ipython3
-def normal_sim(a, b):
-    return np.random.normal(a, b, 1000)
+def normal_sim(rng, a, b, size=1000):
+    return rng.normal(a, b, size=size)
 ```
 
 Defining an ABC model in PyMC3 is, in general, very similar to defining other PyMC3 models. The two important differences are: we need to define a `Simulator` _distribution_, and we need to use `sample_smc` with `kernel="ABC"`. The `Simulator` works as a generic interface for passing the synthetic-data-generating function (_normal_sim_ in this example), its parameters, the observed data, and optionally a distance function and a summary statistic. In the following code we are using the default distance, `gaussian_kernel`, and the `sort` summary statistic. As the name suggests, `sort` sorts the data before computing the distance.
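Editor's note: a quick illustration of the v4-style simulator signature introduced in the hunk above. The leading `rng` and the `size` keyword are there for PyMC to fill in at sampling time; calling the function by hand with `np.random.default_rng()` is purely for demonstration.

```python
import numpy as np

# New-style simulator from the diff above: `rng` comes first and `size`
# is a keyword, matching the arguments pm.Simulator passes in.
def normal_sim(rng, a, b, size=1000):
    return rng.normal(a, b, size=size)

# Stand-alone check with an explicit Generator (illustrative only).
draws = normal_sim(np.random.default_rng(42), a=0.0, b=1.0)
print(draws.shape)  # (1000,)
```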
@@ -83,8 +92,8 @@ with pm.Model() as example:
     b = pm.HalfNormal("b", sigma=1)
     s = pm.Simulator("s", normal_sim, params=(a, b), sum_stat="sort", epsilon=1, observed=data)
 
-    trace, sim_data = pm.sample_smc(kernel="ABC", parallel=True, save_sim_data=True)
-    idata = az.from_pymc3(trace, posterior_predictive=sim_data)
+    idata = pm.sample_smc()
+    idata.extend(pm.sample_posterior_predictive(idata))
 ```
 
 Judging by `plot_trace`, the sampler did its job very well, which is not surprising given this is a very simple model. Anyway, it is always reassuring to look at a flat rank plot :-)
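Editor's note: since `sample_smc` now returns an `InferenceData` object directly, the flat rank plot mentioned above can be drawn from it without any conversion step. A minimal sketch (`kind="rank_bars"` is one of `az.plot_trace`'s built-in options):

```python
import arviz as az

# Rank representation of the SMC draws; roughly uniform (flat) rank bars
# across chains indicate good mixing.
az.plot_trace(idata, kind="rank_bars");
```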
@@ -100,7 +109,7 @@ az.summary(idata, kind="stats")
 The posterior predictive check shows that we have an overall good fit, but the synthetic data has heavier tails than the observed data. You may want to decrease the value of epsilon and see if you can get a tighter fit.
 
 ```{code-cell} ipython3
-az.plot_ppc(idata);
+az.plot_ppc(idata, num_pp_samples=500);
 ```
 
 ## Lotka–Volterra
@@ -139,23 +148,23 @@ def dX_dt(X, t, a, b, c, d):
 
 
 # simulator function
-def competition_model(a, b):
+def competition_model(rng, a, b, size=None):
     return odeint(dX_dt, y0=X0, t=t, rtol=0.01, args=(a, b, c, d))
 ```
 
 Using the simulator function, we will obtain a dataset with some noise added, to use as observed data.
 
 ```{code-cell} ipython3
 # function for generating noisy data to be used as observed data.
-def add_noise(a, b, c, d):
+def add_noise(a, b):
     noise = np.random.normal(size=(size, 2))
-    simulated = competition_model(a, b) + noise
+    simulated = competition_model(None, a, b) + noise
     return simulated
 ```
 
 ```{code-cell} ipython3
 # plotting observed data.
-observed = add_noise(a, b, c, d)
+observed = add_noise(a, b)
 _, ax = plt.subplots(figsize=(12, 4))
 ax.plot(observed[:, 0], "x", label="prey")
 ax.plot(observed[:, 1], "x", label="predator")
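Editor's note: `competition_model` never touches its new `rng` argument (the ODE solve is deterministic), which is why `add_noise` and the plotting code below can simply pass `None`. A sketch of that invariant, assuming the notebook's globals (`X0`, `t`, `a`, `b`, `c`, `d`) are defined in cells outside this diff:

```python
import numpy as np

# `rng` and `size` exist only to satisfy the pm.Simulator calling
# convention; two calls with different rng values give identical output.
run1 = competition_model(None, a, b)
run2 = competition_model(np.random.default_rng(0), a, b)
assert np.allclose(run1, run2)  # deterministic: no randomness inside
```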
@@ -174,8 +183,7 @@ with pm.Model() as model_lv:
 
     sim = pm.Simulator("sim", competition_model, params=(a, b), epsilon=10, observed=observed)
 
-    trace_lv = pm.sample_smc(kernel="ABC", parallel=True)
-    idata_lv = az.from_pymc3(trace_lv)
+    idata_lv = pm.sample_smc()
 ```
 
 

@@ -189,18 +197,30 @@ az.plot_posterior(idata_lv);
 ```{code-cell} ipython3
 # plot results
 _, ax = plt.subplots(figsize=(14, 6))
+posterior = idata_lv.posterior.stack(samples=("draw", "chain"))
 ax.plot(observed[:, 0], "o", label="prey", c="C0", mec="k")
 ax.plot(observed[:, 1], "o", label="predator", c="C1", mec="k")
-ax.plot(competition_model(trace_lv["a"].mean(), trace_lv["b"].mean()), linewidth=3)
+ax.plot(competition_model(None, posterior["a"].mean(), posterior["b"].mean()), linewidth=3)
 for i in np.random.randint(0, size, 75):
-    sim = competition_model(trace_lv["a"][i], trace_lv["b"][i])
+    sim = competition_model(None, posterior["a"][i], posterior["b"][i])
     ax.plot(sim[:, 0], alpha=0.1, c="C0")
     ax.plot(sim[:, 1], alpha=0.1, c="C1")
 ax.set_xlabel("time")
 ax.set_ylabel("population")
 ax.legend();
 ```
 
+## References
+
+:::{bibliography}
+:filter: docname in docnames
+
+martin2021bayesian
+:::
+
 ```{code-cell} ipython3
 %watermark -n -u -v -iv -w
 ```
+
+:::{include} ../page_footer.md
+:::
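Editor's note: pulling the first example's changes together, the migrated cell reads roughly as follows. This is a sketch assuming PyMC >= 4.0; the prior on `a` is a stand-in, since that line falls outside the diff context shown above.

```python
import arviz as az
import numpy as np
import pymc as pm

data = np.random.normal(loc=0, scale=1, size=1000)

def normal_sim(rng, a, b, size=1000):
    # v4 signature: PyMC supplies `rng`; the simulator draws synthetic data.
    return rng.normal(a, b, size=size)

with pm.Model() as example:
    a = pm.Normal("a", mu=0, sigma=5)  # stand-in prior; not shown in the diff
    b = pm.HalfNormal("b", sigma=1)
    s = pm.Simulator("s", normal_sim, params=(a, b), sum_stat="sort", epsilon=1, observed=data)
    # v4: sample_smc returns InferenceData directly; posterior predictive
    # samples are merged in with InferenceData.extend.
    idata = pm.sample_smc()
    idata.extend(pm.sample_posterior_predictive(idata))

az.plot_ppc(idata, num_pp_samples=500)
```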
