Commit 2c721e6

Updated AR notebook to PyMC 5.0 (#494)

* Updated AR notebook to PyMC 5.0
* Fixed update list

1 parent 43138a0 · commit 2c721e6

File tree

2 files changed: +295 −200 lines changed

examples/time_series/AR.ipynb: +223 −157 (large diffs are not rendered by default)

examples/time_series/AR.myst.md: +72 −43
@@ -5,14 +5,18 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: 'Python 3.8.2 64-bit (''pymc3'': conda)'
+  display_name: pie
   language: python
-  name: python38264bitpymc3conda8b8223a2f9874eff9bd3e12d36ed2ca2
+  name: python3
 ---

-+++ {"ein.tags": ["worksheet-0"], "slideshow": {"slide_type": "-"}}
-
-# Analysis of An $AR(1)$ Model in PyMC3
+(AR)=
+# Analysis of An $AR(1)$ Model in PyMC
+:::{post} Jan 7, 2023
+:tags: time series, autoregressive
+:category: intermediate
+:author: Ed Herbst, Chris Fonnesbeck
+:::

 ```{code-cell} ipython3
 ---
@@ -23,7 +27,7 @@ slideshow:
 import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
-import pymc3 as pm
+import pymc as pm
 ```

 ```{code-cell} ipython3
@@ -35,14 +39,14 @@ az.style.use("arviz-darkgrid")

 +++ {"ein.tags": ["worksheet-0"], "slideshow": {"slide_type": "-"}}

-Consider the following AR(1) process, initialized in the
+Consider the following AR(2) process, initialized in the
 infinite past:
 $$
-y_t = \theta y_{t-1} + \epsilon_t,
+y_t = \rho_0 + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \epsilon_t,
 $$
-where $\epsilon_t \overset{iid}{\sim} {\cal N}(0,1)$. Suppose you'd like to learn about $\theta$ from a a sample of observations $Y^T = \{ y_0, y_1,\ldots, y_T \}$.
+where $\epsilon_t \overset{iid}{\sim} {\cal N}(0,1)$. Suppose you'd like to learn about $\rho$ from a sample of observations $Y^T = \{ y_0, y_1,\ldots, y_T \}$.

-First, let's generate some synthetic sample data. We simulate the 'infinite past' by generating 10,000 samples from an AR(1) process and then discarding the first 5,000:
+First, let's generate some synthetic sample data. We simulate the 'infinite past' by generating 10,000 samples from an AR(2) process and then discarding the first 5,000:

 ```{code-cell} ipython3
 ---
@@ -51,17 +55,18 @@ slideshow:
   slide_type: '-'
 ---
 T = 10000
-y = np.zeros((T,))

 # true stationarity:
-true_theta = 0.95
+true_rho = 0.7, -0.3
 # true standard deviation of the innovation:
 true_sigma = 2.0
 # true process mean:
-true_center = 0.0
+true_center = -1.0

-for t in range(1, T):
-    y[t] = true_theta * y[t - 1] + np.random.normal(loc=true_center, scale=true_sigma)
+y = np.random.normal(loc=true_center, scale=true_sigma, size=T)
+y[1] += true_rho[0] * y[0]
+for t in range(2, T - 100):
+    y[t] += true_rho[0] * y[t - 1] + true_rho[1] * y[t - 2]

 y = y[-5000:]
 plt.plot(y, alpha=0.8)
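As a quick sanity check on the new simulation code (an editorial aside, not part of the commit), the chosen AR(2) coefficients can be verified as stationary via the companion matrix, and the sample mean compared with the implied stationary mean. A minimal NumPy sketch, reusing the `true_rho`, `true_center`, and `true_sigma` values above and simplifying the loop bound to `T`:

```python
import numpy as np

# Same generative values as in the diff above.
true_rho = 0.7, -0.3
true_sigma = 2.0
true_center = -1.0

# Stationarity holds iff all eigenvalues of the companion matrix
# [[rho_1, rho_2], [1, 0]] lie strictly inside the unit circle.
companion = np.array([[true_rho[0], true_rho[1]], [1.0, 0.0]])
moduli = np.abs(np.linalg.eigvals(companion))
assert np.all(moduli < 1), "AR(2) coefficients are not stationary"

# Simulate as the notebook does (innovations carry the process mean),
# then compare against the stationary mean c / (1 - rho_1 - rho_2).
rng = np.random.default_rng(42)
T = 10_000
y = rng.normal(loc=true_center, scale=true_sigma, size=T)
y[1] += true_rho[0] * y[0]
for t in range(2, T):
    y[t] += true_rho[0] * y[t - 1] + true_rho[1] * y[t - 2]
y = y[-5000:]

stationary_mean = true_center / (1 - true_rho[0] - true_rho[1])  # -1 / 0.6
print(round(stationary_mean, 3), round(y.mean(), 3))
```

With these coefficients the companion eigenvalues are a complex pair of modulus sqrt(0.3) ≈ 0.55, so the process is comfortably stationary and the discard-the-past trick is justified.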
@@ -71,7 +76,9 @@ plt.ylabel("$y$");

 +++ {"ein.tags": ["worksheet-0"], "slideshow": {"slide_type": "-"}}

-This generative process is quite straight forward to implement in PyMC3:
+Let's start by trying to fit the wrong model! Assume that we do not know the generative model, and simply fit an AR(1) model for simplicity.
+
+This generative process is quite straightforward to implement in PyMC. Since we wish to include an intercept term in the AR process, we must set `constant=True`; otherwise PyMC will assume that we want an AR(2) process when `rho` is of size 2. Also, by default a $N(0, 100)$ distribution will be used as the prior for the initial value. We can override this by passing a distribution (not a full RV) to the `init_dist` argument.

 ```{code-cell} ipython3
 ---
@@ -81,41 +88,39 @@ slideshow:
 ---
 with pm.Model() as ar1:
     # assumes 95% of prob mass is between -2 and 2
-    theta = pm.Normal("theta", 0.0, 1.0)
+    rho = pm.Normal("rho", mu=0.0, sigma=1.0, shape=2)
     # precision of the innovation term
-    tau = pm.Exponential("tau", 0.5)
-    # process mean
-    center = pm.Normal("center", mu=0.0, sigma=1.0)
+    tau = pm.Exponential("tau", lam=0.5)

-    likelihood = pm.AR1("y", k=theta, tau_e=tau, observed=y - center)
+    likelihood = pm.AR(
+        "y", rho=rho, tau=tau, constant=True, init_dist=pm.Normal.dist(0, 10), observed=y
+    )

-    trace = pm.sample(1000, tune=2000, init="advi+adapt_diag", random_seed=RANDOM_SEED)
-    idata = az.from_pymc3(trace)
+    idata = pm.sample(1000, tune=2000, random_seed=RANDOM_SEED)
 ```

-We can see that even though the sample data did not start at zero, the true center of zero is captured rightly inferred by the model, as you can see in the trace plot below. Likewise, the model captured the true values of the autocorrelation parameter 𝜃 and the innovation term $\epsilon_t$ (`tau` in the model) -- 0.95 and 1 respectively).
+We can see that even though we assumed the wrong model, the parameter estimates are actually not that far from the true values.

 ```{code-cell} ipython3
 az.plot_trace(
     idata,
     lines=[
-        ("theta", {}, true_theta),
+        ("rho", {}, [true_center, true_rho[0]]),
         ("tau", {}, true_sigma**-2),
-        ("center", {}, true_center),
     ],
 );
 ```

 +++ {"ein.tags": ["worksheet-0"], "slideshow": {"slide_type": "-"}}

 ## Extension to AR(p)
-We can instead estimate an AR(2) model using PyMC3.
+Now let's fit the correct underlying model, an AR(2):

 $$
-y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \epsilon_t.
+y_t = \rho_0 + \rho_1 y_{t-1} + \rho_2 y_{t-2} + \epsilon_t.
 $$

-The `AR` distribution infers the order of the process thanks to the size the of `rho` argmument passed to `AR`.
+The `AR` distribution infers the order of the process from the size of the `rho` argument passed to `AR` (including the mean).

 We will also use the standard deviation of the innovations (rather than the precision) to parameterize the distribution.
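The order-inference rule the new text describes (a `rho` of size p+1 with `constant=True` specifies an AR(p) process, with the first entry acting as the intercept) can be sketched in plain Python as an editorial illustration. This is not PyMC internals; `ar_order` and `conditional_mean` are hypothetical helper names introduced here only to make the bookkeeping concrete:

```python
import numpy as np

def ar_order(rho, constant):
    """Process order implied by the size of rho, mirroring the rule above:
    with constant=True the first entry is the intercept, not a lag."""
    return len(rho) - 1 if constant else len(rho)

def conditional_mean(rho, y_past, constant=True):
    """One-step-ahead mean given recent history y_past = [y_{t-1}, ..., y_{t-p}]."""
    if constant:
        intercept, coeffs = rho[0], rho[1:]
    else:
        intercept, coeffs = 0.0, rho
    return intercept + float(np.dot(coeffs, y_past))

rho = [-1.0, 0.7, -0.3]  # intercept, rho_1, rho_2
print(ar_order(rho, constant=True))       # a length-3 rho with a constant is AR(2)
print(conditional_mean(rho, [2.0, 1.0]))  # -1.0 + 0.7*2.0 - 0.3*1.0
```

This also makes clear why the AR(1)-with-intercept fit earlier used `shape=2` with `constant=True`: one entry for the intercept, one for the single lag.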

@@ -126,61 +131,85 @@ slideshow:
   slide_type: '-'
 ---
 with pm.Model() as ar2:
-    theta = pm.Normal("theta", 0.0, 1.0, shape=2)
+    rho = pm.Normal("rho", 0.0, 1.0, shape=3)
     sigma = pm.HalfNormal("sigma", 3)
-    likelihood = pm.AR("y", theta, sigma=sigma, observed=y)
+    likelihood = pm.AR(
+        "y", rho=rho, sigma=sigma, constant=True, init_dist=pm.Normal.dist(0, 10), observed=y
+    )

-    trace = pm.sample(
+    idata = pm.sample(
         1000,
         tune=2000,
         random_seed=RANDOM_SEED,
     )
-    idata = az.from_pymc3(trace)
 ```

+The posterior plots show that we have correctly inferred the generative model parameters.
+
 ```{code-cell} ipython3
 az.plot_trace(
     idata,
     lines=[
-        ("theta", {"theta_dim_0": 0}, true_theta),
-        ("theta", {"theta_dim_0": 1}, 0.0),
+        ("rho", {}, (true_center,) + true_rho),
         ("sigma", {}, true_sigma),
     ],
 );
 ```

 +++ {"ein.tags": ["worksheet-0"], "slideshow": {"slide_type": "-"}}

-You can also pass the set of AR parameters as a list.
+You can also pass the set of AR parameters as a list, if they are not identically distributed.

 ```{code-cell} ipython3
 ---
 ein.tags: [worksheet-0]
 slideshow:
   slide_type: '-'
 ---
+import pytensor.tensor as pt
+
 with pm.Model() as ar2_bis:
-    beta0 = pm.Normal("theta0", mu=0.0, sigma=1.0)
-    beta1 = pm.Uniform("theta1", -1, 1)
+    rho0 = pm.Normal("rho0", mu=0.0, sigma=5.0)
+    rho1 = pm.Uniform("rho1", -1, 1)
+    rho2 = pm.Uniform("rho2", -1, 1)
     sigma = pm.HalfNormal("sigma", 3)
-    likelhood = pm.AR("y", [beta0, beta1], sigma=sigma, observed=y)
+    likelihood = pm.AR(
+        "y",
+        rho=pt.stack([rho0, rho1, rho2]),
+        sigma=sigma,
+        constant=True,
+        init_dist=pm.Normal.dist(0, 10),
+        observed=y,
+    )

-    trace = pm.sample(
+    idata = pm.sample(
         1000,
         tune=2000,
+        target_accept=0.9,
         random_seed=RANDOM_SEED,
     )
-    idata = az.from_pymc3(trace)
 ```

 ```{code-cell} ipython3
 az.plot_trace(
     idata,
-    lines=[("theta0", {}, true_theta), ("theta1", {}, 0.0), ("sigma", {}, true_sigma)],
+    lines=[
+        ("rho0", {}, true_center),
+        ("rho1", {}, true_rho[0]),
+        ("rho2", {}, true_rho[1]),
+        ("sigma", {}, true_sigma),
+    ],
 );
 ```

+## Authors
+* Authored by Ed Herbst in August 2016 ([pymc#1546](https://github.com/pymc-devs/pymc/pull/2285))
+* Updated by Chris Fonnesbeck in January 2023 ([pymc-examples#494](https://github.com/pymc-devs/pymc-examples/pull/494))
+
 ```{code-cell} ipython3
 %load_ext watermark
-%watermark -n -u -v -iv -w
+%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray
 ```
+
+:::{include} ../page_footer.md
+:::
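As a closing editorial aside on the diff above: the generative AR(2) parameters the notebook recovers with PyMC can also be cross-checked without any sampler, by ordinary least squares on the simulated series (regressing y_t on an intercept and two lags). A minimal NumPy sketch, using its own seeded RNG rather than the notebook's state:

```python
import numpy as np

# Simulate the same AR(2) process as the notebook's data-generating code.
rng = np.random.default_rng(0)
true_rho = 0.7, -0.3
true_center, true_sigma = -1.0, 2.0

T = 10_000
y = rng.normal(loc=true_center, scale=true_sigma, size=T)
y[1] += true_rho[0] * y[0]
for t in range(2, T):
    y[t] += true_rho[0] * y[t - 1] + true_rho[1] * y[t - 2]
y = y[-5000:]

# OLS design matrix: intercept, lag-1, lag-2 (aligned so row t targets y[t]).
X = np.column_stack([np.ones(len(y) - 2), y[1:-1], y[:-2]])
beta, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(beta)  # approximately [-1.0, 0.7, -0.3]
```

With 5,000 observations the OLS estimates land close to the true intercept and lag coefficients, which is a cheap consistency check on the Bayesian fits in the notebook.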
