Commit 87316da

Update VI notebooks to v5 (#497)

* Updated ADVI minibatch and API quickstart
* notebook: GLM robust (run with pymc v5) (#499)
* run with pymc v5
* add myst file
* notebook: glm hierarchical, run pymc v5 (#498)
* run pymc v5
* add myst file
* Updated ADVI minibatch and API quickstart
* Added headers and footers to quickstart notebook
* Added headers and footers to quickstart notebook
* Update empirical approximation to v5
* Added entry to update history in empirical VI notebook
* Minor language tweak; remove gaussian mixture ADVI example
* Removed instances of pymc3
1 parent d26bfb8 commit 87316da

8 files changed (+1299, -2323 lines)

examples/variational_inference/GLM-hierarchical-advi-minibatch.ipynb (+144, -80)

Large diffs are not rendered by default.

examples/variational_inference/GLM-hierarchical-advi-minibatch.myst.md (+34, -16)

@@ -5,7 +5,7 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python 3
+  display_name: pie
   language: python
   name: python3
 ---
@@ -22,22 +22,22 @@ kernelspec:
 Unlike Gaussian mixture models, (hierarchical) regression models have independent variables. These variables affect the likelihood function, but are not random variables. When using mini-batch, we should take care of that.
 
 ```{code-cell} ipython3
-%env THEANO_FLAGS=device=cpu, floatX=float32, warn_float64=ignore
+%env PYTENSOR_FLAGS=device=cpu, floatX=float32, warn_float64=ignore
 
 import os
 
 import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
 import pandas as pd
-import pymc3 as pm
+import pymc as pm
+import pytensor
+import pytensor.tensor as pt
 import seaborn as sns
-import theano
-import theano.tensor as tt
 
 from scipy import stats
 
-print(f"Running on PyMC3 v{pm.__version__}")
+print(f"Running on PyMC v{pm.__version__}")
 ```
 
 ```{code-cell} ipython3
@@ -67,9 +67,9 @@ coords = {"counties": data.county.unique()}
 Here, `log_radon_idx_t` is a dependent variable, while `floor_idx_t` and `county_idx_t` determine the independent variable.
 
 ```{code-cell} ipython3
-log_radon_idx_t = pm.Minibatch(log_radon_idx, 100)
-floor_idx_t = pm.Minibatch(floor_idx, 100)
-county_idx_t = pm.Minibatch(county_idx, 100)
+log_radon_idx_t = pm.Minibatch(log_radon_idx, batch_size=100)
+floor_idx_t = pm.Minibatch(floor_idx, batch_size=100)
+county_idx_t = pm.Minibatch(county_idx, batch_size=100)
 ```
 
 ```{code-cell} ipython3
@@ -125,8 +125,9 @@ Then, run ADVI with mini-batch.
 
 ```{code-cell} ipython3
 with hierarchical_model:
-    approx = pm.fit(100000, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])
-    idata_advi = az.from_pymc3(approx.sample(500))
+    approx = pm.fit(100_000, callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)])
+
+idata_advi = approx.sample(500)
 ```
 
 Check the trace of ELBO and compare the result with MCMC.
@@ -135,6 +136,20 @@ Check the trace of ELBO and compare the result with MCMC.
 plt.plot(approx.hist);
 ```
 
+We can extract the covariance matrix from the mean field approximation and use it as the scaling matrix for the NUTS algorithm.
+
+```{code-cell} ipython3
+scaling = approx.cov.eval()
+```
+
+Also, we can generate samples (one for each chain) to use as the starting points for the sampler.
+
+```{code-cell} ipython3
+n_chains = 4
+sample = approx.sample(return_inferencedata=False, size=n_chains)
+start_dict = list(sample[i] for i in range(n_chains))
+```
+
 ```{code-cell} ipython3
 # Inference button (TM)!
 with pm.Model(coords=coords):
@@ -155,15 +170,15 @@ with pm.Model(coords=coords):
     radon_like = pm.Normal("radon_like", mu=radon_est, sigma=eps, observed=log_radon_idx)
 
     # essentially, this is what init='advi' does
-    step = pm.NUTS(scaling=approx.cov.eval(), is_cov=True)
-    hierarchical_trace = pm.sample(
-        2000, step, start=approx.sample()[0], progressbar=True, return_inferencedata=True
-    )
+    step = pm.NUTS(scaling=scaling, is_cov=True)
+    hierarchical_trace = pm.sample(draws=2000, step=step, chains=n_chains, initvals=start_dict)
 ```
 
 ```{code-cell} ipython3
 az.plot_density(
-    [idata_advi, hierarchical_trace], var_names=["~alpha", "~beta"], data_labels=["ADVI", "NUTS"]
+    data=[idata_advi, hierarchical_trace],
+    var_names=["~alpha", "~beta"],
+    data_labels=["ADVI", "NUTS"],
 );
 ```
 
@@ -173,3 +188,6 @@ az.plot_density(
 %load_ext watermark
 %watermark -n -u -v -iv -w -p xarray
 ```
+
+:::{include} ../page_footer.md
+:::
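
For readers skimming this diff, the core pattern the notebook now follows under PyMC v5 is roughly the sketch below. It uses a small synthetic regression instead of the notebook's radon data, and the names (`X_mb`, `y_mb`, `beta`, `sigma`) are illustrative only; the API calls (`pm.Minibatch` with `batch_size`, `total_size` on the likelihood, `pm.fit`, `approx.sample`) mirror the ones visible in the diff.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
X = rng.normal(size=10_000)
y = 2.0 * X + rng.normal(scale=0.5, size=10_000)

# Wrap both arrays in a single Minibatch call so the random slices stay aligned.
X_mb, y_mb = pm.Minibatch(X, y, batch_size=100)

with pm.Model():
    beta = pm.Normal("beta", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    # total_size tells PyMC how to rescale the minibatch log-likelihood
    # to the size of the full dataset.
    pm.Normal("obs", mu=beta * X_mb, sigma=sigma, observed=y_mb, total_size=len(y))

    # Mean-field ADVI, with the same convergence callback the notebook uses.
    approx = pm.fit(
        10_000,
        callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)],
    )

idata_advi = approx.sample(500)  # arviz.InferenceData in PyMC v5
```

Since `approx.sample()` already returns an `arviz.InferenceData` in v5, the diff can drop the old `az.from_pymc3` wrapper.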

examples/variational_inference/empirical-approx-overview.ipynb (+170, -156)

Large diffs are not rendered by default.

examples/variational_inference/empirical-approx-overview.myst.md (+38, -24)

@@ -5,32 +5,40 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: Python PyMC3 (Dev)
+  display_name: pie
   language: python
-  name: pymc3-dev-py38
+  name: python3
 ---
 
+(empirical-approx-overview)=
+
 # Empirical Approximation overview
 
-For most models we use sampling MCMC algorithms like Metropolis or NUTS. In PyMC3 we got used to store traces of MCMC samples and then do analysis using them. There is a similar concept for the variational inference submodule in PyMC3: *Empirical*. This type of approximation stores particles for the SVGD sampler. There is no difference between independent SVGD particles and MCMC samples. *Empirical* acts as a bridge between MCMC sampling output and full-fledged VI utils like `apply_replacements` or `sample_node`. For the interface description, see [variational_api_quickstart](variational_api_quickstart.ipynb). Here we will just focus on `Emprical` and give an overview of specific things for the *Empirical* approximation
+For most models we use sampling MCMC algorithms like Metropolis or NUTS. In PyMC we got used to store traces of MCMC samples and then do analysis using them. There is a similar concept for the variational inference submodule in PyMC: *Empirical*. This type of approximation stores particles for the SVGD sampler. There is no difference between independent SVGD particles and MCMC samples. *Empirical* acts as a bridge between MCMC sampling output and full-fledged VI utils like `apply_replacements` or `sample_node`. For the interface description, see [variational_api_quickstart](variational_api_quickstart.ipynb). Here we will just focus on `Emprical` and give an overview of specific things for the *Empirical* approximation.
+
+:::{post} Jan 13, 2023
+:tags: variational inference, approximation
+:category: advaned, how-to
+:author: Maxim Kochurov, Raul Maldonado, Chris Fonnesbeck
+:::
 
 ```{code-cell} ipython3
 import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
-import pymc3 as pm
-import theano
+import pymc as pm
+import pytensor
+import seaborn as sns
 
 from pandas import DataFrame
 
-print(f"Running on PyMC3 v{pm.__version__}")
+print(f"Running on PyMC v{pm.__version__}")
 ```
 
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
 az.style.use("arviz-darkgrid")
 np.random.seed(42)
-pm.set_tt_rng(42)
 ```
 
 ## Multimodal density
@@ -42,12 +50,14 @@ mu = pm.floatX([-0.3, 0.5])
 sd = pm.floatX([0.1, 0.1])
 
 with pm.Model() as model:
-    x = pm.NormalMixture("x", w=w, mu=mu, sigma=sd, dtype=theano.config.floatX)
-    trace = pm.sample(50000)
+    x = pm.NormalMixture("x", w=w, mu=mu, sigma=sd)
+    trace = pm.sample(50_000, return_inferencedata=False)
 ```
 
 ```{code-cell} ipython3
-az.plot_trace(trace);
+with model:
+    idata = pm.to_inference_data(trace)
+az.plot_trace(idata);
 ```
 
 Great. First having a trace we can create `Empirical` approx
@@ -65,7 +75,7 @@ with model:
 approx
 ```
 
-This type of approximation has it's own underlying storage for samples that is `theano.shared` itself
+This type of approximation has it's own underlying storage for samples that is `pytensor.shared` itself
 
 ```{code-cell} ipython3
 approx.histogram
@@ -93,10 +103,6 @@ Sampling from posterior is done uniformly with replacements. Call `approx.sample
 new_trace = approx.sample(50000)
 ```
 
-```{code-cell} ipython3
-%timeit new_trace = approx.sample(50000)
-```
-
 After sampling function is compiled sampling bacomes really fast
 
 ```{code-cell} ipython3
@@ -112,7 +118,8 @@ mu = pm.floatX([0.0, 0.0])
 cov = pm.floatX([[1, 0.5], [0.5, 1.0]])
 with pm.Model() as model:
     pm.MvNormal("x", mu=mu, cov=cov, shape=2)
-    trace = pm.sample(1000)
+    trace = pm.sample(1000, return_inferencedata=False)
+    idata = pm.to_inference_data(trace)
 ```
 
 ```{code-cell} ipython3
@@ -124,17 +131,12 @@ with model:
 az.plot_trace(approx.sample(10000));
 ```
 
-```{code-cell} ipython3
-import seaborn as sns
-```
-
 ```{code-cell} ipython3
 kdeViz_df = DataFrame(
-    data=approx.sample(1000)["x"], columns=["First Dimension", "Second Dimension"]
+    data=approx.sample(1000).posterior["x"].squeeze(),
+    columns=["First Dimension", "Second Dimension"],
 )
-```
 
-```{code-cell} ipython3
 sns.kdeplot(data=kdeViz_df, x="First Dimension", y="Second Dimension")
 plt.show()
 ```
@@ -152,7 +154,7 @@ Now we can estimate the same covariance using `Empirical`
 print(approx.cov)
 ```
 
-That's a tensor itself
+That's a tensor object, which we need to evaluate.
 
 ```{code-cell} ipython3
 print(approx.cov.eval())
@@ -164,7 +166,19 @@ Estimations are very close and differ due to precision error. We can get the mea
 print(approx.mean.eval())
 ```
 
+## Authors
+
+* Authored by Maxim Kochurov ([pymc#2389](https://github.com/pymc-devs/pymc/pull/2389]))
+* Updated by Maxim Kochurov ([pymc#2416](https://github.com/pymc-devs/pymc/pull/2416))
+* Updated by Raul Maldonado ([pymc-examples#21](https://github.com/pymc-devs/pymc-examples/pull/21))
+* Updated by Chris Fonnesbeck ([pymc-examples#429](https://github.com/pymc-devs/pymc-examples/pull/497))
+
+## Watermark
+
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
+
+:::{include} ../page_footer.md
+:::
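
For the second file, the Empirical-approximation workflow the notebook migrates to looks roughly like the sketch below under PyMC v5. The toy model and sample counts are placeholders, not the notebook's; the key calls (`pm.sample(..., return_inferencedata=False)` to keep a raw `MultiTrace`, building `Empirical` from it inside the model context, and evaluating `approx.mean` / `approx.cov`) follow what the diff shows.

```python
import arviz as az
import pymc as pm

with pm.Model() as model:
    x = pm.Normal("x", 0.0, 1.0)
    # Keep the raw MultiTrace: Empirical stores the MCMC draws as particles.
    trace = pm.sample(1_000, chains=2, return_inferencedata=False)

with model:
    approx = pm.Empirical(trace)

# mean and cov are tensors backed by the stored particles; evaluate to get numbers.
print(approx.mean.eval())
print(approx.cov.eval())

# Resampling is uniform with replacement and returns InferenceData in v5.
idata = approx.sample(2_000)
az.plot_trace(idata)
```

Building the approximation inside the model context mirrors how the notebook creates `Empirical` from its trace before comparing `approx.cov.eval()` with the true covariance.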

examples/variational_inference/gaussian-mixture-model-advi.ipynb (-1,027)

This file was deleted.
