20x Performance degradation when using theano shared variables #3818

Closed
jvans1 opened this issue Feb 27, 2020 · 2 comments



jvans1 commented Feb 27, 2020

Description of your problem

Hi,

I noticed a pretty significant performance hit when using Theano shared variables. Please correct me if I'm doing something wrong. If this is a bug, I'm happy to dig into it a bit more if someone can point me in the right direction.

Please provide a minimal, self-contained, and reproducible example.

import pymc3 as pm
import numpy as np

Y = 95   # observed number of successes
N = 100  # number of trials
with pm.Model() as binomial_model1:
    pct = pm.Beta("pct", alpha=2, beta=2)
    pm.Binomial("obs", n=N, p=pct, observed=Y)
    binomial_traces1 = pm.sample(2000, tune=500, cores=2)

%%timeit
pm.sample_posterior_predictive(binomial_traces1, samples=5000, model=binomial_model1, progressbar=False)

This returns:
1.66 s ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

When I do the same thing but use Theano shared variables, I see the performance hit:

Y = 95
N = 100
with pm.Model() as binomial_model2:
    # Wrap the observed data and trial count in Theano shared variables
    Ys = pm.Data('Ys', Y)
    ns = pm.Data('Ns', N)
    pct = pm.Beta("pct", alpha=2, beta=2)
    pm.Binomial("obs", n=ns, p=pct, observed=Ys)
    binomial_traces2 = pm.sample(2000, tune=500, cores=2)

%%timeit
pm.sample_posterior_predictive(binomial_traces2, samples=5000, model=binomial_model2, progressbar=False)

This results in:

31.7 s ± 498 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My notebook for this is here

Versions and main components

  • PyMC3 Version: 3.8
  • Theano Version: 1.0.4
  • Python Version: 3.7.4
  • Operating system: Ubuntu 18.04
  • How did you install PyMC3: conda
rpgoldman (Contributor) commented:

Interestingly, I ran this notebook on my laptop and compared the two versions using my fast_sample_posterior_predictive (see #3597), which is vectorized. I also see a slowdown, but the overall time cost is not as bad:

Without pm.Data:

1.66 s ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -- sample_posterior_predictive
1.54 ms ± 91.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) -- fast spp

With pm.Data:

31.5 s ± 1.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each) -- sample_posterior_predictive
13.1 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) -- fast spp
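
(For reproducibility outside a notebook, the comparison is just timing the two calls back to back; a rough sketch below, assuming the function from #3597 is importable as pm.fast_sample_posterior_predictive and takes the same trace/samples/model arguments, which may differ on that branch.)

import time
import pymc3 as pm

def wall_time(fn, *args, repeats=3, **kwargs):
    # Crude wall-clock timing, a rough stand-in for %%timeit outside a notebook.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

print(wall_time(pm.sample_posterior_predictive, binomial_traces2,
                samples=5000, model=binomial_model2, progressbar=False))
# If the branch from #3597 is installed (name and signature assumed):
# print(wall_time(pm.fast_sample_posterior_predictive, binomial_traces2,
#                 samples=5000, model=binomial_model2))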

I don't know Theano very well. Am I right in believing that this slowdown means it does not do constant folding?
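
One way to probe that directly, independent of PyMC3 (just a sketch): compile the same small expression once with a constant and once with a shared variable, and compare the optimized graphs with theano.printing.debugprint. With a constant input the arithmetic should be folded away; with a shared variable it can't be, because the value (and shape) may change between calls.

import numpy as np
import theano
import theano.tensor as tt

n_const = tt.constant(np.float64(100.0))     # value baked into the graph
n_shared = theano.shared(np.float64(100.0))  # value can be swapped later

f_const = theano.function([], n_const * 2 + 1)
f_shared = theano.function([], n_shared * 2 + 1)

# The constant version should reduce to a single precomputed constant (201.0);
# the shared version has to keep the Elemwise ops so it can react to set_value().
theano.printing.debugprint(f_const)
theano.printing.debugprint(f_shared)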

twiecki (Member) commented Feb 28, 2020

I would suspect that this is expected: there are many optimizations that can be made when the length of an array is known. Theano can't anticipate when a shared array will change its length. In PyMC3, however, we actually can, since we know it will stay constant during inference. I'm not sure if there is a way to exploit that, though. We could ask the Theano guys.
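
For context, the reason the data is a shared variable at all is so it can be swapped out after inference, e.g. for out-of-sample prediction; a minimal sketch of that workflow (the new value below is made up):

import pymc3 as pm

with binomial_model2:
    pm.set_data({"Ns": 500})  # predict for a different number of trials
    new_ppc = pm.sample_posterior_predictive(binomial_traces2, samples=1000)

From Theano's point of view that means neither the value nor the shape of Ns is fixed, so it keeps the fully general graph.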
