20x Performance degradation when using theano shared variables #3818

Closed
jvans1 opened this issue Feb 27, 2020 · 2 comments



jvans1 commented Feb 27, 2020

Description of your problem

Hi,

I noticed a pretty significant performance hit when using Theano shared variables. Please correct me if I'm doing something wrong. If this is a bug, I'm happy to dig into it a bit more if someone can point me in the right direction.

Please provide a minimal, self-contained, and reproducible example.

import pymc3 as pm
import numpy as np

Y = 95   # observed number of successes
N = 100  # number of trials
with pm.Model() as binomial_model1:
    pct = pm.Beta("pct", alpha=2, beta=2)
    pm.Binomial("obs", n=N, p=pct, observed=Y)
    binomial_traces1 = pm.sample(2000, tune=500, cores=2)

%%timeit
pm.sample_posterior_predictive(binomial_traces1, samples=5000, model=binomial_model1, progressbar=False)

This returns:
1.66 s ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

When I do the same thing but use Theano shared variables, I see the performance hit:

Y = 95
N = 100
with pm.Model() as binomial_model2:
    # Wrap the observed data and trial count in Theano shared variables
    Ys = pm.Data('Ys', Y)
    ns = pm.Data('Ns', N)
    pct = pm.Beta("pct", alpha=2, beta=2)
    pm.Binomial("obs", n=ns, p=pct, observed=Ys)
    binomial_traces2 = pm.sample(2000, tune=500, cores=2)

%%timeit
pm.sample_posterior_predictive(binomial_traces2, samples=5000, model=binomial_model2, progressbar=False)

This results in:

31.7 s ± 498 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

My notebook for this is here

Versions and main components

  • PyMC3 Version: 3.8
  • Theano Version: 1.0.4
  • Python Version: 3.7.4
  • Operating system: Ubuntu 18.04
  • How did you install PyMC3: conda
rpgoldman (Contributor) commented:

Interestingly, I ran this notebook on my laptop and compared the two versions using my fast_sample_posterior_predictive (see #3597), which is vectorized. I also see a slowdown, but the overall time cost is not as bad:

Without pm.Data:

1.66 s ± 15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -- sample_posterior_predictive
1.54 ms ± 91.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) -- fast spp

With pm.Data:

31.5 s ± 1.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each) -- sample_posterior_predictive
13.1 ms ± 412 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) -- fast spp
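
(For reproducibility outside a notebook, the comparison is just timing the two calls back to back; a rough sketch below, assuming the function from #3597 is importable as pm.fast_sample_posterior_predictive and takes the same trace/samples/model arguments, which may differ on that branch.)

import time
import pymc3 as pm

def wall_time(fn, *args, repeats=3, **kwargs):
    # Crude wall-clock timing, a rough stand-in for %%timeit outside a notebook.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

print(wall_time(pm.sample_posterior_predictive, binomial_traces2,
                samples=5000, model=binomial_model2, progressbar=False))
# If the branch from #3597 is installed (name and signature assumed):
# print(wall_time(pm.fast_sample_posterior_predictive, binomial_traces2,
#                 samples=5000, model=binomial_model2))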

I don't know Theano very well. Am I right in believing that this slowdown means it does not do constant folding?
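
One way to probe that directly, independent of PyMC3 (just a sketch): compile the same small expression once with a constant and once with a shared variable, and compare the optimized graphs with theano.printing.debugprint. With a constant input the arithmetic should be folded away; with a shared variable it can't be, because the value (and shape) may change between calls.

import numpy as np
import theano
import theano.tensor as tt

n_const = tt.constant(np.float64(100.0))     # value baked into the graph
n_shared = theano.shared(np.float64(100.0))  # value can be swapped later

f_const = theano.function([], n_const * 2 + 1)
f_shared = theano.function([], n_shared * 2 + 1)

# The constant version should reduce to a single precomputed constant (201.0);
# the shared version has to keep the Elemwise ops so it can react to set_value().
theano.printing.debugprint(f_const)
theano.printing.debugprint(f_shared)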

twiecki (Member) commented Feb 28, 2020

I would suspect that this is expected: there are many optimizations that can be made when the length of an array is known. Theano can't anticipate when a shared array will change its length. In PyMC3, however, we actually can, since we know it will stay constant during inference. I'm not sure if there is a way to exploit that, though. We could ask the Theano guys.
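
For context, the reason the data is a shared variable at all is so it can be swapped out after inference, e.g. for out-of-sample prediction; a minimal sketch of that workflow (the new value below is made up):

import pymc3 as pm

with binomial_model2:
    pm.set_data({"Ns": 500})  # predict for a different number of trials
    new_ppc = pm.sample_posterior_predictive(binomial_traces2, samples=1000)

From Theano's point of view that means neither the value nor the shape of Ns is fixed, so it keeps the fully general graph.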
