
Arbitrary Deterministic without Theano function? #507


Closed
kyleabeauchamp opened this issue Mar 9, 2014 · 28 comments

@kyleabeauchamp
Contributor

I'm trying to build a model that performs a calculation that is unavailable in Theano. In pymc2, I could do this via a deterministic that acted on the numpy form of a random variable (e.g. x.value).

In pymc3, is there any way to do this? For example, I need to calculate a matrix exponential, but there is no Theano function for matrix exponential. In pymc2, I could get around this by using a scipy.linalg function.

Am I right in concluding that pymc3 can only support operations that are combinations of theano functions?

To me, one of the key advantages of PyMC2 was that I could plug in nearly arbitrary code for use in an MCMC framework. I realize that Theano / auto-gradient is required for using HMC / NUTS samplers on arbitrary problems. However, it would be awesome if there were still an easy way to call arbitrary functions / libraries, even if it means "falling back" on old-school step methods.
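For concreteness, the kind of computation in question can be sketched in plain numpy/scipy. This is just the raw calculation one would want to plug in, not PyMC3/Theano code; the function name and rate matrix are made up for illustration:

```python
import numpy as np
from scipy.linalg import expm  # matrix exponential -- no Theano equivalent


def transition_matrix(rates, dt):
    """Hypothetical deterministic: propagator exp(Q*dt) of a rate matrix Q.

    In PyMC2 this could be the body of a @deterministic acting on numpy
    values; in PyMC3 it would have to be wrapped as a Theano Op.
    """
    return expm(rates * dt)


# example rate matrix: rows sum to zero, so exp(Q*dt) rows sum to one
Q = np.array([[-1.0, 1.0],
              [0.5, -0.5]])
P = transition_matrix(Q, 0.1)
print(P.sum(axis=1))
```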

@kyleabeauchamp
Contributor Author

If this is possible, we probably should get an example. The current examples of deterministics are very minimal.

@kyleabeauchamp
Contributor Author

Another reason for supporting easy plug-in code is that hand-tuned cython / SSE will probably outperform theano in many cases. This would give users a very clear path to optimizing their code--find the rate limiting step and optimize it.

@kyleabeauchamp
Contributor Author

I suppose one way to do this is via extending Theano (via theano.Op). I'm looking at this now.

@jsalvatier
Member

I agree that we need to support this. I think the theano.Op way is the right way. Right now it's a little cumbersome, so we need to figure out how to make it simpler. You can see https://github.com/pymc-devs/pymc/blob/master/pymc/distributions/special.py for a unary example. One difficulty is providing derivatives if necessary.


@jsalvatier
Member

I'd love to hear suggestions for how to make it easier, and/or syntax suggestions.


@twiecki
Member

twiecki commented Mar 10, 2014

Yeah, I think that is required, and we should just raise an error if a non-auto-diffable logp is used with a gradient-based sampler. Writing an Op for each operator seems a little cumbersome. Don't they support blackbox Ops that perform arbitrary transforms on their inputs and get called like functions, without having to define the Op?

@jsalvatier
Member

I think not really, but we can probably pressure theano into doing so.


@fonnesbeck
Member

In the long-term it probably makes sense to refactor PyMC to have an interface that allows us to swap out sampling engines so that we are not completely tied to Theano. That way, one could conceivably write step methods that use Theano, SymPy, or some other engine for doing sampling. In the shorter term, talking to the Theano folks to see what they might be able to do makes sense. I wonder if any of them will be at SciPy 2014.

@kyleabeauchamp
Contributor Author

I agree that being completely tied to Theano is undesirable. For example, many python users have become quite good at the following optimization pipeline:

1.  Python / Numpy
2.  Optimize in Cython using  C / SSE as needed

A first step towards this goal might be to create a PYMC base class that allows gradients to be calculated either manually (via subclass) or automagically (via theano). Then, Theano would just provide one way to get derivatives.
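A minimal sketch of what such a base class might look like. This is an entirely hypothetical API, with names invented here; a central finite-difference fallback stands in for Theano's autodiff, and a subclass can override grad() with a hand-written derivative instead:

```python
import numpy as np


class DeterministicBase:
    """Hypothetical base class: subclasses supply value(); grad() can be
    overridden manually, or falls back to finite differences."""

    def value(self, x):
        raise NotImplementedError

    def grad(self, x, eps=1e-6):
        # "automagic" fallback: central finite differences on each coordinate
        x = np.asarray(x, dtype=float)
        g = np.empty_like(x)
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx.flat[i] = eps
            g.flat[i] = (self.value(x + dx) - self.value(x - dx)) / (2 * eps)
        return g


class Quadratic(DeterministicBase):
    def value(self, x):
        return float(np.sum(x ** 2))


q = Quadratic()
print(q.grad(np.array([3.0, -1.0])))  # approximately [6., -2.]
```

In this picture, a Theano-backed subclass would simply override grad() with the symbolic derivative, so Theano becomes one of several ways to get derivatives.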

@twiecki
Member

twiecki commented Mar 30, 2014

I opened a theano issue about this here:
Theano/Theano#1784

@nouiz
Contributor

nouiz commented Apr 10, 2014

Just to note that I closed the Theano issue, as I just merged a PR that should allow this.

@fonnesbeck
Member

@nouiz Can you link to it here? Thanks!

@nouiz
Contributor

nouiz commented Apr 10, 2014

@twiecki
Member

twiecki commented Apr 11, 2014

Kudos to the theano guys for making this happen so quickly!
I suppose now we just need a good example of how to do that in PyMC3.

@fonnesbeck
Member

We should be able to make the Deterministic function use it automatically, no?

@jsalvatier
Member

Yes definitely

@twiecki
Member

twiecki commented Apr 11, 2014

Right, but we also want to support non-Theano likelihoods.


@jsalvatier
Member

That should actually be as easy as adding an as_op decorator to the likelihood method on the distribution.

@maahn
Contributor

maahn commented Aug 6, 2014

Could you give a minimal example for that? Thanks!

@nouiz
Contributor

nouiz commented Aug 6, 2014

Here is the doc we have about as_op:

http://deeplearning.net/software/theano/library/compile/ops.html?highlight=as_op#theano.compile.ops.as_op


@maahn
Contributor

maahn commented Aug 6, 2014

Thanks for the hint, I got the coal mining disasters example working with the decorator:

from pymc import *
import theano
import theano.tensor as t
from numpy import arange, array, ones, concatenate, empty
from numpy.random import randint

__all__ = ['disasters_data', 'switchpoint', 'early_mean', 'late_mean', 'rate',
            'disasters']

# Time series of recorded coal mining disasters in the UK from 1851 to 1962
disasters_data = array([4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
                            3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
                            2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0, 0,
                            1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
                            0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
                            3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
                            0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1])
years = len(disasters_data)

# here is the trick: wrap the plain-numpy function as a Theano Op
@theano.compile.ops.as_op(itypes=[t.lscalar, t.dscalar, t.dscalar], otypes=[t.dvector])
def rate(switchpoint, early_mean, late_mean):
    ''' Concatenate Poisson means '''
    out = empty(years)
    out[:switchpoint] = early_mean
    out[switchpoint:] = late_mean
    return out


with Model() as model:

    # Prior for distribution of switchpoint location
    switchpoint = DiscreteUniform('switchpoint', lower=0, upper=years)
    # Priors for pre- and post-switch mean number of disasters
    early_mean = Exponential('early_mean', lam=1.)
    late_mean = Exponential('late_mean', lam=1.)

    # Allocate appropriate Poisson rates to years before and after current
    # switchpoint location
    idx = arange(years)
    # theano style:
    # rate = switch(switchpoint >= idx, early_mean, late_mean)
    # non-theano style (note: this rebinds the name `rate` to the Op's output)
    rate = rate(switchpoint, early_mean, late_mean)

    # Data likelihood
    disasters = Poisson('disasters', rate, observed=disasters_data)

    # Initial values for stochastic nodes
    start = {'early_mean': 2., 'late_mean': 3.}

    # Use slice sampler for means
    step1 = Slice([early_mean, late_mean])
    # Use Metropolis for switchpoint, since it accommodates discrete variables
    step2 = Metropolis([switchpoint])

    tr = sample(1000, tune=500, start=start, step=[step1, step2])
    traceplot(tr)

However, the reason why I'm interested in pymc3 is multiprocessing, but as soon as I add njobs=4 to sample(), I get

PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

It happens only when I use the as_op decorator, and argsample() in sampling.py looks to be to blame, even though the function is defined at top level so that it can be pickled?!
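The underlying pickling limitation can be reproduced without PyMC or Theano at all: CPython's pickle locates functions and classes by name in their defining module, so anything created inside another function (as some decorators do internally) cannot be pickled, no matter where the decorated function itself is defined. A minimal illustration:

```python
import pickle


def make_op():
    # mimics a decorator that builds its wrapper inside a function:
    # LocalOp is a "local object", invisible by name at module level
    class LocalOp:
        def __call__(self, x):
            return x + 1
    return LocalOp()


def module_level(x):
    # defined at module top level, so pickle can find it by name
    return x + 1


op = make_op()
try:
    pickle.dumps(op)
    picklable = True
except Exception:  # PicklingError / AttributeError, depending on Python version
    picklable = False

print(picklable)                                     # False
print(pickle.loads(pickle.dumps(module_level))(1))   # 2
```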

@nouiz
Contributor

nouiz commented Aug 6, 2014

I know there is some problem with pickle and decorators. @abergeron, you explained it to me, but I don't recall it well enough. You said that the problem was that the decorator doesn't handle the namespace correctly. Can you look at as_op to see if it handles it correctly?

Fred


@abergeron
Contributor

For as_op it's a bit different since it makes a class, not a function. But it's still not impossible.

@abergeron
Contributor

If somebody wants to write tests for that code, we can include it in theano. It should make ops created from as_op pickleable.
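The usual remedy for pickling instances of dynamically created classes (presumably roughly the approach such a patch would take; this sketch is written from first principles, without Theano) is to implement __reduce__ so that unpickling re-runs a module-level factory instead of looking the local class up by name:

```python
import pickle


def make_op(offset):
    """Factory that builds an Op-like class dynamically, as as_op does."""
    class DynamicOp:
        def __call__(self, x):
            return x + offset

        def __reduce__(self):
            # recreate via the module-level factory on unpickling,
            # instead of letting pickle look the local class up by name
            return (make_op, (offset,))
    return DynamicOp()


op = make_op(10)
data = pickle.dumps(op)   # works thanks to __reduce__
clone = pickle.loads(data)
print(clone(5))           # 15
```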

@nouiz
Contributor

nouiz commented Aug 7, 2014

There is a PR to Theano from @abergeron with a fix:

Theano/Theano#2021


@twiecki
Member

twiecki commented Aug 13, 2014

@maahnman The pickle PR seems to be merged. Can you check whether that fixes the psample issue?

@maahn
Contributor

maahn commented Aug 14, 2014

It works, thanks for the quick fix!

@twiecki
Member

twiecki commented Aug 15, 2014

Great, do you want to do a PR with an example (like the one you already have)?
