PicklingError with njobs>1 using DensityDist #1995

Closed
junpenglao opened this issue Apr 5, 2017 · 6 comments
@junpenglao
Member

While testing the model in #1994, I ran into a PicklingError when sampling with multiple jobs (njobs>1).

The model:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
#np.random.seed(42)
theta_true = (25, 0.5)
xdata = 100 * np.random.random(20)
ydata = theta_true[0] + theta_true[1] * xdata

# add scatter to points
xdata = np.random.normal(xdata, 10)
ydata = np.random.normal(ydata, 10)

with pm.Model() as model1:
#    alpha = pm.Uniform('intercept', -100, 100)
    alpha = pm.Normal('intercept', mu=0, sd=100)
    # Create custom densities, you must supply logp
    beta = pm.DensityDist('beta', lambda value: -1.5 * tt.log(1 + value**2), testval=0)
    eps = pm.DensityDist('eps', lambda value: -tt.log(tt.abs_(value)), testval=1)
    
    mu_print = tt.printing.Print('beta')(beta)  # theano.tensor is imported as tt above
    
    # Create likelihood
    like = pm.Normal('y_est', mu=alpha + beta * xdata, sd=eps, observed=ydata)
    
    trace = pm.sample(1e3, njobs=2) # Make sure not to draw too many samples

Error output:


  File "<ipython-input-17-ae0b5548717c>", line 26, in <module>
    trace = pm.sample(1e3, njobs=2) # Make sure not to draw too many samples

  File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 193, in sample
    return sample_func(**sample_args)

  File "/usr/local/lib/python3.5/dist-packages/pymc3/sampling.py", line 370, in _mp_sample
    **kwargs) for i in range(njobs))

  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 789, in __call__
    self.retrieve()

  File "/usr/local/lib/python3.5/dist-packages/joblib/parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))

  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value

  File "/usr/lib/python3.5/multiprocessing/pool.py", line 385, in _handle_tasks
    put(task)

  File "/usr/local/lib/python3.5/dist-packages/joblib/pool.py", line 371, in send
    CustomizablePickler(buffer, self._reducers).dump(obj)

PicklingError: Can't pickle <function <lambda> at 0x7fd2e473af28>: attribute lookup <lambda> on __main__ failed

There is no problem with njobs=1.

@ferrine
Member

ferrine commented Apr 5, 2017

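Multiprocessing requires everything passed to a worker process to be picklable. Lambdas are not picklable with the standard pickle module: functions are pickled by name, and a lambda has no name that can be looked up again in the worker. Hopefully it will work if you define the density functions beforehand as named, module-level functions.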
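
To illustrate the point about pickling, here is a minimal sketch using only the standard library. The functions `named_logp` and `lambda_logp` are hypothetical stand-ins (plain Python arithmetic, not Theano expressions), and the script is assumed to be run as a top-level module so that the named function can be found by name.

```python
import pickle


def named_logp(value):
    """Module-level function: pickle stores only a reference to its name."""
    return -1.5 * value


# Same body as above, but the function's name is "<lambda>", which cannot be
# looked up again when the pickle is loaded.
lambda_logp = lambda value: -1.5 * value

# The named function round-trips through pickle.
restored = pickle.loads(pickle.dumps(named_logp))

# Pickling the lambda fails. The exact exception type can differ between
# Python versions, so both likely candidates are caught here.
try:
    pickle.dumps(lambda_logp)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    lambda_pickled = False
    print("lambda could not be pickled:", err)
```

This is the same failure mode as in the traceback above: joblib has to pickle the task before sending it to a worker, and the lambda passed to DensityDist is part of that task.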

@junpenglao
Member Author

I see. Yep, defining the functions beforehand works.

# add scatter to points
xdata = np.random.normal(xdata, 10)
ydata = np.random.normal(ydata, 10)

def loglike1(value):
    return -1.5 * tt.log(1 + value**2)
def loglike2(value):
    return -tt.log(tt.abs_(value))

with pm.Model() as model1:
#    alpha = pm.Uniform('intercept', -100, 100)
    alpha = pm.Normal('intercept', mu=0, sd=100)
    
    # Create custom densities, you must supply logp
    beta = pm.DensityDist('beta', loglike1, testval=0)
    eps = pm.DensityDist('eps', loglike2, testval=1)
    
    # Create likelihood
    like = pm.Normal('y_est', mu=alpha + beta * xdata, sd=eps, observed=ydata)
    
    trace = pm.sample(1e3, njobs=2) # Make sure not to draw too many samples

Maybe this should be mentioned in the docstring of DensityDist and somewhere in the docs? #1968

@rahuldave

rahuldave commented Apr 5, 2017

Totally agreed. I had not known that lambdas are not picklable! (And I have been using Python for years!)

@Joshuaalbert

To reopen an old thread: lambdas are picklable with the dill package, which extends pickling to lambdas and many other objects. https://github.com/uqfoundation/dill

@ColCarroll
Member

The problem is that the pickling happens inside joblib, which we use for multiprocessing, so we don't control it in this project. We used to have raw multiprocessing code here, but using joblib lowers the LOC we have to maintain.

Some comments suggested that importing dill before joblib might help, but I can't make that work locally.

Here's some code if you want to try to get it working:

import dill
import joblib

def square(x):
    return x * x

function_jobs = (joblib.delayed(square)(j) for j in range(10))
lambda_jobs = (joblib.delayed(lambda x: x * x)(j) for j in range(10))

joblib.Parallel(n_jobs=4)(function_jobs)  # works
joblib.Parallel(n_jobs=4)(lambda_jobs)  # PicklingError!

For what it is worth, I ran into this over the weekend when I needed a parameterized density distribution (nested functions are also not picklable). I had to write a theano.Op, which was bad, but not terrible.
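
As a rough illustration of the nested-function problem, the sketch below contrasts a closure with a `functools.partial` of a module-level function. This is not the theano.Op approach described above, just one possible alternative. The name `scaled_logp`, its `scale` parameter, and the plain `math` arithmetic are assumptions made for the example; whether a partial like this can be passed to DensityDist has not been checked here.

```python
import math
import pickle
from functools import partial


def scaled_logp(value, scale):
    """Hypothetical unnormalized log-density with one free parameter."""
    return -math.log(1.0 + (value / scale) ** 2)


def make_logp_closure(scale):
    # Nested function: it only exists inside this call, so the standard
    # pickle module cannot look it up by name.
    def logp(value):
        return scaled_logp(value, scale)
    return logp


# A partial stores a reference to the module-level function plus its bound
# arguments, both of which pickle can handle.
logp_partial = partial(scaled_logp, scale=2.0)
restored = pickle.loads(pickle.dumps(logp_partial))

# The closure fails to pickle. The exception type varies across Python
# versions, so both likely candidates are caught.
try:
    pickle.dumps(make_logp_closure(2.0))
    closure_pickled = True
except (pickle.PicklingError, AttributeError) as err:
    closure_pickled = False
    print("closure could not be pickled:", err)
```

The design idea is to keep the function itself at module level and carry the parameters as plain data, so nothing that has to cross the process boundary depends on a local scope.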

@Joshuaalbert

It might be worth asking https://github.com/mmckerns if he has any advice.
