GPU support - examples #2033

Closed · 6 of 14 tasks
springcoil opened this issue Apr 14, 2017 · 25 comments

@springcoil (Contributor) commented Apr 14, 2017

I ran the latest master on an AWS p2.xlarge: `Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)`.

I used ami-e904398f (on a spot instance), installed conda, and then installed PyMC3/Theano from latest master.

I ran a few experiments after installing the latest master. I'll document here what I see.

I've used the examples folder for these benchmarks. A tick indicates that the example runs properly:

  • arma_example
  • baseball
  • lasso_missing
  • lightspeed_example
  • censored_data
  • gelman_schools
  • GHME_2013
  • custom_dists
  • disaster model arbitrary deterministic
  • edward_beta_bernoulli
  • factor_potential
  • gelman_bioassay
  • LKJ Correlation
  • simple test
@springcoil (Contributor Author)

Error on lasso_missing

```
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000
         Iterations: 0
         Function evaluations: 2
         Gradient evaluations: 1
Traceback (most recent call last):
  File "lasso_missing.py", line 43, in <module>
    start = pm.find_MAP()
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 168, in find_MAP
    specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'p_mother_logodds_': array(0.0, dtype=float32), 'p_disab_logodds_': array(0.0, dtype=float32), 'mother_imp_missing': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0]), 'siblings_imp_missing': array([], dtype=int64), 'disability_imp_missing': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 's_log_': array(1.6094379425048828, dtype=float32), 'sib_mean_log_': array(-0.3665129244327545, dtype=float32), 'beta': array([ 0.1,  0.1,  0.1,  0.1,  0.1,  0.1,  0.1], dtype=float32)} logp: array(nan) dlogp: array([  8.98254013e+01,  -5.85000000e+01,  -9.50000000e+00,
         5.58700703e+04,   6.49721985e+02,   3.32593994e+02,
         7.25313965e+02,   1.26741997e+02,   3.48175547e+04,
         3.19725983e+02,   2.59157990e+02], dtype=float32)
Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:
beta.logp bad: nan
```

@springcoil (Contributor Author)

Arma works it seems!

@springcoil (Contributor Author)

Error in the lightspeed example too. It seems very similar to the error in the baseball example. Perhaps @kyleabeauchamp or @nouiz can comment.

```
    return TransformedDistribution.dist(dist, self)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/distributions/distribution.py", line 48, in dist
    dist.__init__(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/distributions/transforms.py", line 61, in __init__
    v = forward(FreeRV(name='v', distribution=dist))
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/model.py", line 800, in __init__
    self.logp_elemwiset = distribution.logp(self)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/distributions/continuous.py", line 152, in logp
    value >= lower, value <= upper)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/tensor/var.py", line 69, in __ge__
    rval = theano.tensor.basic.ge(self, other)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 625, in __call__
    storage_map[ins] = [self._get_test_value(ins)]
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 562, in _get_test_value
    ret = v.type.filter(v.tag.test_value)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/tensor/type.py", line 173, in filter
    raise TypeError(err_msg, data)
TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
TensorType(float32, scalar) cannot store accurately value 5331.81038143, it would be represented as 5331.810546875. If you do not mind this precision loss, you can: 1) explicitly convert your data to a numpy array of dtype float32, or 2) set "allow_input_downcast=True" when calling "function".
5331.81038143
```
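The root of this one is just IEEE-754 single-precision rounding: 5331.81038143 is not exactly representable as a float32, which is what Theano is complaining about. A stdlib-only sketch of the same rounding (no Theano needed; `to_float32` is a helper introduced here for illustration):

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE-754 single precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

print(to_float32(5331.81038143))  # 5331.810546875, the value Theano reports
```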

@junpenglao (Member)

The lasso_missing failure is in `start = pm.find_MAP()`. Since we now have ADVI init, we should use that instead.
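A sketch of that change (an assumption on my part: it relies on `pm.sample` accepting an `init` argument, which requires a recent PyMC3, and `model` stands for the model context defined in `lasso_missing.py`):

```python
# Instead of:
#     start = pm.find_MAP()
#     trace = pm.sample(2000, start=start)
# let ADVI pick the starting point:
with model:
    trace = pm.sample(2000, init='advi')
```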

@springcoil (Contributor Author)

Yeah we should - wanna open a PR?

@springcoil (Contributor Author)

For GHME I get `Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5110)` followed by `QXcbConnection: Could not connect to display :0` - which I think is a Linux/Debian display error, not a PyMC3 error. But I'll note it.
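The `QXcbConnection` failure usually means matplotlib is trying to open a Qt window on a headless box. One common workaround (an assumption, not something verified on this AMI) is forcing a non-interactive backend in your matplotlibrc:

```
backend : Agg
```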

@springcoil (Contributor Author) commented Apr 14, 2017

Error on custom_dists.

```
Warning: Desired error not necessarily achieved due to precision loss.
         Current function value: 10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104.000000
         Iterations: 0
         Function evaluations: 2
         Gradient evaluations: 1
Traceback (most recent call last):
  File "custom_dists.py", line 38, in <module>
    start = pymc3.find_MAP()
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/tuning/starting.py", line 168, in find_MAP
    specific_errors)
ValueError: Optimization error: max, logp or dlogp at max have non-finite values. Some values may be outside of distribution support. max: {'intercept_interval_': array(0.0, dtype=float32), 'slope': array(0.0, dtype=float32), 'sigma': array(1.0, dtype=float32)} logp: array(nan) dlogp: array([ 46097.95703125,  47503.90234375,  47796.69921875], dtype=float32)
Check that 1) you don't have hierarchical parameters, these will lead to points with infinite density. 2) your distribution logp's are properly specified. Specific issues:
intercept_interval_.logp bad: nan
```



Cc @junpenglao 

@springcoil (Contributor Author) commented Apr 14, 2017

The disaster model fails too:

```
    rate = rateFunc(switchpoint, early_mean, late_mean)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 615, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 966, in make_node
    (str(self.itypes), str([inp.type for inp in inputs])))
TypeError: We expected inputs of types '[TensorType(int64, scalar), TensorType(float64, scalar), TensorType(float64, scalar)]' but got types '[TensorType(int64, scalar), TensorType(float32, scalar), TensorType(float32, scalar)]'
```

@junpenglao (Member) commented Apr 14, 2017

I know @ferrine is working on changing the ADVI init to the new ADVI API - maybe we should wait until after that?

@springcoil springcoil changed the title GPU support - examples WIP - GPU support - examples Apr 14, 2017
@springcoil springcoil changed the title WIP - GPU support - examples GPU support - examples Apr 14, 2017
@springcoil (Contributor Author)

Error on the Edward example:

```
  File "edward_beta_bernoulli.py", line 34, in <module>
    inference = ed.KLqp({'p': qp}, data, model)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/edward/inferences/klqp.py", line 54, in __init__
    super(KLqp, self).__init__(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/edward/inferences/variational_inference.py", line 23, in __init__
    super(VariationalInference, self).__init__(*args, **kwargs)
TypeError: __init__() takes from 1 to 3 positional arguments but 4 were given
```

@springcoil (Contributor Author)

Failure from factor_potential - however I think that's API changes too.

```
/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/sampling.py:231: UserWarning: Instantiated step methods cannot be automatically initialized. init argument ignored.
  warnings.warn('Instantiated step methods cannot be automatically '
100%|█████████████████████████████████████| 3000/3000 [00:00<00:00, 4612.06it/s]
```

@springcoil (Contributor Author)

Agreed @junpenglao about the API changes by @ferrine. I just thought these were a good few tests to run - to check things on the GPU given the recent work by @nouiz, @twiecki, and @kyleabeauchamp.

@springcoil (Contributor Author) commented Apr 14, 2017

Warning on LKJ and gelman_bioassay

```
         Current function value: 2489.000503
         Iterations: 34
         Function evaluations: 118
         Gradient evaluations: 107
/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/sampling.py:231: UserWarning: Instantiated step methods cannot be automatically initialized. init argument ignored.
  warnings.warn('Instantiated step methods cannot be automatically '
100%|███████████████████████████████████████| 1000/1000 [00:50<00:00, 12.22it/s]
/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/step_methods/hmc/nuts.py:237: UserWarning: Step size tuning was enabled throughout the whole trace. You might want to specify the number of tuning steps.
  warnings.warn('Step size tuning was enabled throughout the whole '
/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2889: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
```

@junpenglao (Member)

Edward had some API changes (quite major, following the TensorFlow 1.0.0 release). I am not sure if they are still supporting PyMC3 models.

@springcoil (Contributor Author)

Yeah that's a valid point.

@springcoil (Contributor Author)

Error on simpletest.

```
    total_size=total_size, model=self)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/model.py", line 800, in __init__
    self.logp_elemwiset = distribution.logp(self)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/distributions/transforms.py", line 74, in logp
    return (self.dist.logp(self.transform_used.backward(x)) +
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/distributions/transforms.py", line 102, in backward
    return invlogit(x, 0.0)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/pymc3/math.py", line 34, in invlogit
    return (1 - 2 * eps) / (1 + tt.exp(-x)) + eps
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/tensor/var.py", line 41, in __neg__
    return theano.tensor.basic.neg(self)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 625, in __call__
    storage_map[ins] = [self._get_test_value(ins)]
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/gof/op.py", line 562, in _get_test_value
    ret = v.type.filter(v.tag.test_value)
  File "/home/ubuntu/miniconda3/envs/bunnies/lib/python3.5/site-packages/theano/tensor/type.py", line 173, in filter
    raise TypeError(err_msg, data)
TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
TensorType(float32, scalar) cannot store accurately value 0.597836916767, it would be represented as 0.5978369116783142. If you do not mind this precision loss, you can: 1) explicitly convert your data to a numpy array of dtype float32, or 2) set "allow_input_downcast=True" when calling "function".
0.597836916767
```

Seems to be similar to some of the other errors. 

@twiecki (Member) commented Apr 14, 2017

@junpenglao That's what this is testing: our own wrappers for Edward (they live in the external subdir).

@twiecki (Member) commented Apr 14, 2017

@springcoil Do you adjust the floats to 32 and ints to 16 everywhere before you run the model?

@springcoil (Contributor Author) commented Apr 14, 2017

I just shut down the AMI - it could well be that one of the settings files does that. I just ran things as they were, calling python directly.

Maybe there are better ways to do this.

@springcoil (Contributor Author)

I'll try to reproduce this experiment with the same AMI and change the settings. Does anyone know which settings files I should change - is it the .theanorc?

@twiecki (Member) commented Apr 14, 2017

No, any data you pass in should be cast to floatX, e.g. `pm.Normal(..., observed=x.astype(theano.config.floatX))`.
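A minimal sketch of that cast (assuming numpy; `floatX` below is a stand-in for `theano.config.floatX`, which is `'float32'` when running on the GPU):

```python
import numpy as np

floatX = 'float32'  # stand-in for theano.config.floatX on a GPU setup

x = np.linspace(0.0, 1.0, 5)   # numpy defaults to float64
x32 = x.astype(floatX)         # cast before passing as observed=...

print(x.dtype, x32.dtype)      # float64 float32
```

Without this cast, float64 observed data hits a float32 graph and triggers the type errors seen above.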

@springcoil (Contributor Author)

Ok I'll edit a few of the examples files to do that. And test it.

@junpenglao (Member)

@twiecki But the Edward API has changed: inference used to be `inference = ed.KLqp(latent_vars, data, model)`, but now it's just `inference = ed.KLqp(latent_vars, data)`. I will give it a go and see if I can update it. #2034

@springcoil (Contributor Author) commented Apr 14, 2017

The GHME_2013 and gelman_bioassay examples work with one fix to the matplotlibrc file, plus the recent fix I got merged in. #2035

@springcoil (Contributor Author)

There seem to be a few errors relating to `bound` which I've not had time to investigate, and other errors relating to `pm.find_MAP`. I think this will be a work in progress - but it was worth investigating :)
