Weird turbulence caused by generative discrete RV #1990


Closed
junpenglao opened this issue Apr 4, 2017 · 10 comments

Comments

@junpenglao
Member

junpenglao commented Apr 4, 2017

In Stan, there is an option to write a generated quantities block for sample generation. Doing something similar in PyMC3, however, seems to introduce weird turbulence into the sampler, especially if the generated RV is discrete.

Consider the following simple example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt  

# Data
x = np.array([1.1, 1.9, 2.3, 1.8])
n = len(x)

sigmoid = lambda x: 1/(1 + tt.exp(-x))
np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

pm.traceplot(trace, varnames=['mu', 'sigma_interval_'])

The result is fairly normal:

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
100%|██████████| 1000/1000 [00:01<00:00, 855.81it/s]

[figure_1: traceplot of mu and sigma_interval_]

If I now add an additional RV node to the graph:

np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # generation 
    p = pm.Deterministic('p', sigmoid(mu))
    count = pm.Binomial('count', n=10, p=p, shape=10)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

The output trace is quite unstable and converges much more slowly:

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
Assigned Metropolis to count
100%|██████████| 1000/1000 [00:06<00:00, 153.00it/s]

[figure_2: traceplot for the model with the added discrete RV]

If the added RV is continuous, the effect seems to be minimal:

np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # generation 
    p = pm.Deterministic('p', sigmoid(mu))
    xi_ = pm.Normal('xi_', mu=p, shape=10)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
Assigned NUTS to xi_
100%|██████████| 1000/1000 [00:07<00:00, 128.04it/s]

[figure_3: traceplot for the model with the added continuous RV]

@junpenglao
Member Author

@twiecki following our discussion on twitter.

@twiecki
Member

twiecki commented Apr 6, 2017

Interesting. I think the Stan guys have warned for a while against mixing NUTS/HMC and other samplers. In the cases I tested it worked well, but maybe that's the underlying reason here. @bob-carpenter is this surprising to you?

@junpenglao You could alternatively solve this with sample_ppc.

@junpenglao
Member Author

junpenglao commented Apr 6, 2017

@twiecki thanks, the problem with sample_ppc is that currently it only samples predictions from observed RVs. So my solution for now is to sample from the trace and generate a new RV (e.g., see cell 7).
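The trace-based approach can be sketched with plain NumPy (a minimal sketch: the stand-in posterior draws below are illustrative, not the actual `trace['mu']` from the model above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(42)

# Hypothetical stand-in for trace['mu']: in practice these would be the
# posterior draws of mu returned by pm.sample.
mu_draws = rng.normal(loc=1.8, scale=0.3, size=2000)

# Generated quantities computed outside the model: one set of binomial
# draws per posterior draw of mu, so the sampler is never affected.
p_draws = sigmoid(mu_draws)
count_draws = rng.binomial(n=10, p=p_draws[:, None], size=(2000, 10))

print(count_draws.shape)
```

Because the counts are generated after sampling, they cannot feed back into the model logp, which mirrors what Stan's generated quantities block guarantees.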

Another easy solution is to plug in a Theano random node just to sample from it. Instead of

    count = pm.Binomial('count', n=10, p=p, shape=10)

do:

    from theano.tensor.shared_randomstreams import RandomStreams

    rng = RandomStreams()
    count_ = rng.binomial(n=10, p=p, size=(10,))
    count = pm.Deterministic('count', count_)

Only NUTS is used for sampling and the resulting trace shows no problem.
[figure_3: traceplot with the Theano random node, sampled by NUTS only]

> I think the Stan guys have warned for a while against mixing NUTS/HMC and other samplers.

They are right. In my experience, sampling either fails (completely stalls) or is unstable if the discrete RV depends on other continuous RVs that are sampled by NUTS. But if the discrete RV is at the top of the hierarchy, a mixed sampler is fine:
http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/NumberConceptDevelopment.ipynb

@Spaak
Member

Spaak commented Apr 6, 2017

FWIW, in my model I'm sampling from a Bernoulli RV using BinaryGibbsMetropolis, with all the other RVs sampled using NUTS. The Bernoulli RV is not at the top of the hierarchy; its probability is sampled from a Beta prior (also using NUTS). This seems to work well, and I'm getting pretty healthy traces in most cases.

@bob-carpenter

In Stan, the generated quantities block does not feed back into the model block, so adding samples conditioned on parameters there won't affect sampling of the parameters. I don't know PyMC3 or Theano, but I think that's what @junpenglao is saying above.

The concern with mixing discrete and continuous sampling is that the change in discrete parameters will affect the continuous distribution's geometry so that the adaptation may be inappropriate. We also don't know how many iterations we have to run to get a decent sample when the discrete parameters change. We know that HMC is hypersensitive to its tuning parameters (mass matrix and step size). We haven't evaluated any of this, so I'm curious what the PyMC3 users have found. The way to evaluate is to simulate data and look at posterior coverage; there's a nice paper by Cook, Gelman, and Rubin on testing models.
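The simulation-based check Bob describes can be sketched for a conjugate toy case where the posterior is available in closed form (all numbers here are illustrative; a real check would target the model of interest): draw the parameter from its prior, simulate data, and count how often the nominal 95% posterior interval covers the true value.

```python
import numpy as np

rng = np.random.RandomState(0)
prior_mu, prior_sd = 0.0, 2.0   # prior: mu ~ normal(0, 2)
sigma, n = 1.0, 20              # known noise sd, observations per simulation
n_sims, cover = 1000, 0

for _ in range(n_sims):
    mu_true = rng.normal(prior_mu, prior_sd)    # draw mu from the prior
    y = rng.normal(mu_true, sigma, size=n)      # simulate data given mu

    # conjugate posterior for mu when sigma is known
    post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
    post_mean = post_var * (prior_mu / prior_sd**2 + y.sum() / sigma**2)

    # does the central 95% posterior interval cover the true mu?
    half_width = 1.96 * np.sqrt(post_var)
    cover += (post_mean - half_width <= mu_true <= post_mean + half_width)

print(cover / n_sims)  # should be close to the nominal 0.95 if calibrated
```

A sampler whose adaptation is thrown off (e.g. by discrete jumps changing the geometry mid-run) would show coverage drifting away from the nominal level in this kind of check.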

@junpenglao
Member Author

FYI, this is the wrong way to do sample generation in PyMC3, as putting additional stochastic RVs in the model means the logp of each such RV is also added to the model logp, which might bias the estimation.

@bob-carpenter

@junpenglao It will work if the generated data is conditionally independent, as it often is. For example, if I generate a new observation y_new from normal(mu, sigma) where mu and sigma are parameters in the model. It won't cause bias, because y_new is conditionally independent.

If you're in a regression context, though, and also have x_new predictors, then adding a new y_new generated from normal(x_new * beta, sigma) gives you the effect of "semi-supervised learning". This is the right thing to do in a Bayesian setting for prediction if x_new is generated from the same data-generating process as the original x.

@junpenglao
Member Author

Thanks for the detailed explanation, Bob! I agree with you that that should be the case in general. But I am still a bit unsure: if the logp of the y_new in your example above is added to the model logp, would that really have no effect on the sampling at all?

twiecki pushed a commit that referenced this issue Jun 20, 2017
@bob-carpenter

bob-carpenter commented Jun 21, 2017

Right. You need to look at the marginals. Just take a simple case:

mu ~ normal(0, 2);
sigma ~ lognormal(0, 1);
y ~ normal(mu, sigma);
y_new ~ normal(mu, sigma);

and assume y is observed and y_new is a parameter. Then all you need to do to show that adding y_new has no effect on mu and sigma is to note that the posterior for the model above factors

p(y_new, mu, sigma | y)
  = p(y_new | mu, sigma) * p(mu, sigma | y)
  = p(y_new | mu, sigma) * p(y | mu, sigma) * p(mu, sigma) / p(y)

You can't make this clean factorization when there's a predictor x_new being multiplied by beta, because the posterior you get is p(y_new, beta, sigma | x, x_new) and you can't factor out that x_new because it's only being used to condition.

P.S. Of course, you can also verify it in this case by running it. You should get the same marginal posterior for (mu, sigma) with or without y_new included in the model.
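The factorization can also be checked numerically on a grid. This sketch fixes sigma = 1 for simplicity and reuses the data from the first post; marginalizing y_new out of p(y_new, mu | y) recovers p(mu | y) up to grid error:

```python
import numpy as np

def npdf(x, mu, sd):
    # normal density, written out to keep the sketch dependency-free
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Simplified model: sigma fixed at 1, prior mu ~ normal(0, 2)
y = np.array([1.1, 1.9, 2.3, 1.8])
sigma = 1.0
mu_grid = np.linspace(-5, 8, 400)
ynew_grid = np.linspace(-5, 8, 400)
dmu, dy = mu_grid[1] - mu_grid[0], ynew_grid[1] - ynew_grid[0]

# posterior p(mu | y), normalized on the grid
post_mu = npdf(mu_grid, 0.0, 2.0) * np.prod(npdf(y[:, None], mu_grid, sigma), axis=0)
post_mu /= post_mu.sum() * dmu

# joint p(y_new, mu | y) = p(y_new | mu) * p(mu | y)
joint = npdf(ynew_grid[:, None], mu_grid[None, :], sigma) * post_mu[None, :]

# marginalizing y_new back out recovers p(mu | y)
marg_mu = joint.sum(axis=0) * dy
print(np.max(np.abs(marg_mu - post_mu)))
```

The maximum discrepancy is at the level of numerical integration error, which is the grid version of Bob's point: y_new multiplies in as a conditionally independent factor and integrates out to 1, leaving the (mu, sigma) marginal untouched.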

@junpenglao
Member Author

I see. Thanks! So in this case the issue is due more to the mixing of samplers, then.

junpenglao pushed a commit that referenced this issue Jun 21, 2017
twiecki pushed a commit that referenced this issue Jun 21, 2017
twiecki closed this as completed Dec 22, 2018