Weird turbulence caused by generative discrete RV #1990


Closed
junpenglao opened this issue Apr 4, 2017 · 10 comments

Comments

@junpenglao
Member

junpenglao commented Apr 4, 2017

In Stan, there is an option to write a generated quantities block for sample generation. Doing something similar in PyMC3, however, seems to introduce weird turbulence into the sampler, especially if the generated RV is discrete.

Consider the following simple example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt  

# Data
x = np.array([1.1, 1.9, 2.3, 1.8])
n = len(x)

sigmoid = lambda x: 1/(1 + tt.exp(-x))
np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

pm.traceplot(trace, varnames=['mu', 'sigma_interval_'])

The result is fairly normal:

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
100%|██████████| 1000/1000 [00:01<00:00, 855.81it/s]

[figure_1: traceplot of mu and sigma_interval_]

If I now add an additional RV node to the graph:

np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # generation 
    p = pm.Deterministic('p', sigmoid(mu))
    count = pm.Binomial('count', n=10, p=p, shape=10)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

The output trace is quite unstable and converges much more slowly:

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
Assigned Metropolis to count
100%|██████████| 1000/1000 [00:06<00:00, 153.00it/s]

[figure_2: traceplot for the model with the added discrete RV]

If the added RV is continuous, the effect seems to be minimal:

np.random.seed(42)
with pm.Model() as model1:
    # prior
    mu = pm.Normal('mu', mu=0, tau=.001)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    # observed
    xi = pm.Normal('xi', mu=mu, tau=1/(sigma**2), observed=x)
    # generation 
    p = pm.Deterministic('p', sigmoid(mu))
    xi_ = pm.Normal('xi_', mu=p, shape=10)
    # inference
    trace = pm.sample(1000, njobs=5, tune=500, init=None)

Assigned NUTS to mu
Assigned NUTS to sigma_interval_
Assigned NUTS to xi_
100%|██████████| 1000/1000 [00:07<00:00, 128.04it/s]

[figure_3: traceplot for the model with the added continuous RV]

@junpenglao
Member Author

@twiecki following our discussion on twitter.

@twiecki
Member

twiecki commented Apr 6, 2017

Interesting. I think the Stan guys have warned for a while against mixing NUTS/HMC and other samplers. In the cases I tested it worked well, but maybe that's the underlying reason here. @bob-carpenter is this surprising to you?

@junpenglao You could alternatively solve this with sample_ppc.

@junpenglao
Member Author

junpenglao commented Apr 6, 2017

@twiecki thanks, the problem with sample_ppc is that currently it only samples predictions from observed RVs. So my solution for now is to sample from the trace and generate a new RV (e.g., see cell 7).
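The trace-based approach can be sketched with plain NumPy (a minimal sketch: the stand-in posterior draws below are illustrative, not the actual `trace['mu']` from the model above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(42)

# Hypothetical stand-in for trace['mu']: in practice these would be the
# posterior draws of mu returned by pm.sample.
mu_draws = rng.normal(loc=1.8, scale=0.3, size=2000)

# Generated quantities computed outside the model: one set of binomial
# draws per posterior draw of mu, so the sampler is never affected.
p_draws = sigmoid(mu_draws)
count_draws = rng.binomial(n=10, p=p_draws[:, None], size=(2000, 10))

print(count_draws.shape)
```

Because the counts are generated after sampling, they cannot feed back into the model logp, which mirrors what Stan's generated quantities block guarantees.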

Another easy solution is to plug in a Theano random node just to sample from it. Instead of

    count = pm.Binomial('count', n=10, p=p, shape=10)

do:

    from theano.tensor.shared_randomstreams import RandomStreams

    rng = RandomStreams()
    count_ = rng.binomial(n=10, p=p, size=(10,))
    count = pm.Deterministic('count', count_)

Only NUTS is used for sampling and the resulting trace shows no problem.
[figure_3: traceplot with the Theano random node, sampled by NUTS only]

> I think the Stan guys have warned for a while against mixing NUTS/HMC and other samplers.

They are right. In my experience, sampling either fails (completely stalls) or is unstable if the discrete RV depends on other continuous RVs that are sampled by NUTS. But if the discrete RV is at the top of the hierarchy, a mixed sampler is fine:
http://nbviewer.jupyter.org/github/junpenglao/Bayesian-Cognitive-Modeling-in-Pymc3/blob/master/CaseStudies/NumberConceptDevelopment.ipynb

@Spaak
Member

Spaak commented Apr 6, 2017

FWIW, in my model I'm sampling from a Bernoulli RV using BinaryGibbsMetropolis, with all the other RVs sampled using NUTS. The Bernoulli RV is not at the top of the hierarchy; its probability is sampled from a Beta prior (also using NUTS). This seems to work well, and I'm getting pretty healthy traces in most cases.

@bob-carpenter

In Stan, the generated quantities block does not feed back into the model block, so adding samples conditioned on parameters there won't affect sampling of the parameters. I don't know PyMC3 or Theano, but I think that's what @junpenglao is saying above.

The concern with mixing discrete and continuous sampling is that the change in discrete parameters will affect the continuous distribution's geometry so that the adaptation may be inappropriate. We also don't know how many iterations we have to run to get a decent sample when the discrete parameters change. We know that HMC is hypersensitive to its tuning parameters (mass matrix and step size). We haven't evaluated any of this, so I'm curious what the PyMC3 users have found. The way to evaluate is to simulate data and look at posterior coverage; there's a nice paper by Cook, Gelman, and Rubin on testing models.
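The simulation-based check Bob describes can be sketched for a conjugate toy case where the posterior is available in closed form (all numbers here are illustrative; a real check would target the model of interest): draw the parameter from its prior, simulate data, and count how often the nominal 95% posterior interval covers the true value.

```python
import numpy as np

rng = np.random.RandomState(0)
prior_mu, prior_sd = 0.0, 2.0   # prior: mu ~ normal(0, 2)
sigma, n = 1.0, 20              # known noise sd, observations per simulation
n_sims, cover = 1000, 0

for _ in range(n_sims):
    mu_true = rng.normal(prior_mu, prior_sd)    # draw mu from the prior
    y = rng.normal(mu_true, sigma, size=n)      # simulate data given mu

    # conjugate posterior for mu when sigma is known
    post_var = 1.0 / (1.0 / prior_sd**2 + n / sigma**2)
    post_mean = post_var * (prior_mu / prior_sd**2 + y.sum() / sigma**2)

    # does the central 95% posterior interval cover the true mu?
    half_width = 1.96 * np.sqrt(post_var)
    cover += (post_mean - half_width <= mu_true <= post_mean + half_width)

print(cover / n_sims)  # should be close to the nominal 0.95 if calibrated
```

A sampler whose adaptation is thrown off (e.g. by discrete jumps changing the geometry mid-run) would show coverage drifting away from the nominal level in this kind of check.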

@junpenglao
Member Author

FYI, this is the wrong way to do sample generation in PyMC3, as putting additional stochastic RVs in the model means the logp of each such RV is also added to the model logp, which might bias the estimation.

@bob-carpenter

@junpenglao It will work if the generated data is conditionally independent, as it often is. For example, if I generate a new observation y_new from normal(mu, sigma) where mu and sigma are parameters in the model. It won't cause bias, because y_new is conditionally independent.

If you're in a regression context, though, and also have x_new predictors, then adding a new y_new generated from normal(x_new * beta, sigma) gives you the effect of "semi-supervised learning". This is the right thing to do in a Bayesian setting for prediction if x_new is generated from the same data-generating process as the original x.

@junpenglao
Member Author

Thanks for the detailed explanation, Bob! I agree with you that that should be the case in general. But I am still a bit unsure: if the logp of the y_new in your example above is added to the model logp, would that really have no effect on the sampling at all?

twiecki pushed a commit that referenced this issue Jun 20, 2017
@bob-carpenter

bob-carpenter commented Jun 21, 2017

Right. You need to look at the marginals. Just take a simple case:

mu ~ normal(0, 2);
sigma ~ lognormal(0, 1);
y ~ normal(mu, sigma);
y_new ~ normal(mu, sigma);

and assume y is observed and y_new is a parameter. Then all you need to do to show that adding y_new has no effect on mu and sigma is to note that the posterior for the model above factors

p(y_new, mu, sigma | y)
  = p(y_new | mu, sigma) * p(mu, sigma | y)
  = p(y_new | mu, sigma) * p(y | mu, sigma) * p(mu, sigma) / p(y)

You can't make this clean factorization when there's a predictor x_new being multiplied by beta, because the posterior you get is p(y_new, beta, sigma | x, x_new) and you can't factor out that x_new because it's only being used to condition.

P.S. Of course, you can also verify it in this case by running it. You should get the same marginal posterior for (mu, sigma) with or without y_new included in the model.
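The factorization can also be checked numerically on a grid. This sketch fixes sigma = 1 for simplicity and reuses the data from the first post; marginalizing y_new out of p(y_new, mu | y) recovers p(mu | y) up to grid error:

```python
import numpy as np

def npdf(x, mu, sd):
    # normal density, written out to keep the sketch dependency-free
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Simplified model: sigma fixed at 1, prior mu ~ normal(0, 2)
y = np.array([1.1, 1.9, 2.3, 1.8])
sigma = 1.0
mu_grid = np.linspace(-5, 8, 400)
ynew_grid = np.linspace(-5, 8, 400)
dmu, dy = mu_grid[1] - mu_grid[0], ynew_grid[1] - ynew_grid[0]

# posterior p(mu | y), normalized on the grid
post_mu = npdf(mu_grid, 0.0, 2.0) * np.prod(npdf(y[:, None], mu_grid, sigma), axis=0)
post_mu /= post_mu.sum() * dmu

# joint p(y_new, mu | y) = p(y_new | mu) * p(mu | y)
joint = npdf(ynew_grid[:, None], mu_grid[None, :], sigma) * post_mu[None, :]

# marginalizing y_new back out recovers p(mu | y)
marg_mu = joint.sum(axis=0) * dy
print(np.max(np.abs(marg_mu - post_mu)))
```

The maximum discrepancy is at the level of numerical integration error, which is the grid version of Bob's point: y_new multiplies in as a conditionally independent factor and integrates out to 1, leaving the (mu, sigma) marginal untouched.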

@junpenglao
Member Author

I see. Thanks! So in this case the issue is due more to the mixing of samplers, then.

junpenglao pushed a commit that referenced this issue Jun 21, 2017
twiecki pushed a commit that referenced this issue Jun 21, 2017
twiecki closed this as completed Dec 22, 2018