Weird turbulence caused by generative discrete RV #1990
Comments
@twiecki following our discussion on Twitter.
Interesting. I think the Stan guys have warned for a while against mixing NUTS/HMC and other samplers. In the cases I tested it worked well, but maybe that's the underlying reason here. @bob-carpenter is this surprising to you? @junpenglao You could alternatively solve this with …
@twiecki thanks, the problem with … Another easy solution is to plug in a theano random node just to sample from it: instead of

```python
count = pm.Binomial('count', n=10, p=p, shape=10)
```

do:

```python
rng = tt.shared_randomstreams.RandomStreams()
count_ = rng.binomial(n=10, p=p, size=(10,))
count = pm.Deterministic('count', count_)
```

Only NUTS is used for sampling, and it returns no problem.
They are right. In my experience, it either fails (completely stalls) or is unstable if the discrete RV depends on other continuous RVs that are sampled by NUTS. But if the discrete RV is at the top of the hierarchy, the mixed sampler is fine:
FWIW, in my model I'm sampling from a Bernoulli RV using BinaryGibbsMetropolis, with all the other RVs sampled using NUTS. The Bernoulli RV is not at the top of the hierarchy; its probability is sampled according to a Beta prior (also using NUTS). This seems to work well; I'm getting pretty healthy traces in most cases.
In Stan, the generated quantities block does not feed back into the model block, so adding samples conditioned on parameters there won't affect sampling of the parameters. I don't know PyMC3 or Theano, but I think that's what @junpenglao is saying above.

The concern with mixing discrete and continuous sampling is that the change in discrete parameters will affect the continuous distribution's geometry, so that the adaptation may be inappropriate. We also don't know how many iterations we have to run to get a decent sample when the discrete parameters change. We know that HMC is hypersensitive to its tuning parameters (mass matrix and step size).

We haven't evaluated any of this, so I'm curious what the PyMC3 users have found. The way to evaluate is to simulate data and look at posterior coverage; there's a nice paper by Cook, Gelman, and Rubin on testing models.
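The generated-quantities approach described above can be reproduced outside the model: draw the generated quantity once per posterior sample, after sampling is done, so it cannot feed back into the sampler. A minimal numpy sketch (the `p_draws` array is a stand-in for a real `trace['p']`):

```python
import numpy as np

np.random.seed(0)

# Stand-in for posterior draws of p (in practice, trace['p'] after pm.sample).
p_draws = np.random.beta(2., 2., size=1000)

# One binomial draw per posterior sample of p, computed post hoc,
# analogous to Stan's generated quantities block.
count_rep = np.random.binomial(10, p_draws)
```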
FYI, this is a wrong way to do sample generation in PyMC3: putting additional stochastic RVs in the model means the logp of that RV is also added to the model logp, which might result in bias in the estimation.
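The extra logp term can be made concrete with scipy (the numbers below are arbitrary illustrations, not from the thread): an unobserved RV in the graph contributes its own log-probability to the joint density that the samplers target.

```python
from scipy import stats

# Arbitrary values for a Beta-Binomial model with an extra *unobserved*
# Binomial node `count` added to the graph for sample generation.
p, y, count = 0.4, 6, 3

# Joint logp without the extra node: prior on p plus observed likelihood.
logp_base = stats.beta.logpdf(p, 1, 1) + stats.binom.logpmf(y, 10, p)
# With the extra node, its log-pmf is added to the model logp, so
# proposals for `count` and for `p` become coupled during sampling.
logp_full = logp_base + stats.binom.logpmf(count, 10, p)
```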
@junpenglao It will work if the generated data is conditionally independent, as it often is. For example, if I generate a new observation … If you're in a regression context, though, and also have …
Thanks for the detailed explanation, Bob! I agree with you that that should be the case in general. But I am still a bit unsure: if the logp of the …
Right. You need to look at the marginals. Just take a simple case:
and assume
You can't make this clean factorization when there's a predictor …

P.S. Of course, you can also verify it in this case by running it. You should get the same marginal posterior for …
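The marginalization argument can be spelled out (a sketch, with $\tilde{y}$ denoting the generated RV, assumed conditionally independent of the data $y$ given the parameters $\theta$):

```latex
p(\theta, \tilde{y} \mid y) = p(\tilde{y} \mid \theta)\, p(\theta \mid y)
\quad\Rightarrow\quad
\int p(\theta, \tilde{y} \mid y)\, d\tilde{y}
  = p(\theta \mid y) \int p(\tilde{y} \mid \theta)\, d\tilde{y}
  = p(\theta \mid y).
```

So the marginal posterior for $\theta$ is unchanged by adding $\tilde{y}$ to the model; any difference seen in practice must come from the sampler, not the model.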
I see. Thanks! So in this case the issue is more because of the mixed sampler, then.
In Stan, there is an option to write a `generated quantities` block for sample generation. Doing something similar in PyMC3, however, seems to introduce weird turbulence in the sampler, especially if the generated RV is discrete. Consider the following simple example:
The result is fairly normal:
If I now add an additional RV node to the graph:
The output trace is quite unstable and converges much more slowly:
If the added RV is continuous, the effect seems to be minimal: