Issue with clustering example in chapter 3 #34
Comments
I'm glad you pointed this out. I have been worried about this example for exactly this non-identifiability reason. Ironically, it is quite an awful example for demonstrating simple MCMC convergence. I like your proposed solution; do you mean something like the following?
Yes, that is roughly what I had in mind. What I actually implemented was
which is essentially the same as your proposal. The crucial feature is that center_0 is always less than center_1 because distance is a positive random variable. Using the above I did not have to change anything else in your code.
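A runnable sketch of this kind of reparameterization, using plain stdlib draws rather than a PyMC model; the names (`center_0`, `distance`) follow the discussion, but the distributions and numbers here are illustrative assumptions, not the code from this thread:

```python
import random

random.seed(0)

def sample_centers(n):
    """Draw n (center_0, center_1) pairs with center_1 = center_0 + distance.

    Because distance is exponential (hence positive), the ordering
    center_0 < center_1 holds by construction, which removes the
    label-switching ambiguity between the two cluster centers.
    """
    pairs = []
    for _ in range(n):
        center_0 = random.gauss(120.0, 10.0)        # assumed prior on the lower center
        distance = random.expovariate(1.0 / 80.0)   # positive offset to the upper center
        center_1 = center_0 + distance
        pairs.append((center_0, center_1))
    return pairs

pairs = sample_centers(1000)
# Every draw satisfies the ordering constraint.
assert all(c0 < c1 for c0, c1 in pairs)
```

In a real sampler the same trick applies: put the prior on the offset instead of on the second center directly, and the constraint is enforced for free on every MCMC step.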
I was thinking about this: could the problem also be resolved by specifying more accurate priors? It was naive of me to specify both priors as Normal(150, \sigma). Instead I could be more specific and set one to Normal(120, \sigma) and the other to Normal(200, \sigma). While this technically does not remove the identifiability problem, it does bias the inference away from it. Technically, my original "hunch" was that there were two distinct clusters present, and by specifying identical priors I was discarding that hunch. I ran some code, and the chains do seem to settle more nicely. What are your thoughts?
It depends on whether we are talking about a practical matter or a matter of principle. In practice the MCMC stays close to one of the quasistationary states on the timescale for which we run the simulation. In principle, however, this is not close to the true posterior distribution: if the Markov chain were allowed to run for a very long time, eventually by chance it would switch over to the other quasistationary state. Then, after an even longer time, it would switch back again. In the really long run it would have switched back and forth enough for the simulation to approximate the true posterior. If you start with an asymmetric prior then the true posterior will no longer be exactly symmetric, but it will still show some vestige of the symmetry and will not be well approximated by the quasistationary distribution of the Markov chain.
This is the same intuition I had. With the asymmetric prior, the resulting posterior is asymmetric (still bimodal, but with modes of unequal height), so one mode will dominate the (short-run) MCMC traces. By choosing specific priors, we can control which mode we wish to dominate.
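This can be made concrete with a small check. Under symmetric priors, swapping the two cluster labels leaves the prior density unchanged; under the asymmetric priors suggested above, the swapped labeling is heavily penalized. The helper names and \sigma = 20 are assumptions for illustration only:

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log-density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def prior_logp(c0, c1, mu0, mu1, sigma=20.0):
    """Joint log-density of independent Normal priors on the two centers."""
    return normal_logpdf(c0, mu0, sigma) + normal_logpdf(c1, mu1, sigma)

c0, c1 = 120.0, 200.0  # one labeling of the two cluster centers

# Symmetric priors Normal(150, sigma): swapping the labels changes nothing,
# so both modes of the posterior have equal weight.
sym = prior_logp(c0, c1, 150.0, 150.0)
sym_swapped = prior_logp(c1, c0, 150.0, 150.0)
assert math.isclose(sym, sym_swapped)

# Asymmetric priors Normal(120, sigma) and Normal(200, sigma): the swapped
# labeling sits 4 sigma from each prior mean and is strongly disfavored.
asym = prior_logp(c0, c1, 120.0, 200.0)
asym_swapped = prior_logp(c1, c0, 120.0, 200.0)
assert asym > asym_swapped
```

The symmetry is still there in principle (both labelings have nonzero density), but the prior now decides which mode the short-run traces will settle into.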
There is a subtlety in the clustering example in chapter 3 that should probably be pointed out to the reader, as it is quite instructive. There is a symmetry in the model: exchanging cluster 0 and cluster 1 while at the same time exchanging p and 1-p leads to an equivalent configuration with the same probability. The prior clearly has that symmetry, and the true posterior will have the same symmetry. The distribution obtained by MCMC, however, is very far from symmetric: the Markov chain has been trapped in a metastable state. That is good for us, because that distribution is easier to interpret. The true posterior distribution, with its bimodal marginals, would probably confuse the reader.
We can either point out the above to the reader, or we can change the example to a model that does not have the symmetry. For example, one could introduce another positive hyperparameter and set the mean of the second cluster to differ from the mean of the first cluster by this parameter. An exponential prior could be chosen for that new parameter. I implemented that, and the convergence of the MCMC is better.
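The symmetry described above can be verified directly: for a two-component Gaussian mixture, swapping the component parameters together with p and 1-p leaves the likelihood of the data exactly unchanged. A minimal sketch (the data and parameter values here are made up for illustration, not taken from the book):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def mixture_loglik(data, p, mu0, sigma0, mu1, sigma1):
    """Log-likelihood of a two-component Gaussian mixture: with probability p
    a point comes from cluster 0, otherwise from cluster 1."""
    return sum(
        math.log(p * normal_pdf(x, mu0, sigma0) + (1 - p) * normal_pdf(x, mu1, sigma1))
        for x in data
    )

data = [110.0, 125.0, 190.0, 205.0, 150.0]  # illustrative observations

orig = mixture_loglik(data, 0.3, 120.0, 10.0, 200.0, 15.0)
# Swap cluster 0 <-> cluster 1 and p <-> 1 - p: the likelihood is identical,
# which is exactly why the two posterior modes have equal probability mass.
swapped = mixture_loglik(data, 0.7, 200.0, 15.0, 120.0, 10.0)
assert math.isclose(orig, swapped)
```

Because the likelihood (and a symmetric prior) cannot distinguish the two labelings, only a reparameterization like the positive offset above, or an asymmetric prior, breaks the tie.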