Add doc for modelling tips #2080

Merged
merged 3 commits into pymc-devs:master on Apr 28, 2017

Conversation

junpenglao
Member

A general walkthrough of porting WinBUGS/JAGS/Stan code into PyMC3 using a CAR example, with some tips.
@junpenglao
Member Author

Comments welcome @twiecki @aseyboldt @denadai2. Also closing #2022

@denadai2
Contributor

Great notebook. Just a question: I find your final code slower (with my data) than #2066 (comment). Is it normal?

Moreover: why did you remove bound? Is there any reason?

thx

@junpenglao
Member Author

@denadai2 thanks for your comments!

I find your final code slower (with my data) than #2066 (comment). Is it normal?

I made some modifications, and it is likely not optimised for general usage yet. BTW, there is no tau in the #2066 comment; how are you comparing the two?

why did you remove bound? Is there any reason?

It's just to simplify the code. The bounds are necessary to check the input conditions if you are building a more general function.

@junpenglao
Member Author

@denadai2

I find your final code slower (with my data) than #2066 (comment). Is it normal?

Maybe it's the eigenvalue computation below?

        # eigenvalues of D^−1/2 * W * D^−1/2
        Dinv_sqrt = np.diag(1 / np.sqrt(D))
        DWD = np.matmul(np.matmul(Dinv_sqrt, W), Dinv_sqrt)
        self.lam = scipy.linalg.eigvalsh(DWD)

This part can actually be moved outside of the function entirely, which will speed up the computation a bit more.
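For illustration, here is a minimal, self-contained sketch of precomputing those eigenvalues once, outside the logp function. The adjacency matrix `W` below is a made-up toy example (a 3-node star graph); the names `W`, `D`, `Dinv_sqrt`, `DWD`, and `lam` follow the snippet above:

```python
import numpy as np
from scipy import linalg

# Toy adjacency matrix W (hypothetical 3-node star graph) and its
# degree vector D; in the notebook these come from the real CAR data.
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
D = W.sum(axis=0)

# Precompute the eigenvalues of D^-1/2 * W * D^-1/2 once, up front,
# instead of recomputing them on every logp evaluation.
Dinv_sqrt = np.diag(1 / np.sqrt(D))
DWD = Dinv_sqrt @ W @ Dinv_sqrt
lam = linalg.eigvalsh(DWD)  # returned in ascending order
```

For this toy graph the eigenvalues are -1, 0, and 1; `lam` can then be passed into the distribution as a constant.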

@aseyboldt
Member

@junpenglao Haven't read all of it yet, but it looks nice. :-)
I'd guess that the speed difference is related to the parametrization, not raw speed. I left out tau and sd mainly out of laziness, and because most of the time you can just scale the values afterwards. In many cases that avoids correlations in the posterior (though it will be slower in some cases). The rescaling happened here:

mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)

A few other remarks:

  • Is there a special reason you are using tau instead of sd? Doesn't really matter, but personally I find sd easier to think about.
  • I'm not really a great fan of Normals with a large sd as prior. I had a couple of bad experiences with those. Gelman often recommends Cauchy or StudentT.
  • There seem to be a lot of correlations in the posterior. That probably slows down NUTS quite a bit. As a debugging tool and guide for reparametrization I often look at the singular value decomposition of the standardized samples – basically the eigenvalues of the correlation matrix. If the problem is high dimensional you can use stuff from scipy.sparse.linalg to only compute the largest svdvals:
import numpy as np
from scipy import linalg, sparse
import scipy.sparse.linalg  # makes sparse.linalg available

# Standardize the flattened posterior samples (discarding burn-in)
vals = np.array([model2.dict_to_array(v) for v in trace2[1000:]]).T
vals[:] -= vals.mean(axis=1)[:, None]
vals[:] /= vals.std(axis=1)[:, None]

# Compute only the 20 largest singular values/vectors
U, S, Vh = sparse.linalg.svds(vals, k=20)

Then look at plt.plot(S) to see if any principal components stick out, and check which variables are involved by looking at the singular vectors: plt.plot(U[:, -1] ** 2). You can get the indices by looking at model2.bijection.ordering.vmap.
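To make the diagnostic concrete, here is a self-contained sketch on synthetic data, with no PyMC3 trace needed: 50 hypothetical parameters over 2000 draws, two of which are almost perfectly correlated, standing in for a pathological posterior:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Synthetic stand-in for a trace: 50 parameters x 2000 draws, where
# parameters 0 and 1 are made almost perfectly correlated
# (a hypothetical posterior pathology).
vals = rng.standard_normal((50, 2000))
vals[1] = vals[0] + 0.05 * rng.standard_normal(2000)

# Standardize each parameter, as with a real trace
vals -= vals.mean(axis=1)[:, None]
vals /= vals.std(axis=1)[:, None]

# Largest 5 singular values of the standardized samples; svds returns
# them in ascending order, so sort descending for inspection.
U, S, Vh = svds(vals, k=5)
order = np.argsort(S)[::-1]

# Squared loadings of the dominant component: the correlated pair
# (indices 0 and 1) should carry almost all of the weight.
top = U[:, order[0]] ** 2
```

Plotting `S[order]` should show the first component sticking clearly out of the noise bulk, and `top` pinpoints which parameters drive it.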

@junpenglao
Member Author

@aseyboldt Thanks a lot for your comments!!!
There is no particular reason for using tau or for that choice of prior; mostly it is just to match the original code in WinBUGS or Stan.
Actually, your remarks are perfect for pointing out that merely porting the code from one framework to another is not always best practice. The aim is not just to run the code, but to make sure the model is appropriate and returns correct estimates efficiently. Would you mind if I edit your remarks and add them to the notebook?

@denadai2
Contributor

After @aseyboldt's comment I have to study a lot about reparametrization (future guide plz!).

thx

@twiecki twiecki merged commit 6901ad4 into pymc-devs:master Apr 28, 2017
@twiecki
Member

twiecki commented Apr 28, 2017

Thanks @junpenglao!

@junpenglao
Member Author

thanks @twiecki. I think we can close #2022 as well now.
Also, does anything need to be done to update the pymc3 website?
