Add doc for modelling tips #2080

Merged
merged 3 commits into pymc-devs:master on Apr 28, 2017

Conversation

junpenglao
Member

A general walkthrough of porting WinBUGS/JAGS/Stan code into PyMC3 using a CAR example, with some tips.
@junpenglao
Member Author

Comments welcome @twiecki @aseyboldt @denadai2. Also closing #2022

@denadai2
Contributor

Great notebook. Just a question: I find your final code slower (with my data) than #2066 (comment). Is it normal?

Moreover: why did you remove bound? Is there any reason?

thx

@junpenglao
Member Author

@denadai2 thanks for your comments!

I find your final code slower (with my data) than #2066 (comment). Is it normal?

I made some modifications, and it is likely not optimised for general usage yet. BTW, there is no tau in the #2066 comment; how are you comparing the two?

why did you remove bound? Is there any reason?

It's just to simplify the code. The bounds are necessary to check the input conditions if you are building a more general function.

@junpenglao
Member Author

@denadai2

I find your final code slower (with my data) than #2066 (comment). Is it normal?

Maybe it's the eigenvalue computation below?

        # eigenvalues of D^−1/2 * W * D^−1/2
        Dinv_sqrt = np.diag(1 / np.sqrt(D))
        DWD = np.matmul(np.matmul(Dinv_sqrt, W), Dinv_sqrt)
        self.lam = scipy.linalg.eigvalsh(DWD)

This part can actually be moved outside of the function entirely, which will speed up the computation a bit more.
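For illustration, here is a minimal, self-contained sketch of precomputing those eigenvalues once, outside the logp function. The adjacency matrix `W` below is a made-up toy example (a 3-node star graph); the names `W`, `D`, `Dinv_sqrt`, `DWD`, and `lam` follow the snippet above:

```python
import numpy as np
from scipy import linalg

# Toy adjacency matrix W (hypothetical 3-node star graph) and its
# degree vector D; in the notebook these come from the real CAR data.
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
D = W.sum(axis=0)

# Precompute the eigenvalues of D^-1/2 * W * D^-1/2 once, up front,
# instead of recomputing them on every logp evaluation.
Dinv_sqrt = np.diag(1 / np.sqrt(D))
DWD = Dinv_sqrt @ W @ Dinv_sqrt
lam = linalg.eigvalsh(DWD)  # returned in ascending order
```

For this toy graph the eigenvalues are -1, 0, and 1; `lam` can then be passed into the distribution as a constant.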

@aseyboldt
Member

@junpenglao Haven't read all of it yet, but it looks nice. :-)
I'd guess that the speed difference is related to the parametrization, not raw speed. I left out tau and sd mainly out of laziness, and because most of the time you can just scale the values afterwards. In many cases that avoids correlations in the posterior (though it will be slower in some cases). The rescaling happened here:

mu = tt.exp(b0 + b1 * X['mdist_daily'] + sd * phi)

A few other remarks:

  • Is there a special reason you are using tau instead of sd? Doesn't really matter, but personally I find sd easier to think about.
  • I'm not really a great fan of Normals with a large sd as prior. I had a couple of bad experiences with those. Gelman often recommends Cauchy or StudentT.
  • There seem to be a lot of correlations in the posterior. That probably slows down NUTS quite a bit. As a debugging tool and guide for reparametrization I often look at the singular value decomposition of the standardized samples – basically the eigenvalues of the correlation matrix. If the problem is high dimensional you can use stuff from scipy.sparse.linalg to only compute the largest svdvals:
import numpy as np
from scipy import linalg, sparse
import scipy.sparse.linalg  # makes sparse.linalg available

# Standardize the flattened posterior samples (discarding burn-in)
vals = np.array([model2.dict_to_array(v) for v in trace2[1000:]]).T
vals[:] -= vals.mean(axis=1)[:, None]
vals[:] /= vals.std(axis=1)[:, None]

# Compute only the 20 largest singular values/vectors
U, S, Vh = sparse.linalg.svds(vals, k=20)

Then look at plt.plot(S) to see if any principal components stick out, and check which variables are involved by looking at the singular vectors: plt.plot(U[:, -1] ** 2). You can get the indices by looking at model2.bijection.ordering.vmap.
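To make the diagnostic concrete, here is a self-contained sketch on synthetic data, with no PyMC3 trace needed: 50 hypothetical parameters over 2000 draws, two of which are almost perfectly correlated, standing in for a pathological posterior:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Synthetic stand-in for a trace: 50 parameters x 2000 draws, where
# parameters 0 and 1 are made almost perfectly correlated
# (a hypothetical posterior pathology).
vals = rng.standard_normal((50, 2000))
vals[1] = vals[0] + 0.05 * rng.standard_normal(2000)

# Standardize each parameter, as with a real trace
vals -= vals.mean(axis=1)[:, None]
vals /= vals.std(axis=1)[:, None]

# Largest 5 singular values of the standardized samples; svds returns
# them in ascending order, so sort descending for inspection.
U, S, Vh = svds(vals, k=5)
order = np.argsort(S)[::-1]

# Squared loadings of the dominant component: the correlated pair
# (indices 0 and 1) should carry almost all of the weight.
top = U[:, order[0]] ** 2
```

Plotting `S[order]` should show the first component sticking clearly out of the noise bulk, and `top` pinpoints which parameters drive it.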

@junpenglao
Member Author

@aseyboldt Thanks a lot for your comments!!!
There is no particular reason for using tau or for that choice of prior; mostly it is just to match the original code in WinBUGS or Stan.
Actually, your remarks are perfect for pointing out that merely porting the code from one framework to another is not always best practice. The aim is not just to run the code, but to make sure the model is appropriate and returns correct estimates efficiently. Would you mind if I edit your remarks and add them to the notebook?

@denadai2
Contributor

After @aseyboldt's comment I have to study a lot about reparametrization (future guide plz!).

thx

@twiecki twiecki merged commit 6901ad4 into pymc-devs:master Apr 28, 2017
@twiecki
Member

twiecki commented Apr 28, 2017

Thanks @junpenglao!

@junpenglao
Member Author

thanks @twiecki. I think we can close #2022 as well now.
Also, does anything need to be done to update the pymc3 website?
