-
-
Notifications
You must be signed in to change notification settings - Fork 269
Add example on using BVAR model for economists #286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@ricardoV94 has done this on the PyMC Labs website https://www.pymc-labs.io/blog-posts/bayesian-vector-autoregression/. Is this issue now irrelevant? |
If this is something that we believe is useful and important for users to be part of the docs then we should adapt the notebook and include it to the collection here so it is discoverable by users (who will use the search bar, tags... and generally won't know about the pymc labs blog) and is kept updated as new versions of pymc are released (I assume the blog post won't be updated recurrently but left frozen instead). The adapted notebook should obviously link to the original notebook and give proper attribution which should also drive some traffic to pymc labs blog too I think. If this is a one-off more obscure functionality that you don't think is needed as part of the pymc examples collection then it's fine as is and there is no need to reopen the issue |
tagging @ricardoV94 |
Oh I got confused with the repos, should definitely add it here too. |
Not an economist, but I think this is a cool kind of model is under served in the python infrastructure. I've used the statsmodels version to interrogate questions about the lagged relationship between app performance and customer NPS. It'd be great to see an authoritative example of how to conduct such an analysis in pymc. I think a good write could help popularise this kind of model. Even the statsmodels docs are sparse , how to do X rather than why we might want to do X. I'm happy to help adapting the pymc labs blog post if that's something you want to pursue? |
Ha, the pre-commit checks haven't scared you away! Go for it 👍🏻 |
Will do a bit of research later this week, and try to pick it up this weekend. |
So I looked into this a little bit yday and I have a few questions re: (a) the implementation in the pymc labs post and (b) bayesian vars in the literature. So on (a) I noticed that pymc lab used a normal likelihood. Most packages and the literature seem interested in the covariance matrix to do downstream analysis of impulse response functions. Any reason we didn't use MvNormal? The covariance matrix and moving average representation of AR models should be at least mentioned. On (b) I found a nice write up by Jim Savage of how Bayesian VARs are helpful for forecasting due to the regularisation of hierarchical priors over short time series e.g. the Minnesota prior. I think for the write up it would be good to demonstrate this. Am working on trying to convert a stan model, but struggling a bit with shape handling at the moment. Current idea is to (1) show classical example with statsmodels, (2) show pymc lab model can estimate the same. (3) discuss moving average representation of VARs as beyond scope of post but crucial to know (4) demo hierarchical bvar as argument for why the bayesian part is important. Any thoughts or questions? I'll try to share an example or the Bayesian hierarchical var in stan later. |
Looking at this Stan implementation of a VAR(1) hierarchicall bayesian model: https://rpubs.com/jimsavage/hierarchical_var We can see a panoply of priors and hyper priors of various dimensions. I can fit the model to data and retrieve similar parameter estimates to those reported. Broadly the model seems sensible, but i don't know how to translate the auto-regressive step where they seem to just apply an element wise multiplication of the lags with a matrix of beta parameters. Estimating the model I recover the dimensions for the priors and hyperpriors in stan: and i can specify a similar set of priors in pymc with the same dimensions the key dimension here is country in the economic data. There are 8 in our data set and three variables. Which gives a promising looking graph: but the autoregressive component where they multiply the lags by the beta components will fail in pymc because we can't braodcast:
and i'm unsure how to best replicate the "double" indexing into country and observation that they are able to do in how they specify the stan likelihood. I feel like i'm missing something crucial, so any tips appreciated. |
I have something ready to go on this, I would just need to clean it up a bit and re-focus it away from some of the tangential issues. Thoughts? https://github.com/jessegrabowski/VAR-Blog/blob/main/Big%20VAR%20Blog%20Post.ipynb |
That's amazing work @jessegrabowski , you're 10 steps ahead of me on this! Those unit-circle plots in particular are amazing! I'm happy to bow out if you want to do the write up for this example? But i'd love to see the argument made for why Bayesian VARs in particular are useful as per the example of a hierarchical case discussed in the post above. |
@NathanielF @jessegrabowski perhaps it would be possible to make a short series of 2-3 pymc-examples that builds on BVAR? |
Yeah, it would be good to work out a hierarchical model. It's just not as obvious how to actually implement it without resorting to looping over likelihood functions. I think it could be conveniently represented as a 3d convolution, but I haven't spent time carefully thinking about it. I also agree it would be nice to show specialized priors, like Minnesota. I tried to implement one (Heaps) in this notebook, but it doesn't sample. For economists, it's important to note that VAR models are just a specific case of a linear state space. Ideally, VAR would just fall out of a good implementation of general LSS, which PyMC doesn't have. This is also true because what a research wants to do with a VAR requires a lot of supplementary code: stability analysis, impulse response function, and forecasting. An LSS module would be a good place to put a shared API for all of these. |
Admittedly everythign in that last comment was not related to the issue. I'd be happy to write a smaller series, I'd also be happy to collaborate @NathanielF, if you're interested in some help. |
I'd love some help on this and i'm glad you said it's "not obvious" because i was struggling a bit with it. I'm going to try again to work on the Hierarchical model from Jim Savage's post, and if it's ok i might crib some of your code to experiment with? The dataset is quite nice and interesting in itself. It has GDP in recent years and Ireland is an outlier. The idea would be show how even with these relatively short timeseries a hierarchical prior can estimate the relation between GDP and consumption say without being inherently biased by the Irish corporate tax receipts. I also agree it would be best to have an LSS api which would abstract away some the implementation details, but i think a good argument for working on the implementation is to show how powerful the tool can be. So whether we get the hierarchical version up and running or not, you should definitely go ahead and publish a version of what you have. |
Yeah, I agree that using hierarchical prior to try to estimate VARs for many countries simultaneously is an open field of research. I've basically never seen a paper that does it for a VAR, let alone more exotic structural time series models (DSGE or Structural Gravity are two big examples, but basically any general equilibrium framework) . I'm extremely interested in this research question, so I'm very keen to team up on this. It would especially be interesting for obtaining estimates of countries with low-data availability (developing economies), or for adjusting untrustworthy data coming out of authoritarian regimes. Please steal my code and play around. I noticed that my preprocessing tools were not in the repo so I updated that. You should be able to clone it and run the notebook now. Let me know if anything is missing/wrong. Also calling data from 1970-2022 "relatively short" cuts deep -- that's basically infinity data for macro T__T |
That's a fair point. The lower data countries would be cool to try incorporate. I'll try play around with it again this weekend. Maybe aiming just to understand your code better, and loop back here when i'm trying the hierarchical version again. Will open a branch for working on the hierarchical BVAR and share with you when (IF) i have something worth talking about. |
By the way, the function you use at.nn.conv2d has disappeared from the aesera docs. |
Yeah it's scheduled for depreciation per aesara-devs/aesara#1188 I was a bit disappointed to see it go. The alternative is just to use loops like @ricardoV94 does in the PyMC Labs blog post. No idea if there is any performance gap between the two. |
We can always add it to |
…example Signed-off-by: Nathaniel <[email protected]>
I've added a draft pull request here to get the conversation started: #456 The basic functionality works to build a hierarchical model on the macro-economic data series, but it takes quite a long time to run and i haven't experimented enough with finding good priors for the hyperparameters to make it fit faster or efficiently. I've abstracted a little the beta matrix step, but i'm using a for loop to create the country spefiic parameters. Any tips or suggestions welcome. As i mention in the pull request i see this as building on the work you've both done already, so if we were aiming to get two separate BVAR examples up, it might be worth cross referencing to @jessegrabowski 's work if possible. |
Just realised i hadn't shown an example in the notebook of a full hierarchical fit using the multivariate normal likelihood. It samples slowly and without many divergences but the convergence metrics are pretty poor. It could be that i've misspecified the model, and it could be that i just didn't sample enough but it took ages...... and it was awful. The sampling with the normal likelihood was much faster ~1hour and seem to estimate the series' well. Curious if you have thoughts on whether it is the model specification the priors or something else contributing to such poor fit. |
I made some comments on the notebook (which is great btw), but they are all very "fluffy". I'll try to have a look at the actual hierarchical implementation tonight. I played around with it a bit over the weekend, but hit some snags on tangential issues (missing data in multivariate distributions...) I think just having a first notebook with a simple, one-country BVAR that goes into some design choices present in BVAR that aren't in the usual ML implementation (different priors on the Gamma matrix, diagonal covariance vs full covariance on the likelihood) would be interesting. Then a second notebook with some "interesting stuff you can do with the posterior", including IRF, stability, and forecasting, and a third notebook that tackles hierarchical? Maybe could also include the estimates produced by |
Thanks for the comments. Yes the statsmodels baseline makes sense. I like the three part approach, and i definitely see the hierarchical stuff as a last add on if we can get it running smoothly. For a simple one country VAR i think the implementation i have there should be able to handle it fairly efficiently. I'd need to do some more research on how to pull out the IRF stuff, so happy to follow your lead there. On the hierarchical stuff, have a look and let me know. It's currently Waaaaayyy too slow for any kind of concise pymc example. So for next steps, let's have a think about the hierarchical version after you've had a chance to review . Then maybe divvy up work on notebook (1) and (2), and come to work together on (3) when they're both done?! |
I'm happy to put together notebook (2) entirely, building on your code/data from (1). Maybe we can fit the VAR on Ireland in part (1) and I can do the IRFs and what-not in part (2), picking up from your fitted model? Just keeping the 3 variables the same for simplicity's sake? It would have a nice demo of saving/loading a fitting model using netCDF en passant. |
That works for me! Not sure exactly what the policy is for saving an inference data object... but i guess it can be considered data so within the examples/data folder. I'll try on work on (1) over the weekend, won't get a chance until at least Sunday. |
…nd fake data and irish example compared to statsmodels. Included example of small hierarchical case. Signed-off-by: Nathaniel <[email protected]>
…imated hierarchical model using blackjax nuts sampler Signed-off-by: Nathaniel <[email protected]>
…ies to hierarchical model fit Signed-off-by: Nathaniel <[email protected]>
… fit. Fixed some typos too. Signed-off-by: Nathaniel <[email protected]>
… and improved discussion. Signed-off-by: Nathaniel <[email protected]>
…image for full model Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Just giving this a nudge here: #456 Have a pull request ready for review on the Bayesian VAR modelling as part of a proposed series with @jessegrabowski. I think the first in the series has everything I wanted in it...but am open to push back? Would just like to progress it a bit more. Not sure who is best placed to review, but @lucianopaz looked my previous timeseries post and I think @ricardoV94 wrote the original pymc labs blog post...? No pressure, I'm sure everyone is quite busy, just didn't want this to slip through the cracks. Thanks in advance. |
Hi @NathanielF thanks for the ping and the awesome work. I'll try to review tomorrow. Super exciting! |
Giving this another nudge just in case Friday is a good day for you @ricardoV94? Thanks in advance. |
Gonna try again tomorrow. Very much sorry for the delay :( |
Thank you! Even if you can't tomorrow..... i'd love to get it over the line before Christmas if possible. |
…uest review Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
…improved prior of model for statsmodels comparison Signed-off-by: Nathaniel <[email protected]>
Signed-off-by: Nathaniel <[email protected]>
* [Hierarchical BVAR #286] initial draft of Hierarchical BVAR example Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] focused discussion on simple VARs and fake data and irish example compared to statsmodels. Included example of small hierarchical case. Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] added more explanatory text and estimated hierarchical model using blackjax nuts sampler Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] improved visuals and add all countries to hierarchical model fit Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] fixed issue with hierarchical model fit. Fixed some typos too. Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] added forest plots, corrected typos and improved discussion. Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] added image and scrollable view of image for full model Signed-off-by: Nathaniel <[email protected]> * Update bayesian_var_model.myst.md change filepath to full_model.png * [Hierarchical BVAR #286] reverting full model image Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] updated with feedback from pull request review Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] tidying bibtex file Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] trying bibtex tidy again Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] added new line to end of bib file Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] removed tabs from bib entry Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] indent with 2 spaces bibtex Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] think i got it this time bibtex Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] updated with requested changes and improved prior of model for statsmodels comparison Signed-off-by: Nathaniel <[email protected]> * [Hierarchical BVAR #286] corrected hierarchical model tag Signed-off-by: Nathaniel <[email protected]> Signed-off-by: Nathaniel <[email protected]>
Notebook proposal
Title: BVAR model for economists
Why should this notebook be added to pymc-examples?
Economists seem to like BVAR and are looking for this example as a starting point. They are also doing a lot of Bayesian modeling but rarely using PPLs.
The text was updated successfully, but these errors were encountered: