Skip to content

Add example on using BVAR model for economists #286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
twiecki opened this issue Mar 5, 2022 · 32 comments · Fixed by #456
Closed

Add example on using BVAR model for economists #286

twiecki opened this issue Mar 5, 2022 · 32 comments · Fixed by #456
Labels
proposal New notebook proposal still up for discussion

Comments

@twiecki
Copy link
Member

twiecki commented Mar 5, 2022

Notebook proposal

Title: BVAR model for economists

Why should this notebook be added to pymc-examples?

Economists seem to like BVAR and are looking for this example as a starting point. They are also doing a lot of Bayesian modeling but rarely using PPLs.

@twiecki twiecki added the proposal New notebook proposal still up for discussion label Mar 5, 2022
@drbenvincent
Copy link
Contributor

@ricardoV94 has done this on the PyMC Labs website https://www.pymc-labs.io/blog-posts/bayesian-vector-autoregression/. Is this issue now irrelevant?

@twiecki twiecki closed this as completed May 21, 2022
@OriolAbril
Copy link
Member

If this is something that we believe is useful and important for users to be part of the docs then we should adapt the notebook and include it to the collection here so it is discoverable by users (who will use the search bar, tags... and generally won't know about the pymc labs blog) and is kept updated as new versions of pymc are released (I assume the blog post won't be updated recurrently but left frozen instead). The adapted notebook should obviously link to the original notebook and give proper attribution which should also drive some traffic to pymc labs blog too I think.

If this is a one-off more obscure functionality that you don't think is needed as part of the pymc examples collection then it's fine as is and there is no need to reopen the issue

@drbenvincent
Copy link
Contributor

tagging @ricardoV94

@twiecki
Copy link
Member Author

twiecki commented May 22, 2022

Oh I got confused with the repos, should definitely add it here too.

@twiecki twiecki reopened this May 22, 2022
@NathanielF
Copy link
Contributor

Not an economist, but I think this is a cool kind of model is under served in the python infrastructure. I've used the statsmodels version to interrogate questions about the lagged relationship between app performance and customer NPS. It'd be great to see an authoritative example of how to conduct such an analysis in pymc. I think a good write could help popularise this kind of model. Even the statsmodels docs are sparse , how to do X rather than why we might want to do X.

I'm happy to help adapting the pymc labs blog post if that's something you want to pursue?

@drbenvincent
Copy link
Contributor

Ha, the pre-commit checks haven't scared you away!

Go for it 👍🏻

@NathanielF
Copy link
Contributor

Will do a bit of research later this week, and try to pick it up this weekend.

@NathanielF
Copy link
Contributor

NathanielF commented Oct 27, 2022

So I looked into this a little bit yday and I have a few questions re: (a) the implementation in the pymc labs post and (b) bayesian vars in the literature.

So on (a) I noticed that pymc lab used a normal likelihood. Most packages and the literature seem interested in the covariance matrix to do downstream analysis of impulse response functions. Any reason we didn't use MvNormal? The covariance matrix and moving average representation of AR models should be at least mentioned.

On (b) I found a nice write up by Jim Savage of how Bayesian VARs are helpful for forecasting due to the regularisation of hierarchical priors over short time series e.g. the Minnesota prior.

I think for the write up it would be good to demonstrate this. Am working on trying to convert a stan model, but struggling a bit with shape handling at the moment.

Current idea is to (1) show classical example with statsmodels, (2) show pymc lab model can estimate the same. (3) discuss moving average representation of VARs as beyond scope of post but crucial to know (4) demo hierarchical bvar as argument for why the bayesian part is important.

Any thoughts or questions?

I'll try to share an example or the Bayesian hierarchical var in stan later.

@NathanielF
Copy link
Contributor

NathanielF commented Oct 27, 2022

Looking at this Stan implementation of a VAR(1) hierarchicall bayesian model:

https://rpubs.com/jimsavage/hierarchical_var

We can see a panoply of priors and hyper priors of various dimensions. I can fit the model to data and retrieve similar parameter estimates to those reported. Broadly the model seems sensible, but i don't know how to translate the auto-regressive step where they seem to just apply an element wise multiplication of the lags with a matrix of beta parameters.

Estimating the model I recover the dimensions for the priors and hyperpriors in stan:

image

and i can specify a similar set of priors in pymc with the same dimensions the key dimension here is country in the economic data. There are 8 in our data set and three variables.

image

Which gives a promising looking graph:
image

image

but the autoregressive component where they multiply the lags by the beta components will fail in pymc because we can't braodcast:

alpha_8_3 + beta_8_3_3*y_lagged_370_3

and i'm unsure how to best replicate the "double" indexing into country and observation that they are able to do in how they specify the stan likelihood. I feel like i'm missing something crucial, so any tips appreciated.

@ricardoV94
Copy link
Member

CC @jessegrabowski

@jessegrabowski
Copy link
Member

I have something ready to go on this, I would just need to clean it up a bit and re-focus it away from some of the tangential issues. Thoughts?

https://github.com/jessegrabowski/VAR-Blog/blob/main/Big%20VAR%20Blog%20Post.ipynb

@NathanielF
Copy link
Contributor

That's amazing work @jessegrabowski , you're 10 steps ahead of me on this! Those unit-circle plots in particular are amazing!

I'm happy to bow out if you want to do the write up for this example? But i'd love to see the argument made for why Bayesian VARs in particular are useful as per the example of a hierarchical case discussed in the post above.

@ricardoV94
Copy link
Member

ricardoV94 commented Oct 28, 2022

@NathanielF @jessegrabowski perhaps it would be possible to make a short series of 2-3 pymc-examples that builds on BVAR?

@jessegrabowski
Copy link
Member

But i'd love to see the argument made for why Bayesian VARs in particular are useful as per the example of a hierarchical case discussed in the post above.

Yeah, it would be good to work out a hierarchical model. It's just not as obvious how to actually implement it without resorting to looping over likelihood functions. I think it could be conveniently represented as a 3d convolution, but I haven't spent time carefully thinking about it.

I also agree it would be nice to show specialized priors, like Minnesota. I tried to implement one (Heaps) in this notebook, but it doesn't sample.

For economists, it's important to note that VAR models are just a specific case of a linear state space. Ideally, VAR would just fall out of a good implementation of general LSS, which PyMC doesn't have. This is also true because what a research wants to do with a VAR requires a lot of supplementary code: stability analysis, impulse response function, and forecasting. An LSS module would be a good place to put a shared API for all of these.

@jessegrabowski
Copy link
Member

Admittedly everythign in that last comment was not related to the issue. I'd be happy to write a smaller series, I'd also be happy to collaborate @NathanielF, if you're interested in some help.

@NathanielF
Copy link
Contributor

I'd love some help on this and i'm glad you said it's "not obvious" because i was struggling a bit with it. I'm going to try again to work on the Hierarchical model from Jim Savage's post, and if it's ok i might crib some of your code to experiment with? The dataset is quite nice and interesting in itself. It has GDP in recent years and Ireland is an outlier.

image

The idea would be show how even with these relatively short timeseries a hierarchical prior can estimate the relation between GDP and consumption say without being inherently biased by the Irish corporate tax receipts.

I also agree it would be best to have an LSS api which would abstract away some the implementation details, but i think a good argument for working on the implementation is to show how powerful the tool can be. So whether we get the hierarchical version up and running or not, you should definitely go ahead and publish a version of what you have.

@jessegrabowski
Copy link
Member

jessegrabowski commented Oct 28, 2022

Yeah, I agree that using hierarchical prior to try to estimate VARs for many countries simultaneously is an open field of research. I've basically never seen a paper that does it for a VAR, let alone more exotic structural time series models (DSGE or Structural Gravity are two big examples, but basically any general equilibrium framework) . I'm extremely interested in this research question, so I'm very keen to team up on this. It would especially be interesting for obtaining estimates of countries with low-data availability (developing economies), or for adjusting untrustworthy data coming out of authoritarian regimes.

Please steal my code and play around. I noticed that my preprocessing tools were not in the repo so I updated that. You should be able to clone it and run the notebook now. Let me know if anything is missing/wrong.

Also calling data from 1970-2022 "relatively short" cuts deep -- that's basically infinity data for macro T__T

@NathanielF
Copy link
Contributor

NathanielF commented Oct 28, 2022

That's a fair point. The lower data countries would be cool to try incorporate. I'll try play around with it again this weekend. Maybe aiming just to understand your code better, and loop back here when i'm trying the hierarchical version again. Will open a branch for working on the hierarchical BVAR and share with you when (IF) i have something worth talking about.

@NathanielF
Copy link
Contributor

By the way, the function you use at.nn.conv2d has disappeared from the aesera docs.

@jessegrabowski
Copy link
Member

Yeah it's scheduled for depreciation per aesara-devs/aesara#1188

I was a bit disappointed to see it go. The alternative is just to use loops like @ricardoV94 does in the PyMC Labs blog post. No idea if there is any performance gap between the two.

@ricardoV94
Copy link
Member

We can always add it to pymc.math ;)

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 1, 2022
@NathanielF
Copy link
Contributor

I've added a draft pull request here to get the conversation started: #456

The basic functionality works to build a hierarchical model on the macro-economic data series, but it takes quite a long time to run and i haven't experimented enough with finding good priors for the hyperparameters to make it fit faster or efficiently. I've abstracted a little the beta matrix step, but i'm using a for loop to create the country spefiic parameters. Any tips or suggestions welcome.

As i mention in the pull request i see this as building on the work you've both done already, so if we were aiming to get two separate BVAR examples up, it might be worth cross referencing to @jessegrabowski 's work if possible.

@NathanielF
Copy link
Contributor

Just realised i hadn't shown an example in the notebook of a full hierarchical fit using the multivariate normal likelihood. It samples slowly and without many divergences but the convergence metrics are pretty poor. It could be that i've misspecified the model, and it could be that i just didn't sample enough but it took ages...... and it was awful.

image

image

The sampling with the normal likelihood was much faster ~1hour and seem to estimate the series' well. Curious if you have thoughts on whether it is the model specification the priors or something else contributing to such poor fit.

@jessegrabowski
Copy link
Member

I made some comments on the notebook (which is great btw), but they are all very "fluffy". I'll try to have a look at the actual hierarchical implementation tonight. I played around with it a bit over the weekend, but hit some snags on tangential issues (missing data in multivariate distributions...)

I think just having a first notebook with a simple, one-country BVAR that goes into some design choices present in BVAR that aren't in the usual ML implementation (different priors on the Gamma matrix, diagonal covariance vs full covariance on the likelihood) would be interesting. Then a second notebook with some "interesting stuff you can do with the posterior", including IRF, stability, and forecasting, and a third notebook that tackles hierarchical?

Maybe could also include the estimates produced by statsmodels.api.tsm.VARMAX as a baseline?

@NathanielF
Copy link
Contributor

Thanks for the comments. Yes the statsmodels baseline makes sense. I like the three part approach, and i definitely see the hierarchical stuff as a last add on if we can get it running smoothly.

For a simple one country VAR i think the implementation i have there should be able to handle it fairly efficiently.

I'd need to do some more research on how to pull out the IRF stuff, so happy to follow your lead there.

On the hierarchical stuff, have a look and let me know. It's currently Waaaaayyy too slow for any kind of concise pymc example.

So for next steps, let's have a think about the hierarchical version after you've had a chance to review . Then maybe divvy up work on notebook (1) and (2), and come to work together on (3) when they're both done?!

@jessegrabowski
Copy link
Member

jessegrabowski commented Nov 2, 2022

I'm happy to put together notebook (2) entirely, building on your code/data from (1). Maybe we can fit the VAR on Ireland in part (1) and I can do the IRFs and what-not in part (2), picking up from your fitted model? Just keeping the 3 variables the same for simplicity's sake?

It would have a nice demo of saving/loading a fitting model using netCDF en passant.

@NathanielF
Copy link
Contributor

That works for me! Not sure exactly what the policy is for saving an inference data object... but i guess it can be considered data so within the examples/data folder. I'll try on work on (1) over the weekend, won't get a chance until at least Sunday.

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 6, 2022
…nd fake data and irish example compared to statsmodels. Included example of small hierarchical case.

Signed-off-by: Nathaniel <[email protected]>
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 18, 2022
…imated hierarchical model using blackjax nuts sampler

Signed-off-by: Nathaniel <[email protected]>
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 18, 2022
…ies to hierarchical model fit

Signed-off-by: Nathaniel <[email protected]>
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 19, 2022
… fit. Fixed some typos too.

Signed-off-by: Nathaniel <[email protected]>
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 20, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 21, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Nov 21, 2022
@NathanielF
Copy link
Contributor

Just giving this a nudge here: #456

Have a pull request ready for review on the Bayesian VAR modelling as part of a proposed series with @jessegrabowski.

I think the first in the series has everything I wanted in it...but am open to push back? Would just like to progress it a bit more. Not sure who is best placed to review, but @lucianopaz looked my previous timeseries post and I think @ricardoV94 wrote the original pymc labs blog post...?

No pressure, I'm sure everyone is quite busy, just didn't want this to slip through the cracks.

Thanks in advance.

@ricardoV94
Copy link
Member

Hi @NathanielF thanks for the ping and the awesome work. I'll try to review tomorrow. Super exciting!

@NathanielF
Copy link
Contributor

Giving this another nudge just in case Friday is a good day for you @ricardoV94? Thanks in advance.

@ricardoV94
Copy link
Member

Gonna try again tomorrow. Very much sorry for the delay :(

@NathanielF
Copy link
Contributor

Thank you! Even if you can't tomorrow..... i'd love to get it over the line before Christmas if possible.

NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 9, 2022
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 10, 2022
…improved prior of model for statsmodels comparison

Signed-off-by: Nathaniel <[email protected]>
NathanielF added a commit to NathanielF/pymc-examples that referenced this issue Dec 12, 2022
@OriolAbril OriolAbril linked a pull request Dec 13, 2022 that will close this issue
3 tasks
OriolAbril pushed a commit that referenced this issue Dec 13, 2022
* [Hierarchical BVAR #286] initial draft of Hierarchical BVAR example

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] focused discussion on simple VARs and fake data and irish example compared to statsmodels. Included example of small hierarchical case.

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] added more explanatory text and estimated hierarchical model using blackjax nuts sampler

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] improved visuals and add all countries to hierarchical model fit

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] fixed issue with hierarchical model fit. Fixed some typos too.

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] added forest plots, corrected typos and improved discussion.

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] added image and scrollable view of image for full model

Signed-off-by: Nathaniel <[email protected]>

* Update bayesian_var_model.myst.md

change filepath to full_model.png

* [Hierarchical BVAR #286] reverting full model image

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] updated with feedback from pull request review

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] tidying bibtex file

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] trying bibtex tidy again

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] added new line to end of bib file

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] removed tabs from bib entry

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] indent with 2 spaces bibtex

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] think i got it this time bibtex

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] updated with requested changes and improved prior of model for statsmodels comparison

Signed-off-by: Nathaniel <[email protected]>

* [Hierarchical BVAR #286] corrected hierarchical model tag

Signed-off-by: Nathaniel <[email protected]>

Signed-off-by: Nathaniel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal New notebook proposal still up for discussion
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants