Add Besag-York-Mollie notebook #566

Merged
merged 7 commits into from
Sep 8, 2023

daniel-saunders-phil (Contributor) commented Aug 20, 2023

I've been working on a notebook that demonstrates how to build and interpret the Besag-York-Mollie model. The BYM model is a popular choice for spatial data, especially in epidemiology. PyMC just gained an ICAR distribution, which is a key ingredient in the BYM model. There is a well-developed literature on Bayesian approaches to BYM, written largely by folks in the Stan and r-INLA community:

By contrast, there is very little tutorial material written for Python users who want to do Bayesian spatial modeling. So this notebook would help rectify that and maybe make PyMC appealing to a new segment of the data science community.

To do:

  • add correct cross-references
  • add bibliography
  • double check style guide https://docs.pymc.io/en/latest/contributing/jupyter_style.html
  • update extra installs with notebook metadata
  • Spell check
  • Fix error with watermark at end of notebook
  • Remove placeholder file for importing ICAR once it comes out with a PyMC release.
  • Add example that shows how to use predictor variables alongside the BYM model.

📚 Documentation preview 📚: https://pymc-examples--566.org.readthedocs.build/en/566/

fonnesbeck commented on 2023-08-21T19:00:01Z
----------------------------------------------------------------

Line #10.    from icardist import ICAR

Can we change this to a PyMC import now that it's merged?


daniel-saunders-phil commented on 2023-08-22T00:38:37Z
----------------------------------------------------------------

I was going to just wait until the next release. Do you know if it's possible to install versions directly from GitHub without them being released? I searched a bit for instructions but came up empty-handed.

bwengals commented on 2023-08-24T18:39:51Z
----------------------------------------------------------------

Assuming you have a conda env that you're working out of where you've already installed pymc, you can pip install from a specific git repo branch. Add --no-deps at the end, though: this makes pip install only the pymc code and not try to mess with dependencies.

fonnesbeck commented on 2023-08-21T19:00:02Z
----------------------------------------------------------------

"so it would receive a large penalty from each OF them"

also, missing a closing $ in \phis
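
For context, the quoted sentence concerns the pairwise-difference penalty of the ICAR prior. Below is a minimal sketch, assuming the standard unnormalized form log p(phi) = -1/2 * sum over neighboring pairs of (phi_i - phi_j)^2. The function name `icar_log_density`, the toy graph, and the values are hypothetical and do not come from the notebook.

```python
def icar_log_density(phi, edges):
    """Unnormalized ICAR log density: -0.5 * sum of squared neighbor differences."""
    return -0.5 * sum((phi[i] - phi[j]) ** 2 for i, j in edges)


# Hypothetical star-shaped map: area 0 borders areas 1, 2 and 3.
edges = [(0, 1), (0, 2), (0, 3)]

smooth = [0.0, 0.0, 0.0, 0.0]   # every area agrees with its neighbors
outlier = [3.0, 0.0, 0.0, 0.0]  # area 0 differs from all three neighbors

print(icar_log_density(smooth, edges))   # no penalty terms
print(icar_log_density(outlier, edges))  # one squared-difference term per neighbor
```

Each edge contributes its own term, so an area that disagrees with several neighbors is penalized once per neighbor.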


fonnesbeck commented on 2023-08-21T19:00:03Z
----------------------------------------------------------------

In the equation maybe just use a greek letter instead of scaling-factor?

The last paragraph is a duplicate of the third-to-last paragraph.

Again missing a closing $ on \phi


fonnesbeck commented on 2023-08-21T19:00:04Z
----------------------------------------------------------------

maybe add a plt.tight_layout() call to make the axis labels readable


fonnesbeck commented on 2023-08-21T19:00:05Z
----------------------------------------------------------------

What if there are covariates? Do we smooth then? e.g. if a covariate might explain why a tract differs from its neighbors


daniel-saunders-phil commented on 2023-08-21T21:50:04Z
----------------------------------------------------------------

Yeah, that's a good question. My hunch is that everything proceeds the same. The ICAR distribution would capture all the unexplained spatial covariance. If the covariates explain some of the spatial variance, that's great. But I don't think that requires a change in how we approach the BYM component of the model. The Stan case study (published paper version) includes four covariates and they just stick them into the linear model like:

y = b0 + b1 * x1 + b2 * x2 + ... + bym_components

I omitted predictors in my case study to keep things short, but maybe it's worth including one predictor variable to give us an opportunity to address it.

bwengals commented on 2023-08-24T18:46:38Z
----------------------------------------------------------------

I think it'd be helpful to include other covariates. If you don't have other covariates and your goal is just smoothing, there are tons of other ways to do that. Once you're trying to factor out the effects of other covariates then things like ICAR and GPs become a lot more useful.
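
As a rough illustration of the linear predictor discussed in this thread, the sketch below adds covariate terms to the BYM components in plain Python. The function name `linear_predictor` and all numbers are hypothetical and are not taken from the notebook. In an actual model the coefficients and the spatial and unstructured effects would be PyMC random variables rather than fixed lists.

```python
def linear_predictor(intercept, betas, covariates, spatial, unstructured):
    """Return eta_i = b0 + sum_k(b_k * x_ik) + spatial_i + unstructured_i per area."""
    eta = []
    for x_i, phi_i, theta_i in zip(covariates, spatial, unstructured):
        # Covariate part: the usual regression terms for this area.
        fixed = intercept + sum(b * x for b, x in zip(betas, x_i))
        # BYM part: spatially structured plus unstructured random effect.
        eta.append(fixed + phi_i + theta_i)
    return eta


# Hypothetical values for two areas and two covariates.
eta = linear_predictor(
    intercept=1.0,
    betas=[0.5, -2.0],
    covariates=[[2.0, 1.0], [0.0, 0.5]],
    spatial=[0.25, -0.25],
    unstructured=[0.0, 0.5],
)
print(eta)
```

The covariates enter exactly as in an ordinary regression. The BYM terms only absorb what the covariates leave unexplained.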

fonnesbeck (Member) commented Aug 23, 2023

@daniel-saunders-phil to answer your install question: since the PR was merged into main, installing with pip via

pip install -U git+https://github.com/pymc-devs/pymc.git

should do the trick.

@daniel-saunders-phil daniel-saunders-phil marked this pull request as ready for review August 23, 2023 23:03

bwengals commented on 2023-08-24T18:47:00Z
----------------------------------------------------------------

while naive BYM models are often not.

Might be better to say "original" instead of naive BYM model. Alternatively you could make a distinction between BYM1 and BYM2


daniel-saunders-phil (Contributor Author)

"I think it'd be helpful to include other covariates. If you don't have other covariates and your goal is just smoothing, there are tons of other ways to do that. Once you're trying to factor out the effects of other covariates then things like ICAR and GPs become a lot more useful."

Okie, I'll add covariates. If both you and Chris are interested in seeing that example, I can imagine other folks will too. It will take a couple of days, I reckon. The raw Stan data files are a bit of a mess 🫠

daniel-saunders-phil (Contributor Author)

@fonnesbeck @bwengals I finished adding the covariate business and made a number of other small tweaks. It's ready for another look.

"Might be better to say 'original' instead of naive BYM model. Alternatively you could make a distinction between BYM1 and BYM2"

Fixed. I think we'll keep it short by not detailing the BYM1 vs 2 distinction - it's well covered in the linked papers.

@twiecki twiecki merged commit 670657a into pymc-devs:main Sep 8, 2023
twiecki (Member) commented Sep 8, 2023

Thanks @daniel-saunders-phil!

bwengals (Collaborator) commented Sep 8, 2023

Really nice explanation of the scaling factor, TY @daniel-saunders-phil!
