Re-working as_model #275

Closed
zaxtax opened this issue Dec 1, 2023 · 10 comments

@zaxtax
Contributor

zaxtax commented Dec 1, 2023

Since it's still new: the as_model decorator is nice, but for models where the coords change between calls it could have better ergonomics. I've explored using a helper function add_coords:

import pymc as pm
import pymc_experimental as pmx


def add_coords(coords, lengths=None, model=None):
    # attach coords to the model currently on the context stack
    model = pm.modelcontext(model)
    model.add_coords(coords=coords, lengths=lengths)


@pmx.as_model(name="linreg")
def linreg_model(x, y, coords):
    add_coords(coords)
    a = pm.Normal("a", 0, 3)
    b = pm.Normal("b", 0, 2)
    sigma = pm.HalfNormal("sigma", 2)

    pm.Normal("y", a + b * x, sigma, observed=y, dims="year")


m_train = linreg_model(x_train, y_train, coords={"year": [2019, 2020, 2021]})
m_test = linreg_model(x_test, y_test, coords={"year": [2022, 2023]})

But @ricardoV94 pointed out that as_model could instead return a function that takes an additional coords or name argument:

import pymc as pm
from functools import wraps


def as_model(*model_args, **model_kwargs):
    def decorator(f):
        @wraps(f)
        def make_model(*args, **kwargs):
            # merge decorator-time coords with call-time coords (call-time wins);
            # use .get so model_kwargs isn't mutated across calls
            coords = model_kwargs.get("coords", {}) | kwargs.pop("coords", {})
            other_kwargs = {k: v for k, v in model_kwargs.items() if k != "coords"}
            with pm.Model(*model_args, coords=coords, **other_kwargs) as m:
                f(*args, **kwargs)
            return m

        return make_model

    return decorator

@pmx.as_model(name="linreg")
def linreg_model(x, y):
    a = pm.Normal("a", 0, 3)
    b = pm.Normal("b", 0, 2)
    sigma = pm.HalfNormal("sigma", 2)
    
    pm.Normal("y", a + b * x, sigma, observed=y, dims="year")

m_train = linreg_model(x_train, y_train, coords={"year": [2019, 2020, 2021]})
m_test = linreg_model(x_test, y_test, coords={"year": [2022, 2023]})

I'm curious, which do people prefer?

@twiecki
Member

twiecki commented Dec 1, 2023

Definitely the second. I don't think a user should have to remember to call the helper; in the previous as_model, coords were also dealt with transparently.

@theorashid
Contributor

I prefer the second, as long as it works for coords_mutable (and whatever other args we need) too.

@ricardoV94
Member

ricardoV94 commented Dec 13, 2023

You may not need coords_mutable with as_model, since you can just build a different model with fixed coords for posterior predictive sampling, as shown above with m_train and m_test.

@theorashid
Contributor

Hmm, I think I'd rather keep the one model. Unless my current workflow is bad practice or I've missed the point:

m = model(data=data, coords=coords)
idata = pm.sample(
    model=m,
    return_inferencedata=True,
)

idata.extend(pm.sample_posterior_predictive(idata, model=m, predictions=False))

pm.set_data(new_data=data_predict, model=m)
idata.extend(pm.sample_posterior_predictive(idata, model=m, predictions=True))

tbf, I guess replacing the set_data line isn't a killer. Might be easier to keep track of one m though. Coming from R (sorry), I prefer one m: predict(m, newdata = new_data)
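
For reference, the pm.set_data call in this workflow assumes the model's inputs live in mutable data containers. A rough sketch of what the model function might look like under that assumption (the names model, x, y, and y_obs here are illustrative, not taken from this thread):

import pymc as pm


def model(data, coords):
    # inputs go into mutable data containers so pm.set_data can swap them later
    with pm.Model(coords=coords) as m:
        x = pm.MutableData("x", data["x"])
        y = pm.MutableData("y", data["y"])
        a = pm.Normal("a", 0, 3)
        b = pm.Normal("b", 0, 2)
        sigma = pm.HalfNormal("sigma", 2)
        pm.Normal("y_obs", a + b * x, sigma, observed=y)
    return m

If the observed variable also carried dims tied to the coords, the coord values would need updating along with the data whenever their length changes, which is where the coords_mutable question above comes in.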

@fonnesbeck
Member

I'm starting to appreciate @ricardoV94's approach of simply having a prediction model that uses the trace of the fitted model, and not having to worry about changing coords.

I'm confused by the example above -- m_train and m_test appear to be two entirely separate model instances, with no shared parameters. How is m_test getting the trained parameters?

@ricardoV94
Member

How is m_test getting the trained parameters?

That would happen when calling pm.sample_posterior_predictive with the idata from sampling.

@fonnesbeck
Member

fonnesbeck commented Dec 13, 2023

So the missing steps are:

trace = pm.sample(model=m_train)
pm.sample_posterior_predictive(trace, model=m_test, extend_inferencedata=True)

The nice thing here is that it avoids having to set up placeholder vars when you build a separate prediction model.

Yeah, I like the second one.

@twiecki
Member

twiecki commented Dec 14, 2023

Still in favor of the second.

@zaxtax
Contributor Author

zaxtax commented Dec 14, 2023

Quoting @theorashid above:

Hmm, I think I'd rather keep the one model. Unless my current workflow is bad practice or I've missed the point:

m = model(data=data, coords=coords)
idata = pm.sample(
    model=m,
    return_inferencedata=True,
)

idata.extend(pm.sample_posterior_predictive(idata, model=m, predictions=False))

pm.set_data(new_data=data_predict, model=m)
idata.extend(pm.sample_posterior_predictive(idata, model=m, predictions=True))

tbf, I guess replacing the set_data line isn't a killer. Might be easier to keep track of one m though. Coming from R (sorry), I prefer one m: predict(m, newdata = new_data)

With either of the above workflows your code can look like this:

m = model(data=data, coords=coords)
idata = pm.sample(model=m)

pm.sample_posterior_predictive(idata, model=m, predictions=False, extend_inferencedata=True)
m_new = model(data=data_predict, coords=new_coords)
pm.sample_posterior_predictive(idata, model=m_new, predictions=True, extend_inferencedata=True)

@zaxtax
Contributor Author

zaxtax commented Dec 16, 2023

Thanks everyone for your feedback!

zaxtax closed this as completed on Dec 16, 2023