Skip to content

Merging BayesianEstimator into ModelBuilder #165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

michaelraczycki
Copy link
Collaborator

@michaelraczycki michaelraczycki commented May 10, 2023

Complete merge of BayesianEstimator and it's functionalities into ModelBuilder:

Implementation details:
-splitting create_sample_input into 3 new functions:
*generate_model_data - a method that takes over data generation/pre-formatting
*default_sampler_config - property function that allows access to default sampler config
*default_model_config - property function that allows access to default model config

Split as above, allows for simplification of ModelBuilder initialisation in the default mode, allowing user easy access to the data structures, while making all parameters optional maintains full customisation option.

All model-related functions now take X and y as input, to mimic scikit-learn model input parameters. That was intended to allow usage of sklearn Transformers and Pipelines with PyMC models.
For now, data can be provided still on each of the steps that it could have been provided in the past (class init, build_model function). This allows for a live test period, in which it will be determined whether the combined data provisioning system will stay, or we'll move into sklearn-like way.
Fit now takes an optional parameter 'predictor_names' which allows custom naming of the predictors in fit_data, in case the predictors were given in 2d array, in case of providing parameters in pandas Dataframe, the predictor names will be taken from the data frame itself.

@michaelraczycki
Copy link
Collaborator Author

I decided to adapt some of the favourite solution ideas that were introduced with BayesianEstimator class, also I decided to not fully skip their implementation to avoid possible runtime issues that I've encountered in last development.

@michaelraczycki michaelraczycki changed the title splitting create_sample_input, adapting tests splitting create_sample_input, adapting tests in ModelBuilder May 10, 2023
@michaelraczycki
Copy link
Collaborator Author

the pytest exception needed to be added because of numpy 1.24. introducing deprecation warning for bool8, which is used in bokeh library, was causing all test to fail on collection for any envs with the latest numpy

@michaelraczycki
Copy link
Collaborator Author

this PR is needed for #144 , as the ModelBuilder and BasesianEstimator changes will be used in PoC of model builds in DelayedSaturatedMMM in pymc-experimental, therefore the changes need to be released


Examples
--------
>>> import arviz as az
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are pretty weird examples, more like tests. This should give the user info on how to use the method to achieve something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctests provide an option to perform additional checks based on the docstrings, they might look a bit odd, but you're right, this one is not the best. I'll try to refactor it. Maybe putting some comment on the last line that is actually both doctest and the user manual (in this case 280) would help?

>>> idata = az.InferenceData()
>>> self.set_idata_attrs(idata=idata)
>>> assert "id" in idata.attrs
>>> model = YourModel(ModelBuilder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe keep it consistent between MyModel, YourModel.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, didn't notice that one. It's fixed now.

fixing indentation issue, adding exception for pytest

adding forgotten decorator to generate_model_data

making doctest more user-manual like, renaming example model for consistency

chaning YourClass to MyClass for consistency
@michaelraczycki michaelraczycki force-pushed the model_builder_compatibility_with_sklearn branch from 6a61dc1 to 539659d Compare May 11, 2023 12:15
@michaelraczycki michaelraczycki requested a review from twiecki May 11, 2023 12:53
@@ -60,6 +60,7 @@ class BayesianEstimator(ModelBuilder):

def __init__(
self,
data: Union[np.ndarray, pd.DataFrame, pd.Series] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you adding this back in?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, I want to experiment with adding a no-parameter initialization that will act as a sandbox for the new users, but keeping the data in the constructor helps to reduce the number of adaptations before going into implementing model builder in other classes. Also it has low impact, and should be easy to remove later if we don't like it

@michaelraczycki michaelraczycki requested a review from twiecki May 12, 2023 07:22
@michaelraczycki michaelraczycki changed the title splitting create_sample_input, adapting tests in ModelBuilder Merging BayesianEstimator into ModelBuilder May 16, 2023
@michaelraczycki
Copy link
Collaborator Author

please do not squash merge it, the commits were split on purpose to make it reversible in case it's needed

@michaelraczycki
Copy link
Collaborator Author

@twiecki @ricardoV94 any suggestions to the current state?

.pylintrc Outdated
@@ -46,7 +46,6 @@ enable=import-self,
used-before-assignment,
cell-var-from-loop,
global-variable-undefined,
dangerous-default-value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't have time to check the rest yet, but this doesn't seem right

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a problem with pymc-marketing and pymc-expental, as they don't use the same linting rules. So It's something I forgot to 'undo' after installing locally pymc-experimental from my local repo.

@twiecki twiecki merged commit 573e5bc into pymc-devs:main May 26, 2023
@michaelraczycki michaelraczycki deleted the model_builder_compatibility_with_sklearn branch July 5, 2023 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants