-
-
Notifications
You must be signed in to change notification settings - Fork 59
Merging BayesianEstimator into ModelBuilder #165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merging BayesianEstimator into ModelBuilder #165
Conversation
I decided to adapt some of the favourite solution ideas that were introduced with BayesianEstimator class, also I decided to not fully skip their implementation to avoid possible runtime issues that I've encountered in last development. |
the pytest exception needed to be added because of numpy 1.24. introducing deprecation warning for bool8, which is used in bokeh library, was causing all test to fail on collection for any envs with the latest numpy |
this PR is needed for #144 , as the ModelBuilder and BasesianEstimator changes will be used in PoC of model builds in DelayedSaturatedMMM in pymc-experimental, therefore the changes need to be released |
pymc_experimental/model_builder.py
Outdated
|
||
Examples | ||
-------- | ||
>>> import arviz as az |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are pretty weird examples, more like tests. This should give the user info on how to use the method to achieve something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doctests provide an option to perform additional checks based on the docstrings, they might look a bit odd, but you're right, this one is not the best. I'll try to refactor it. Maybe putting some comment on the last line that is actually both doctest and the user manual (in this case 280) would help?
pymc_experimental/model_builder.py
Outdated
>>> idata = az.InferenceData() | ||
>>> self.set_idata_attrs(idata=idata) | ||
>>> assert "id" in idata.attrs | ||
>>> model = YourModel(ModelBuilder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe keep it consistent between MyModel
, YourModel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake, didn't notice that one. It's fixed now.
fixing indentation issue, adding exception for pytest adding forgotten decorator to generate_model_data making doctest more user-manual like, renaming example model for consistency chaning YourClass to MyClass for consistency
6a61dc1
to
539659d
Compare
@@ -60,6 +60,7 @@ class BayesianEstimator(ModelBuilder): | |||
|
|||
def __init__( | |||
self, | |||
data: Union[np.ndarray, pd.DataFrame, pd.Series] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding this back in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I want to experiment with adding a no-parameter initialization that will act as a sandbox for the new users, but keeping the data in the constructor helps to reduce the number of adaptations before going into implementing model builder in other classes. Also it has low impact, and should be easy to remove later if we don't like it
please do not squash merge it, the commits were split on purpose to make it reversible in case it's needed |
@twiecki @ricardoV94 any suggestions to the current state? |
.pylintrc
Outdated
@@ -46,7 +46,6 @@ enable=import-self, | |||
used-before-assignment, | |||
cell-var-from-loop, | |||
global-variable-undefined, | |||
dangerous-default-value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't have time to check the rest yet, but this doesn't seem right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a problem with pymc-marketing and pymc-expental, as they don't use the same linting rules. So It's something I forgot to 'undo' after installing locally pymc-experimental from my local repo.
Complete merge of BayesianEstimator and it's functionalities into ModelBuilder:
Implementation details:
-splitting create_sample_input into 3 new functions:
*generate_model_data - a method that takes over data generation/pre-formatting
*default_sampler_config - property function that allows access to default sampler config
*default_model_config - property function that allows access to default model config
Split as above, allows for simplification of ModelBuilder initialisation in the default mode, allowing user easy access to the data structures, while making all parameters optional maintains full customisation option.
All model-related functions now take X and y as input, to mimic scikit-learn model input parameters. That was intended to allow usage of sklearn Transformers and Pipelines with PyMC models.
For now, data can be provided still on each of the steps that it could have been provided in the past (class init, build_model function). This allows for a live test period, in which it will be determined whether the combined data provisioning system will stay, or we'll move into sklearn-like way.
Fit now takes an optional parameter 'predictor_names' which allows custom naming of the predictors in fit_data, in case the predictors were given in 2d array, in case of providing parameters in pandas Dataframe, the predictor names will be taken from the data frame itself.