BART: further changes in sampler #5223
Closed
Conversation
Codecov Report
| Coverage Diff | main | #5223 | +/- |
|---|---|---|---|
| Coverage | 78.94% | 78.93% | -0.02% |
| Files | 88 | 88 | |
| Lines | 14248 | 14240 | -8 |
| Hits | 11248 | 11240 | -8 |
| Misses | 3000 | 3000 | |
Force-pushed from 478c949 to 19c97ea
junpenglao approved these changes on Nov 29, 2021
twiecki reviewed on Nov 29, 2021
@@ -95,7 +95,7 @@ This includes API changes we did not warn about since at least `3.11.0` (2021-01
- New features for BART:
- Added linear response, increased number of trees fitted per step [5044](https://github.com/pymc-devs/pymc3/pull/5044).
- Added partial dependence plots and individual conditional expectation plots [5091](https://github.com/pymc-devs/pymc3/pull/5091).
- Modify how particle weights are computed. This improves accuracy of the modeled function (see [5177](https://github.com/pymc-devs/pymc3/pull/5177)).
- Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local mnima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).
Suggested change
- Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local mnima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).
+ Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local minima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).
Closing in favour of #5229. Sorry for the noise!
This introduces several changes to the PGBART sampler. The main motivation was to improve convergence while keeping good accuracy.
From my tests, using `response="linear"` is less beneficial than before. Nevertheless, I decided to keep this feature and run more tests to determine whether we should keep it. A note in case someone wants more info:
Given the other changes introduced in this PR the benefit from re-weighting particles seems negligible. This PR makes BART slower, but accuracy and convergence are good.
Future PRs should explore ways to accelerate BART. Something trivial is finding an intermediate value between 20% and 100% of the trees fitted per step. Additionally, there is room to find better hyperparameters during tuning, such as the standard deviation for the leaf-node values, when to reset the sampler, or the number of trees fitted per step. Also, now that there is no particle reweighting, the trees can be built more efficiently in a single pass. Another route could be reusing particles.
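The two mechanics discussed in this PR (fitting only a fraction of the trees per step, and resetting the trees from time to time to escape a poor local mode) can be sketched on a toy additive-stumps model. This is a hedged illustration, not the actual pymc implementation: `fit_stump`, `toy_bart`, and all parameter values are made up for this sketch, and a greedy stump refit stands in for the real particle-Gibbs grow moves.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_stump(x, resid):
    """Fit a depth-1 tree (stump) to the residuals by trying a few random
    split points and keeping the best. A toy stand-in for a grow move."""
    best_pred, best_sse = np.full_like(resid, resid.mean()), np.inf
    for s in rng.choice(x, size=8, replace=False):
        left = x <= s
        if left.all() or not left.any():
            continue
        pred = np.where(left, resid[left].mean(), resid[~left].mean())
        sse = np.sum((resid - pred) ** 2)
        if sse < best_sse:
            best_pred, best_sse = pred, sse
    return best_pred

def toy_bart(x, y, m=20, steps=200, frac_fit=0.2, reset_every=50):
    """Toy additive-trees fitter illustrating two ideas from the PR:
    refit only ``frac_fit`` of the m trees per step, and periodically
    reset all trees. Purely illustrative, not Bayesian inference."""
    preds = np.zeros((m, len(x)))              # per-tree predictions
    for step in range(steps):
        if step and step % reset_every == 0:
            preds[:] = 0.0                     # reset all trees
        k = max(1, int(frac_fit * m))
        for i in rng.choice(m, size=k, replace=False):
            partial = preds.sum(axis=0) - preds[i]
            preds[i] = fit_stump(x, y - partial)   # refit tree i to residual
    return preds.sum(axis=0)

x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x)
fit = toy_bart(x, y)
print("MSE:", float(np.mean((fit - y) ** 2)))
```

Note that because no reweighting happens across trees, each refit only needs the running sum of the other trees' predictions, which is the "single pass" opportunity mentioned above.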
Also, given that the fitted trees are not very deep, we should maybe move away from the particle sampler into something like this, or maybe a mix. To clarify: we use a prior proposed by Rockova favouring much shallower trees than the original one by Chipman. We should call it Bayesian Additive Regression Bushes (BARB) :-) The particle Gibbs sampler shows good performance with deep trees (something we do not have) and/or high-dimensional data. So far my tests for "high-dimensional" data have included only examples where many of the variables are actually unrelated to the response variable, and we have a mechanism to focus the sampling on the important variables, so for these cases the effective dimensionality is actually reduced.
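To make the shallow-trees point concrete, here is a small comparison of node-split probabilities as a function of depth under the two priors mentioned above. The Chipman et al. form is alpha * (1 + d)^-beta; the Rockova-style form is written here as gamma^(d + 1) with gamma < 1/2. Both parameterizations are written from memory as an assumption and may differ from what pymc actually implements.

```python
def chipman_split_prob(d, alpha=0.95, beta=2.0):
    """Chipman et al. (1998) style: p(split at depth d) = alpha * (1 + d)^-beta.
    Decays polynomially, so moderately deep trees still get some mass."""
    return alpha * (1 + d) ** -beta

def rockova_split_prob(d, gamma=0.25):
    """Rockova-style prior (assumed form): p(split at depth d) = gamma^(d + 1).
    Decays geometrically, strongly favouring shallow 'bushes'."""
    return gamma ** (d + 1)

for d in range(4):
    print(d, chipman_split_prob(d), rockova_split_prob(d))
```

Under these forms the geometric prior assigns far less probability to splitting deep nodes, which matches the observation that the fitted trees are not very deep.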
Just one example:

Old:

New:
