BART: further changes in sampler #5223


Closed
wants to merge 3 commits

Conversation


@aloctavodia (Member) commented Nov 24, 2021

This introduces several changes to the pgbart sampler. The main motivation was to improve convergence while keeping good accuracy.

  • Increase the number of trees fitted per step from 20% to 100%.
  • Reset the sampler after a certain number of steps. This prevents the sampler from getting stuck in a local minimum.
  • Particles are no longer re-weighted; we just grow n particles as deep as possible (which in practice is not very deep).
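The reset-and-grow loop described above can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea only, not the actual pymc/bart/pgbart.py code; the names `Particle`, `grow`, `run_sampler`, and `reset_every` are hypothetical, and the coin-flip stopping rule is a stand-in for the real tree prior:

```python
import random

class Particle:
    """Toy stand-in for a tree-building particle."""

    def __init__(self):
        self.depth = 0
        self.finished = False

    def grow(self, max_depth, rng):
        # Grow one node; stop stochastically or at max_depth,
        # loosely mimicking a prior that favours shallow trees.
        if self.depth >= max_depth or rng.random() < 0.5:
            self.finished = True
        else:
            self.depth += 1

def run_sampler(n_steps, n_particles, reset_every, max_depth=4, seed=0):
    rng = random.Random(seed)
    resets = 0
    depths = []
    for step in range(n_steps):
        if step > 0 and step % reset_every == 0:
            resets += 1  # discard the current trees and start fresh
        # All particles (100% of "trees") are fitted at every step.
        particles = [Particle() for _ in range(n_particles)]
        # Grow every particle until it finishes -- no reweighting
        # or resampling along the way.
        for p in particles:
            while not p.finished:
                p.grow(max_depth, rng)
        depths.extend(p.depth for p in particles)
    return resets, depths
```

The key point the sketch captures: resets are periodic and unconditional, and particles run to completion independently, so there is no intermediate reweighting step.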

From my tests, using response="linear" is less beneficial than before. Nevertheless, I decided to keep this feature and will run more tests to determine whether we should keep it.

A note in case someone wants more info:
Given the other changes introduced in this PR, the benefit from re-weighting particles seems negligible. This PR makes BART slower, but accuracy and convergence are good.

Future PRs should explore ways to accelerate BART. One trivial option is finding an intermediate value between 20% and 100% of the trees fitted per step. Additionally, there is room to find better hyper-parameters during tuning, like the standard deviation of the leaf-node values, when to reset the sampler, or the number of trees fitted per step. Also, now that there is no particle reweighting, the trees can be built more efficiently in a single pass. Another route could be reusing particles.

Also, given that the fitted trees are not very deep, we should maybe move away from a particle sampler toward something like this, or maybe a mix. To clarify, we use a prior proposed by Rockova favouring much shallower trees than the original one by Chipman. We should call it Bayesian Additive Regression Bushes (BARB) :-) The particle Gibbs sampler shows good performance with deep trees (something we do not have) and/or high-dimensional data. So far, my tests for "high-dimensional" data have included only examples where many of the variables are actually unrelated to the response variable, and we have a mechanism to focus the sampling on the important variables, so in these cases the effective dimensionality is actually reduced.
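For context on why shallow trees are expected: the Chipman et al. prior gives a node at depth d a split probability of alpha * (1 + d)**(-beta), whereas a Rockova-style prior decays geometrically with depth. The sketch below compares the two shapes; the geometric form and the parameter values are illustrative assumptions, not the ones hard-coded in pymc:

```python
def chipman_split_prob(depth, alpha=0.95, beta=2.0):
    """P(node at `depth` splits) under the Chipman et al. prior:
    alpha * (1 + depth) ** -beta (defaults from the original paper)."""
    return alpha * (1.0 + depth) ** -beta

def geometric_split_prob(depth, alpha=0.25):
    """P(split) under a geometrically decaying, Rockova-style prior.
    Illustrative parameterization: alpha ** (depth + 1)."""
    return alpha ** (depth + 1)
```

With these values the geometric prior assigns markedly lower split probabilities at every depth, which is the sense in which it favours "bushes" over deep trees.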

Just one example:

Old: (figure)

New: (figure)


codecov bot commented Nov 24, 2021

Codecov Report

Merging #5223 (fe4b769) into main (99ec0ff) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5223      +/-   ##
==========================================
- Coverage   78.94%   78.93%   -0.02%     
==========================================
  Files          88       88              
  Lines       14248    14240       -8     
==========================================
- Hits        11248    11240       -8     
  Misses       3000     3000              
Impacted Files Coverage Δ
pymc/bart/pgbart.py 94.73% <100.00%> (-0.42%) ⬇️
pymc/bart/tree.py 100.00% <100.00%> (ø)
pymc/model.py 83.30% <0.00%> (ø)
pymc/distributions/continuous.py 96.60% <0.00%> (+0.02%) ⬆️
pymc/model_graph.py 85.10% <0.00%> (+0.32%) ⬆️
pymc/distributions/multivariate.py 71.91% <0.00%> (+0.56%) ⬆️

@@ -95,7 +95,7 @@ This includes API changes we did not warn about since at least `3.11.0` (2021-01
- New features for BART:
- Added linear response, increased number of trees fitted per step [5044](https://github.com/pymc-devs/pymc3/pull/5044).
- Added partial dependence plots and individual conditional expectation plots [5091](https://github.com/pymc-devs/pymc3/pull/5091).
- Modify how particle weights are computed. This improves accuracy of the modeled function (see [5177](https://github.com/pymc-devs/pymc3/pull/5177)).
- Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local mnima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).

Suggested change
- Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local mnima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).
- Modify PGBART sampler. Particles are not longer reweighted and the trees are reset from time to time to avoid getting trap in a local minima. This improves accuracy of the modeled function and improves convergence (see [5223](https://github.com/pymc-devs/pymc3/pull/5223)).


@aloctavodia (Member, Author) commented Nov 30, 2021

Closing in favour of #5229. Sorry for the noise!

@aloctavodia aloctavodia deleted the bart_reset branch February 23, 2023 00:22