Generalization of logp test comparison to facilitate relative and/or absolute comparison #6159

Open · wants to merge 5 commits into base: main

Conversation

@ccaprani commented Sep 28, 2022

Following a discussion with @ricardoV94 in pymc-devs/pymc-extras#84 (comment), there may be cases where it is better to do a relative and/or absolute check when comparing the desired (SciPy) and actual (PyMC) values. Currently only an absolute test is done, to a precision of 1.5 * 10**(-decimal) (see here).

This (small) PR maintains the current behaviour (i.e. absolute tolerance), but facilitates future expansion to absolute and/or relative comparisons.

If/once this PR is accepted, the atol and rtol arguments can be propagated out to the test functions as kwargs, as decimal currently is. This could help avoid some of the skipped float32 tests and deal with some other edge cases.

  • Scope creep means this has been done now!

Implementation

Change from the not-recommended assert_almost_equal to assert_allclose from numpy.testing.

Maintain the current behaviour by using atol = 1.5 * 10**(-decimal) and rtol = 0, so that the assert_allclose comparison abs(actual - desired) <= atol + rtol * abs(desired) reduces to the existing absolute check.
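
For illustration, a minimal sketch (not the PR's exact diff) of how the replacement preserves the old semantics while exposing the new knobs; the helper name assert_logp_close is hypothetical:

```python
import numpy.testing as npt

def assert_logp_close(actual, desired, decimal=6, rtol=0.0, atol=None):
    # Hypothetical wrapper: reproduce assert_almost_equal's absolute check,
    # abs(actual - desired) < 1.5 * 10**(-decimal), via assert_allclose,
    # while exposing rtol/atol for future relative comparisons.
    if atol is None:
        atol = 1.5 * 10 ** (-decimal)
    # assert_allclose passes when abs(actual - desired) <= atol + rtol * abs(desired)
    npt.assert_allclose(actual, desired, rtol=rtol, atol=atol)
```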

Major / Breaking Changes

  • None

Bugfixes / New features

  • Facilitates a new testing feature: future logp checks can use relative and/or absolute tolerances.

Docs / Maintenance

  • N/A

@ricardoV94 (Member) commented Sep 28, 2022

Do you want to try some breaking change, and suggest values that look reasonable as a default? That would be great for maintenance / future devs

@ccaprani (Author)

Changing from using an atol to an rtol gives the same test results on test_continuous.py at least:

rtol=10**(-decimal),
atol=0,#1.5 * 10 ** (-decimal),

I haven't got polyagamma installed, so I get:

FAILED test_continuous.py::TestMatchesScipy::test_moyal_logcdf - AssertionError: 
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[1.0-0.0-None-0.25] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[1.0-z1-None-expected1] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[h2-z2-None-expected2] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[h3-z3-size3-expected3] - RuntimeError: polyagamma package is not installed!
========================================================================= 5 failed, 268 passed, 2 skipped, 18 warnings in 37.23s ==========================================================================

@ricardoV94 (Member)

Would probably need to tweak the decimal value, no? And also important to check for float32.

@ricardoV94 (Member) commented Sep 28, 2022

I suggest you allow the rtol/atol to be configured already so you can unblock your PR on pymc-experimental.

Then we can follow up with a separate PR where we propose changing the default tolerance for the pre-existing tests.

@ricardoV94 added the tests, maintenance, and no releasenotes (Skipped in automatic release notes generation) labels on Sep 28, 2022
@codecov bot commented Sep 28, 2022

Codecov Report

Merging #6159 (7371d2a) into main (f96594b) will decrease coverage by 7.73%.
The diff coverage is 90.47%.


@@            Coverage Diff             @@
##             main    #6159      +/-   ##
==========================================
- Coverage   94.79%   87.05%   -7.74%     
==========================================
  Files         148      148              
  Lines       27488    27492       +4     
==========================================
- Hits        26058    23934    -2124     
- Misses       1430     3558    +2128     
Impacted Files Coverage Δ
pymc/distributions/continuous.py 66.66% <ø> (-31.02%) ⬇️
pymc/tests/distributions/test_continuous.py 0.00% <ø> (-99.78%) ⬇️
pymc/tests/distributions/test_multivariate.py 36.52% <0.00%> (-62.93%) ⬇️
pymc/tests/distributions/test_logprob.py 100.00% <100.00%> (ø)
pymc/tests/distributions/test_timeseries.py 99.51% <100.00%> (ø)
pymc/tests/distributions/util.py 88.07% <100.00%> (-2.79%) ⬇️
pymc/distributions/multivariate.py 57.77% <0.00%> (-34.51%) ⬇️
... and 5 more

@ccaprani (Author) commented Sep 28, 2022

@ricardoV94 an initial refactor is now done to allow both atol and rtol to be specified.

Given that the previous behaviour was an absolute tolerance set at decimals=6 (just considering float64 for now), this corresponds to an absolute precision of 1e-6, so I adopted this as the default. However, I moved all check_logp and check_logcdf checks from an absolute to an rtol comparison. I also left a default atol of 1e-10, which is necessary in some cases to reproduce current test behaviour.

I confirmed the test behaviour is the same, at least for bound, censored, and continuous. Running the suite (and I don't have polyagamma installed, so that's 5 fails) gives:

50 failed, 2010 passed, 2 skipped, 243 xfailed, 164 warnings in 870.40s (0:14:30)

Here are a few observations:

  • I moved the default tolerances to select_by_precision so they can be set in one place only.
  • Currently the absolute tolerances for float64 and float32 are set everywhere; I suggest this is migrated to avoid duplicating defaults.
  • I did not adjust tests that do not use check_logp or check_logcdf; many functions seem to 'roll their own'.
  • The float32 checks were in many cases set at extremely generous levels (e.g. 1e1!).
  • test_continuous.py::TestMatchesScipy::test_moyal_logcdf fails, and it has nothing to do with this change.

I think this architecture now allows for a tuning/migration to more flexible and even more stringent test tolerances which can be worked on over time.
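
As a rough sketch of the shape this takes (simplified stand-ins, not the PR's actual code; the real helpers live in pymc/tests/distributions/util.py, and the float32 values shown are only placeholders):

```python
import numpy.testing as npt

def select_by_precision(float64=1e-8, float32=1e-4, floatX="float64"):
    # Simplified stand-in: pick the tolerance matching the active float precision,
    # so defaults live in one place instead of being repeated per test.
    return float64 if floatX == "float64" else float32

def check_logp(pymc_logp, scipy_logp, rtol=None, atol=1e-10):
    # Simplified stand-in for the test helper: compare PyMC and SciPy
    # log-probabilities with configurable relative/absolute tolerances.
    if rtol is None:
        rtol = select_by_precision(float64=1e-6, float32=1e-4)
    npt.assert_allclose(pymc_logp, scipy_logp, rtol=rtol, atol=atol)
```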

@ccaprani mentioned this pull request on Oct 1, 2022
@ccaprani (Author)

Hi @ricardoV94, just checking in on this, as I see conflicts are arising with new merges.

@ricardoV94 (Member)

@ccaprani let me see if I can get someone else to weigh in :) I'll try to revisit again soon as well!

@Armavica (Member) commented Oct 28, 2022

Wouldn't it be more principled, in theory, to check logp and logcdf for absolute error rather than relative?
Unless I am mistaken, MCMC is invariant under the addition of any constant to logp, which means that no particular meaning should be given to the value 0 of these quantities: an error in logp from 0 to 1 is in theory as bad as an error from 1e10 to 1e10 + 1. For this reason, I think that checking for absolute error is mathematically more appropriate in this case (unlike for more "physical" quantities for which 0 has a meaning, and 1 cm ± 1 mm is quite different from 1 m ± 1 mm). However, I can also see the technical float-related issues when logp takes extremely large values such as 1e300. In that case, a small rtol can probably help.

This PR proposes a relatively large rtol and a null or tiny atol. For the reasons above, I think I would rather use a fair atol, which would determine the testing interval in most cases, and a tiny rtol, which would still dominate for very large values, just to overcome float-precision issues in those ranges.
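
To make the trade-off concrete: assert_allclose passes when abs(actual - desired) <= atol + rtol * abs(desired). A sketch with made-up values showing how the effective tolerance window behaves under each proposal:

```python
import numpy as np

desired = np.array([0.0, -3.2, -1e300])  # typical and extreme logp magnitudes (made up)

# Roughly what the PR proposes: dominant rtol, tiny atol.
tol_rtol_led = 1e-10 + 1e-6 * np.abs(desired)   # ~[1e-10, 3.2e-6, 1e294]

# Alternative suggested above: fair atol, tiny rtol.
tol_atol_led = 1e-6 + 1e-10 * np.abs(desired)   # ~[1e-6, 1e-6, 1e290]
```

With the atol-led setting, moderate logp values get an (almost) constant absolute window, and the rtol term only takes over at huge magnitudes, which matches the reasoning above.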

@ccaprani (Author)

@Armavica Yes; for some of the models and tests done here it makes sense to have an rtol as well as the atol. Indeed, if you look through, many tests currently avoid using check_logp; I suspect there was insufficient flexibility in setting tolerances to get the tests to pass. This PR would give that flexibility, and then a discussion about the most suitable values of atol and rtol at least has a purpose.

@michaelosthege (Member)

@ccaprani @Armavica @ricardoV94 do you need additional support here?
I'm not familiar with the work here, but the one git conflict seems to be easy to resolve in the GitHub UI for anyone already familiar.

@michaelosthege (Member)

@ccaprani I rebased it, but one small set of tests appears to have tolerance issues: https://github.com/pymc-devs/pymc/actions/runs/3665240341/jobs/6196181272

Could you have a look at these?

@ccaprani (Author)

@michaelosthege thank you. Not sure why bc111ca came along for the ride, but 7371d2a passes everything in test_discrete on my Ubuntu 2022 x64 machine with the 'fix'.

However, I'm not sure that we should be adjusting tolerances so casually to get tests to pass. On the one hand, adjusting until they do makes sense, as it's backward compatible. On the other, we could introduce some dubious settings. What do you think?

@michaelosthege (Member)

@ccaprani

> Not sure why bc111ca came along for the ride

Looks like you did some pull of upstream?
If you want to incorporate the latest changes from main it's always best to run git fetch origin && git rebase origin/main (assuming origin points to pymc-devs/pymc).

> However, I'm not sure that we should be adjusting tolerances so casually to get tests to pass. On the one hand, adjusting until they do makes sense, as it's backward compatible. On the other, we could introduce some dubious settings. What do you think?

We've been doing this for a while, mostly because SciPy/NumPy compute with different floatX settings.
I guess if we gradually get better at seeding, we can make things more robust without needing to change tolerances. That would be much better for actually detecting changes to the results.

For now, I think your changes improve the situation quite nicely.

@ricardoV94 (Member) commented Dec 14, 2022

These tests are deterministic (in the long run). The only randomness is which subset of parameters is tested each time (the possible values don't change). You should set n_samples=-1 to test all combinations. Feel free to temporarily override the default here to confirm all tests pass, and then revert that:

n_samples=100,

The same applies to check_logcdf and check_selfconsistency_discrete_logcdf.
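
For example, temporarily forcing an exhaustive run might look like this (the check_logp signature and the R/Rplus domains are assumed from the existing test utilities in pymc/tests/distributions/util.py):

```python
import pymc as pm
import scipy.stats as st
from pymc.tests.distributions.util import R, Rplus, check_logp  # assumed import path

check_logp(
    pm.Normal,
    R,                          # domain of the tested value
    {"mu": R, "sigma": Rplus},  # domains of the parameters
    lambda value, mu, sigma: st.norm.logpdf(value, mu, sigma),
    n_samples=-1,               # temporarily test every parameter combination
)
```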

@ricardoV94 removed the no releasenotes (Skipped in automatic release notes generation) label on Aug 24, 2023
@ccaprani (Author) commented Sep 7, 2023

Hi @ricardoV94. I see a reference to this in #6850 for v5.8.0. Should we revitalize this? Has its time come?!

@ricardoV94 (Member)

> Hi @ricardoV94. I see a reference to this in #6850 for v5.8.0. Should we revitalize this? Has its time come?!

Let's get it done!
