Generalization of logp test comparison to facilitate relative and/or absolute comparison #6159

Open · wants to merge 5 commits into base: main

Conversation

@ccaprani commented Sep 28, 2022

Following a discussion with @ricardoV94 in pymc-devs/pymc-extras#84 (comment), there may be cases where it is better to do a relative and/or absolute check when comparing the desired (SciPy) and actual (PyMC) values. Currently only an absolute test is done, to a precision of 1.5 * 10**(-decimal) (see here).

This (small) PR maintains the current behaviour (i.e. absolute tolerance), but facilitates future expansion to absolute and/or relative comparisons.

If/once this PR is accepted, the atol and rtol arguments can be propagated out to the test functions as kwargs, as decimal currently is. This could help avoid some of the skipped float32 tests and deal with some other edge cases.

  • Scope creep means this has been done now!

Implementation

Change from the not-recommended assert_almost_equal to assert_allclose from numpy.testing.

Maintain the current behaviour by using atol = 1.5 * 10**(-decimal) and rtol = 0, so that the assert_allclose comparison abs(actual - desired) <= atol + rtol * abs(desired) reduces to the existing absolute check.
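
For illustration, a minimal sketch (not the PR's exact diff) of how the replacement preserves the old semantics while exposing the new knobs; the helper name assert_logp_close is hypothetical:

```python
import numpy.testing as npt

def assert_logp_close(actual, desired, decimal=6, rtol=0.0, atol=None):
    # Hypothetical wrapper: reproduce assert_almost_equal's absolute check,
    # abs(actual - desired) < 1.5 * 10**(-decimal), via assert_allclose,
    # while exposing rtol/atol for future relative comparisons.
    if atol is None:
        atol = 1.5 * 10 ** (-decimal)
    # assert_allclose passes when abs(actual - desired) <= atol + rtol * abs(desired)
    npt.assert_allclose(actual, desired, rtol=rtol, atol=atol)
```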

Major / Breaking Changes

  • None

Bugfixes / New features

  • Facilitates a new testing feature: future logp checks can use relative and/or absolute tolerances.

Docs / Maintenance

  • N/A

@ricardoV94 (Member) commented Sep 28, 2022

Do you want to try some breaking change, and suggest values that look reasonable as a default? That would be great for maintenance / future devs

@ccaprani (Author)

Changing from using an atol to an rtol gives the same test results on test_continuous.py at least:

rtol=10**(-decimal),
atol=0,#1.5 * 10 ** (-decimal),

I haven't got polyagamma installed, so I get:

FAILED test_continuous.py::TestMatchesScipy::test_moyal_logcdf - AssertionError: 
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[1.0-0.0-None-0.25] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[1.0-z1-None-expected1] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[h2-z2-None-expected2] - RuntimeError: polyagamma package is not installed!
FAILED test_continuous.py::TestMoments::test_polyagamma_moment[h3-z3-size3-expected3] - RuntimeError: polyagamma package is not installed!
========================================================================= 5 failed, 268 passed, 2 skipped, 18 warnings in 37.23s ==========================================================================

@ricardoV94 (Member)

Would probably need to tweak the decimal value, no? And also important to check for float32.

@ricardoV94 (Member) commented Sep 28, 2022

I suggest you allow the rtol/atol to be configured already so you can unblock your PR on pymc-experimental.

Then we can follow up with a separate PR where we propose changing the default tolerance for the pre-existing tests.

@ricardoV94 added the tests, maintenance, and no releasenotes (Skipped in automatic release notes generation) labels on Sep 28, 2022
@codecov bot commented Sep 28, 2022

Codecov Report

Merging #6159 (7371d2a) into main (f96594b) will decrease coverage by 7.73%.
The diff coverage is 90.47%.


@@            Coverage Diff             @@
##             main    #6159      +/-   ##
==========================================
- Coverage   94.79%   87.05%   -7.74%     
==========================================
  Files         148      148              
  Lines       27488    27492       +4     
==========================================
- Hits        26058    23934    -2124     
- Misses       1430     3558    +2128     
Impacted Files Coverage Δ
pymc/distributions/continuous.py 66.66% <ø> (-31.02%) ⬇️
pymc/tests/distributions/test_continuous.py 0.00% <ø> (-99.78%) ⬇️
pymc/tests/distributions/test_multivariate.py 36.52% <0.00%> (-62.93%) ⬇️
pymc/tests/distributions/test_logprob.py 100.00% <100.00%> (ø)
pymc/tests/distributions/test_timeseries.py 99.51% <100.00%> (ø)
pymc/tests/distributions/util.py 88.07% <100.00%> (-2.79%) ⬇️
pymc/distributions/multivariate.py 57.77% <0.00%> (-34.51%) ⬇️
... and 5 more

@ccaprani (Author) commented Sep 28, 2022

@ricardoV94 an initial refactor is now done to allow both atol and rtol to be specified.

Given that the previous behaviour was an absolute tolerance set at decimals=6 (just considering float64 for now), this corresponds to an absolute precision of 1e-6, so I adopted this as the default. However, I moved all check_logp and check_logcdf checks from an absolute to an rtol comparison. I also left a default atol of 1e-10, which is necessary in some cases to reproduce current test behaviour.

I confirmed the test behaviour is the same, at least for bound, censored, and continuous. Running the suite (and I don't have polyagamma installed, so that's 5 fails) gives:

50 failed, 2010 passed, 2 skipped, 243 xfailed, 164 warnings in 870.40s (0:14:30)

Here are a few observations:

  • I moved the default tolerances to select_by_precision so they can be set in one place only.
  • Currently the absolute tolerances for float64 and float32 are set everywhere; I suggest this is migrated to avoid duplicating defaults.
  • I did not adjust tests that do not use check_logp or check_logcdf; many functions seem to 'roll their own'.
  • The float32 checks were in many cases set at extremely generous levels (e.g. 1e1!).
  • test_continuous.py::TestMatchesScipy::test_moyal_logcdf fails, and it has nothing to do with this change.

I think this architecture now allows for a tuning/migration to more flexible and even more stringent test tolerances which can be worked on over time.
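
As a rough sketch of the shape this takes (simplified stand-ins, not the PR's actual code; the real helpers live in pymc/tests/distributions/util.py, and the float32 values shown are only placeholders):

```python
import numpy.testing as npt

def select_by_precision(float64=1e-8, float32=1e-4, floatX="float64"):
    # Simplified stand-in: pick the tolerance matching the active float precision,
    # so defaults live in one place instead of being repeated per test.
    return float64 if floatX == "float64" else float32

def check_logp(pymc_logp, scipy_logp, rtol=None, atol=1e-10):
    # Simplified stand-in for the test helper: compare PyMC and SciPy
    # log-probabilities with configurable relative/absolute tolerances.
    if rtol is None:
        rtol = select_by_precision(float64=1e-6, float32=1e-4)
    npt.assert_allclose(pymc_logp, scipy_logp, rtol=rtol, atol=atol)
```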

@ccaprani mentioned this pull request on Oct 1, 2022
@ccaprani (Author)

Hi @ricardoV94, just checking in on this, as I see conflicts are arising with new merges.

@ricardoV94 (Member)

@ccaprani let me see if I can get someone else to weigh in :) I'll try to revisit again soon as well!

@Armavica (Member) commented Oct 28, 2022

Wouldn't it be more principled, in theory, to check logp and logcdf for absolute error rather than relative?
Unless I am mistaken, MCMC is invariant under the addition of any constant to logp, which means that no particular meaning should be given to the value 0 of these quantities: an error in logp from 0 to 1 is in theory as bad as an error from 1e10 to 1e10 + 1. For this reason, I think that checking for absolute error is mathematically more appropriate in this case (unlike for more "physical" quantities for which 0 has a meaning, and 1 cm ± 1 mm is quite different from 1 m ± 1 mm). However, I can also see the technical float-related issues when logp takes extremely large values such as 1e300. In that case, a small rtol can probably help.

This PR proposes a relatively large rtol and a null or tiny atol. For the reasons above, I think I would rather use a fair atol, which would determine the testing interval in most cases, and a tiny rtol, which would still dominate for very large values, just to overcome float-precision issues in those ranges.
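
To make the trade-off concrete: assert_allclose passes when abs(actual - desired) <= atol + rtol * abs(desired). A sketch with made-up values showing how the effective tolerance window behaves under each proposal:

```python
import numpy as np

desired = np.array([0.0, -3.2, -1e300])  # typical and extreme logp magnitudes (made up)

# Roughly what the PR proposes: dominant rtol, tiny atol.
tol_rtol_led = 1e-10 + 1e-6 * np.abs(desired)   # ~[1e-10, 3.2e-6, 1e294]

# Alternative suggested above: fair atol, tiny rtol.
tol_atol_led = 1e-6 + 1e-10 * np.abs(desired)   # ~[1e-6, 1e-6, 1e290]
```

With the atol-led setting, moderate logp values get an (almost) constant absolute window, and the rtol term only takes over at huge magnitudes, which matches the reasoning above.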

@ccaprani (Author)

@Armavica Yes; for some of the models and tests done here it makes sense to have an rtol as well as the atol. Indeed, if you look through, many tests currently avoid using check_logp; I suspect there was insufficient flexibility in setting tolerances to get the tests to pass. This PR would give that flexibility, and then a discussion about the most suitable values of atol and rtol at least has a purpose.

@michaelosthege (Member)

@ccaprani @Armavica @ricardoV94 do you need additional support here?
I'm not familiar with the work here, but the one git conflict seems to be easy to resolve in the GitHub UI for anyone already familiar.

@michaelosthege (Member)

@ccaprani I rebased it, but one small set of tests appears to have tolerance issues: https://github.com/pymc-devs/pymc/actions/runs/3665240341/jobs/6196181272

Could you have a look at these?

@ccaprani (Author)

@michaelosthege thank you. Not sure why bc111ca came along for the ride, but 7371d2a passes everything in test_discrete on my Ubuntu 2022 x64 machine with the 'fix'.

However, I'm not sure that we should be adjusting tolerances so casually to get tests to pass. On the one hand, adjusting until they do makes sense, as it's backward compatible. On the other, we could introduce some dubious settings. What do you think?

@michaelosthege (Member)

@ccaprani

> Not sure why bc111ca came along for the ride

Looks like you did some pull of upstream?
If you want to incorporate the latest changes from main it's always best to run git fetch origin && git rebase origin/main (assuming origin points to pymc-devs/pymc).

> However, I'm not sure that we should be adjusting tolerances so casually to get tests to pass. On the one hand, adjusting until they do makes sense, as it's backward compatible. On the other, we could introduce some dubious settings. What do you think?

We've been doing this for a while, mostly because SciPy/NumPy compute with different floatX settings.
I guess if we gradually get better at seeding, we can make things more robust without needing to change tolerances. That would be much better for actually detecting changes to the results.

For now, I think your changes improve the situation quite nicely.

@ricardoV94 (Member) commented Dec 14, 2022

These tests are deterministic (in the long run). The only randomness is which subset of parameters is tested each time (the possible values don't change). You should set n_samples=-1 to test all combinations. Feel free to temporarily override the default here to confirm all tests pass, and then revert that:

n_samples=100,

The same applies to check_logcdf and check_selfconsistency_discrete_logcdf.
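
For example, temporarily forcing an exhaustive run might look like this (the check_logp signature and the R/Rplus domains are assumed from the existing test utilities in pymc/tests/distributions/util.py):

```python
import pymc as pm
import scipy.stats as st
from pymc.tests.distributions.util import R, Rplus, check_logp  # assumed import path

check_logp(
    pm.Normal,
    R,                          # domain of the tested value
    {"mu": R, "sigma": Rplus},  # domains of the parameters
    lambda value, mu, sigma: st.norm.logpdf(value, mu, sigma),
    n_samples=-1,               # temporarily test every parameter combination
)
```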

@ricardoV94 removed the no releasenotes (Skipped in automatic release notes generation) label on Aug 24, 2023
@ccaprani (Author) commented Sep 7, 2023

Hi @ricardoV94. I see a reference to this in #6850 for v5.8.0. Should we revitalize this? Has its time come?!

@ricardoV94 (Member)

> Hi @ricardoV94. I see a reference to this in #6850 for v5.8.0. Should we revitalize this? Has its time come?!

Let's get it done!
