-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Generalization of logp test comparison to facilitate relative and/or absolute comparison #6159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Do you want to try some breaking change, and suggest values that look reasonable as a default? That would be great for maintenance / future devs |
Changing from using an atol to an rtol gives the same test results on rtol=10**(-decimal),
atol=0,#1.5 * 10 ** (-decimal), I haven't got
|
Would probably need to tweak the decimal value, no? And also important to check for float32. |
I suggest you allow the rtol/atol to be configured already so you can unblock your PR on pymc-experimental. Then we can follow up with a separate PR where we propose changing the default tolerance for the pre-existing tests. |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #6159 +/- ##
==========================================
- Coverage 94.79% 87.05% -7.74%
==========================================
Files 148 148
Lines 27488 27492 +4
==========================================
- Hits 26058 23934 -2124
- Misses 1430 3558 +2128
|
@ricardoV94 so an initial refactor is done to now allow both atol and rtol to be specified. Given that previous behaviour was an absolute tolerance set at I confirmed test behaviour is the same, at least for bound, censored, continuous. Running the suite (and I don't have polyagamma isntalled, so that's 5 fails) gives:
Here are a few observations:
I think this architecture now allows for a tuning/migration to more flexible and even more stringent test tolerances which can be worked on over time. |
Hi @ricardoV94 just checking in on this as I see with new merges conflicts are arising. |
@ccaprani let me see if I can get someone else to weight in :) I'll try to revisit again soon as well! |
Wouldn't it be more principled in theory to check This PR proposes a relatively large |
@Armavica Yes; for some of the models & tests done here it makes sense to have an |
@ccaprani @Armavica @ricardoV94 do you need additional support here? |
* Initial generalization of logp test comparison * Maintain current behaviour tweak
3687b12
to
9e516ef
Compare
@ccaprani I rebased it but one small set of tests appear to have tolerance issues: https://github.com/pymc-devs/pymc/actions/runs/3665240341/jobs/6196181272 Could you have a look at these? |
@michaelosthege thank you. Not sure why bc111ca came along for the ride but 7371d2a passes all in However, I'm not sure that we should be adjusting tolerances so casually to get tests to pass. On one hand adjusting until they do makes sense as its backward compatible. On the other, we could introduce some dubious setting. What do you think? |
Looks like you did some pull of upstream?
We've been doing this for a while, mostly because SciPy/NumPy compute with different floatX settings. For now, I think your changes improve the situation quite nicely |
These tests are deterministic (in the long run). The only randomness is what subset of parameters are tested each time (the possible values don't change). You should set pymc/pymc/tests/distributions/util.py Line 236 in 8b8e49b
As well as in the |
Hi @ricardoV94 . I see a reference to it in #6850 in v5.8.0. Should we revitalize this? Has its time come?! |
Let's get it done! |
Following a discussion here pymc-devs/pymc-extras#84 (comment) with @ricardoV94 , there may be cases when it is better to do a relative and/or absolute check on the comparison between the desired (SciPy) and actual (pymc) values. Currently only an absolute test is done, to
1.5*1e-decimal
(see here) precision.This (small) PR maintains current behaviour (i.e. absolute tolerance), but facilitates future expansion to consider both absolute and/or relative comparison.Once/if this PR is accepted, then theatol
andrtol
arguments can be propagated out to test functions as a kwarg asdecimal
current is. It could help avoid some of the float32 skipped tests and deal with some other edge cases.Implementation
Change from the not-recommended assert_almost_equal to the assert_allclose from
numpy.testing
.Maintain current behaviour by using
atol=decimal
andrtol=0
in comparing the difference between actual and desired toatol + rtol * abs(desired)
.Major / Breaking Changes
Bugfixes / New features
Docs / Maintenance