
Add Hypothesis Testing #13846


Open
thomasjpfan opened this issue May 9, 2019 · 13 comments
Labels
Hard (Hard level of difficulty), module:test-suite (everything related to our tests)

Comments

@thomasjpfan
Member

It may be beneficial to add property-based testing with Hypothesis to some of our tests. Hypothesis lets us verify mathematical properties such as commutativity:

from hypothesis import given
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x

This lets the CI search for edge cases for us. It would be useful for testing metrics. https://hypothesis.readthedocs.io/en/latest/
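As a concrete illustration of the metrics idea, a property test could look something like the sketch below (the bounds and the choice of mean_squared_error are illustrative, not a proposal for specific tests):

import numpy as np
from hypothesis import given
import hypothesis.strategies as st
from hypothesis.extra.numpy import arrays

from sklearn.metrics import mean_squared_error

# Bounded, finite floats keep the test away from overflow and NaN issues.
finite_floats = st.floats(min_value=-1e6, max_value=1e6,
                          allow_nan=False, allow_infinity=False)

@given(arrays(np.float64, shape=10, elements=finite_floats),
       arrays(np.float64, shape=10, elements=finite_floats))
def test_mse_is_symmetric(y_true, y_pred):
    # Squared error is symmetric in its two arguments.
    assert mean_squared_error(y_true, y_pred) == mean_squared_error(y_pred, y_true)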

@rth
Member

rth commented May 9, 2019

I think hypothesis is quite interesting; however, in my experience it makes tests run significantly longer (compared to pytest.parametrize with a couple of options), and our test suite is already quite slow.

Also, I can really see the usefulness of hypothesis for testing edge cases of functions taking int, str, etc. (as in the example above). For scientific computing, where the input is arrays, it is more challenging. For instance, invariance assumptions only hold as long as we don't hit floating-point precision issues, and it's relatively easy to hit those when generating random unbounded floats.
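One common mitigation is to constrain the strategies rather than drawing unbounded floats; a sketch, with arbitrary bounds:

import numpy as np
import hypothesis.strategies as st
from hypothesis.extra.numpy import arrays, array_shapes

# Excluding NaN/inf and bounding the magnitude keeps generated arrays
# well away from overflow and catastrophic cancellation.
bounded_floats = st.floats(min_value=-1e3, max_value=1e3,
                           allow_nan=False, allow_infinity=False)

bounded_2d_arrays = arrays(np.float64,
                           shape=array_shapes(min_dims=2, max_dims=2,
                                              min_side=1, max_side=20),
                           elements=bounded_floats)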

It would be useful for testing metrics.

What kind of things did you have in mind? Possibly related: #8589

@glemaitre
Member

It would be useful for testing metrics.

It would be interesting for detecting corner cases like the one in #13853.

@jorisvandenbossche I think you used it in pandas; maybe you could share how it is integrated into the test suite. If I recall correctly, you experienced what @rth is reporting, didn't you?

@rth
Member

rth commented May 10, 2019

Indeed, things like https://github.com/pandas-dev/pandas/pull/23127/files, and then who knows how long it will take on PyPy or some slow ARM CPU, and whether it would time out...

@jorisvandenbossche
Member

I don't have much to say, as I was not involved in setting it up in pandas (pandas-dev/pandas#22280), and I am not otherwise a user of hypothesis (apart from reporting the problem @rth linked to above).
Searching the pandas issue tracker, there were quite a few issues with flaky hypothesis tests, at least in the beginning (those seem to be solved now). But I also don't think we added many more hypothesis-based tests after the initial PR linked above, which added a few.

Personally, I think it adds quite a bit of complexity to the testing, also for contributors (yet another thing to deal with), unless there are some specific cases where it clearly adds value.

@thomasjpfan
Member Author

I was thinking of using one CI instance dedicated to running only the Hypothesis tests, and no other tests.

@dmyersturnbull

Property tests can identify bugs that are otherwise hard to catch. Libraries like ScalaCheck and QuickCheck have been pretty successful at this. If performance is an issue, couldn't they be run only behind a flag?
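One way to do that is a pytest option in conftest.py; a minimal sketch (the --run-hypothesis flag and the hypothesis_suite marker are made-up names for illustration):

# conftest.py
import pytest

def pytest_addoption(parser):
    # Hypothetical flag; property-based tests are skipped without it.
    parser.addoption("--run-hypothesis", action="store_true", default=False,
                     help="run the (slower) property-based tests")

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "hypothesis_suite: slow property-based tests")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-hypothesis"):
        return
    skip = pytest.mark.skip(reason="needs --run-hypothesis to run")
    for item in items:
        if "hypothesis_suite" in item.keywords:
            item.add_marker(skip)

A test would then opt in with @pytest.mark.hypothesis_suite, and a dedicated CI job could invoke pytest --run-hypothesis.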

@rth
Member

rth commented May 14, 2019

If they are not part of our release tests, at least initially, I agree it could be an interesting point to investigate, though this would add more complexity for new contributors.

@jnothman
Member

jnothman commented May 20, 2019 via email

@rth
Member

rth commented May 21, 2019

Relatedly, I proposed having a random_seed fixture that would be globally set to different values on different test runs. One benefit is that we could easily distinguish tests that are invariant under a changing random seed from those that are brittle.

Opened #13913 on this topic.
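A minimal sketch of such a fixture (the environment variable name is illustrative; see #13913 for the actual proposal):

# conftest.py
import os
import pytest

@pytest.fixture
def random_seed():
    # CI would export a different value on each run; the default keeps
    # local runs reproducible, and a failing seed can be re-exported
    # to reproduce the failure.
    return int(os.environ.get("SKLEARN_TESTS_GLOBAL_SEED", "42"))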

@Zac-HD

Zac-HD commented May 14, 2020

👋 hi all, I'm not much of an ML person but I am particularly interested in open source and in testing scientific code (as my tutorial at SciPy might indicate). So if getting a Hypothesis core dev to review or discuss some tests would ever be useful, ping me and let me know how I can help!

Along with, or before, testing all the mathematical properties you might think of, I highly recommend getting Hypothesis to generate any valid data (i.e. including unusual formats) and just calling your code. No assertions needed: in my experience you'll find a bunch of crashes, quickly improve your input validation, and probably fix some bugs too. Once you have that experience, testing 'metamorphic properties' is ludicrously effective...
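A sketch of that style of test, assuming the numpy extra; StandardScaler stands in here for any estimator:

import numpy as np
from hypothesis import given, settings
import hypothesis.strategies as st
from hypothesis.extra.numpy import arrays, array_shapes

from sklearn.preprocessing import StandardScaler

@given(arrays(np.float64,
              shape=array_shapes(min_dims=2, max_dims=2,
                                 min_side=1, max_side=30),
              elements=st.floats(min_value=-1e9, max_value=1e9,
                                 allow_nan=False, allow_infinity=False)))
@settings(max_examples=50, deadline=None)
def test_standard_scaler_does_not_crash(X):
    # No assertion: the point is just that valid input never crashes.
    StandardScaler().fit_transform(X)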

@thomasjpfan
Member Author

@Zac-HD Thank you for the link to the tutorial and your offer to help out! There are certainly a few interesting places for us to use property-based testing.

@Zac-HD

Zac-HD commented Jul 25, 2020

I've since written a paper on testing scientific code (pdf), which is probably relevant to scikit-learn 🙂

The METTLE paper shows off some fancier properties you can test specifically for unsupervised ML systems. Personally I'd start with "does not crash on any combination of valid inputs" though, and make sure the CI config etc. is all working.
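One metamorphic relation that is cheap to check, for example: the adjusted Rand index should be invariant under a bijective renaming of cluster labels (a sketch; sizes and bounds are arbitrary):

import numpy as np
from numpy.testing import assert_allclose
from hypothesis import given
import hypothesis.strategies as st
from hypothesis.extra.numpy import arrays

from sklearn.metrics import adjusted_rand_score

labels = arrays(np.int64, shape=20,
                elements=st.integers(min_value=0, max_value=4))

@given(labels, labels)
def test_ari_invariant_under_relabeling(a, b):
    # Renaming the cluster ids in one partition must not change ARI.
    relabeled = (b + 1) % 5  # a bijection on the label set {0, ..., 4}
    assert_allclose(adjusted_rand_score(a, b),
                    adjusted_rand_score(a, relabeled))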

@rth
Member

rth commented Jul 25, 2020

Thanks for the link, @Zac-HD!

Personally I'd start with "does not crash on any combination of valid inputs" though, and make sure the CI config etc. is all working.

Absolutely, that's what we are trying to do in #17441 as a first step, though we have also been adding more invariance tests to the common tests lately, e.g. #17319 and #17176.

thomasjpfan added the module:test-suite (everything related to our tests) and Hard (Hard level of difficulty) labels on Dec 10, 2021