Add initial property-based tests using Hypothesis #22280


Merged: 6 commits merged into pandas-dev:master on Aug 25, 2018

Conversation

@Zac-HD (Contributor) commented Aug 11, 2018

This pull request has no features or bugfixes for Pandas itself (though it does have a whatsnew entry for contributors); instead, it adds support for Hypothesis and a few initial tests that use it. The new section of the contributing guide should explain what this means and why it's desirable. There are basically three contributions here:

  1. Contributor documentation and all the dependency setup to support Hypothesis tests in CI.
  2. A pleasantly concise pair of passing tests in test_ticks.py (their shape is sketched just below this list).
  3. Three failing tests in test_offsets_properties.py based on [WIP] implement tests using hypothesis #18761 by @jbrockmendel. I've made these idiomatic from the Hypothesis side, but while it seems likely that at least test_on_offset_implementations might be finding real bugs, fixing them is beyond me at the moment. I thought it worth including them anyway, with an xfail decorator.
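
For readers new to Hypothesis, the passing tests in test_ticks.py have roughly the shape below. This is a simplified sketch based on the test shown in the CI output further down this thread; the assertion body is illustrative rather than the exact code in the PR.

import pytest
from hypothesis import example, given, strategies as st
from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second

tick_classes = [Hour, Minute, Second, Milli, Micro, Nano]

@pytest.mark.parametrize('cls', tick_classes)
@example(n=2, m=3)
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    # Hypothesis draws many (n, m) pairs; Tick construction should commute with arithmetic.
    assert cls(n) + cls(m) == cls(n + m)
    assert cls(n) - cls(m) == cls(n - m)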

Future work on Hypothesis tests could focus on such fruitful areas as round-tripping data via any of the serialisation formats Pandas offers (I tried to write a demo of this and couldn't get it to pass), or the melt and merge tests that @TomAugspurger mentioned.

Closes #17978, after which more specific issues could be opened with suggested tests.

@Zac-HD force-pushed the hypothesis branch 2 times, most recently from cff98d8 to e13decd on August 11, 2018 14:24

@codecov (bot) commented Aug 11, 2018

Codecov Report

Merging #22280 into master will decrease coverage by <.01%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master   #22280      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         169      169              
  Lines       50776    50780       +4     
==========================================
  Hits        46737    46737              
- Misses       4039     4043       +4
Flag        Coverage Δ
#multiple   90.44% <0%> (-0.01%) ⬇️
#single     42.22% <0%> (-0.01%) ⬇️

Impacted Files           Coverage Δ
pandas/util/_tester.py   23.8% <0%> (-5.61%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 55d176d...779b49a.

@jbrockmendel (Member) commented:

This looks really nice, definitely prettier than #18761.

Do the xfailed tests actually fail or is that just being defensive? Ideally we'd like to have strict=True in xfails.

@Zac-HD (Contributor, Author) commented Aug 12, 2018

Thanks! I have an unfair advantage in Hypothesis idioms, but I couldn't have written anything like this without your example to follow on the Pandas side 😄

The tests really do fail - I've added strict=True to the xfails and a note to one which is inconsistent. Hopefully someone who knows more about Pandas internals than I do can either improve the tests or actually fix the bugs!
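
For reference, a strict xfail stacked on a Hypothesis test looks roughly like this (a generic sketch, not one of the actual PR tests; the property shown is a deliberately failing placeholder):

import pytest
from hypothesis import given, strategies as st

@pytest.mark.xfail(strict=True, reason="suspected bug under investigation")
@given(n=st.integers())
def test_suspected_bug(n):
    # A placeholder property that always fails; with strict=True, the test run
    # errors out if it ever starts passing unexpectedly.
    assert n != n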

@TomAugspurger (Contributor) commented:

Agreed this looks quite nice, thanks @Zac-HD.

For better or worse, our test dependencies are all optional other than pytest, and I think hypothesis should follow that pattern. If others agree with that policy, I suppose the easiest way to do that would be to keep tests using hypothesis in a separate file and use pytest.importorskip('hypothesis'), or add a helper in pandas.util.testing (a sketch of the pattern follows below).

I'll look at those xfailing tests later.
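
For illustration, the import-or-skip pattern at the top of a Hypothesis-only test file might look like this (a minimal sketch; the trivial test and any eventual helper in pandas.util.testing are assumptions, not code from this PR):

import pytest

# Skip every test in this module when hypothesis is not installed.
hypothesis = pytest.importorskip('hypothesis')
st = pytest.importorskip('hypothesis.strategies')

@hypothesis.given(st.integers())
def test_some_property(x):
    assert isinstance(x, int)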

@Zac-HD (Contributor, Author) commented Aug 13, 2018

In principle I don't have any particular objections to making Hypothesis optional and keeping it in particular files - that's how it works for Xarray, and creating a new sub-directory tests/hypothesis/ isn't hard.

The only catch is that they should be executed as part of every test run in CI! Otherwise you miss all the platform-specific issues that it could turn up, including the thing where test_on_offset_implementations has to be marked strict=False because it only fails on some operating systems.

If you feel that property-based tests are going to be valuable - and there are obvious applications to testing e.g. round-tripping serialization that I couldn't get to pass - personally I'd just make Hypothesis a lightweight required dependency for development. Happy to let you decide either way though!

@gfyoung added the "Testing" label (pandas testing functions or related to the test suite) on Aug 13, 2018

@gfyoung (Member) commented Aug 13, 2018

@TomAugspurger: I think hypothesis could be quite valuable for testing purposes. However, I would play it conservatively at first and make this optional, as you suggested. If it proves itself to be very important for testing, then we could consider making it required later on.

@Zac-HD (Contributor, Author) commented Aug 13, 2018

OK, optional it is then! Do I just list the dependency in ci/requirements-optional-conda.txt? Or is there something more involved?

And just to make this more fun, it seems like the conda-forge package doesn't work for Python3 under OSX... cc @jochym for help please 😨

@TomAugspurger (Contributor) commented:

@Zac-HD namespace collision I think :/ Seems to be hypothesis-python, at least on conda-forge.

Our CI setup is a bit messy, but I think

  1. Include hypothesis-python in ci/environment-dev.yaml
  2. Include a rename in scripts/convert_deps.py from hypothesis-python to hypothesis
  3. Run python scripts/convert_deps.py and stage / commit the files
  4. Update some of the CI files to include hypothesis-python
    a. circle-27-compat.yaml
    b. circle-36-locale.yaml
    c. travis-27.yaml
    d. travis-35-osx.yaml
    e. travis-37.yaml
    f. appveyor-27.yaml
    g. appveyor-36.yaml

@Zac-HD (Contributor, Author) commented Aug 13, 2018

Hmm, I've just checked and I don't think the name is the issue after all (and I can't find a package called hypothesis-python on conda-forge at all).

And it's currently listed in ci/environment-dev.yaml - do you mean moving it to ci/requirements-optional-conda.txt?

@TomAugspurger (Contributor) commented:

> Hmm, I've just checked and I don't think the name is the issue after all

Ah, sorry, I misread the travis output. Ignore that.

> And it's currently listed in ci/environment-dev.yaml

That's perfect. Could you also run python scripts/convert_deps.py and include the output (for our contributors using pip)?

import pandas as pd

from pandas.tseries.offsets import (
Hour, Minute, Second, Milli, Micro, Nano,

Contributor:

These already exist as fixtures; I'd like to avoid specifically importing things like this, and use the fixture / generating fixture instead.

Contributor (Author):

This is actually a non-trivial trade-off! Because we can't call fixtures outside of a test, we'll still need to import these types to register the strategy for each of them in conftest.py (see next comment).

In the test file, we have two options:

  1. Import them, and let Hypothesis decide which to try.
  2. Use a fixture which creates a test for each, and have Hypothesis try cases for each independently.

(1) is faster at runtime because Hypothesis runs one test instead of n, while (2) is more explicit if a little less elegant (both options are sketched below). Your call.
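
A rough sketch of what the two options look like in practice (hypothetical test names and bodies; only YearBegin/YearEnd shown for brevity, and parametrize stands in for a per-class fixture):

import pytest
from hypothesis import given, strategies as st
from pandas.tseries.offsets import YearBegin, YearEnd

# Option (1): a single test; Hypothesis samples the offset class as part of the input.
@given(offset=st.one_of([st.builds(cls, n=st.integers(-5, 5))
                         for cls in (YearBegin, YearEnd)]))
def test_option_one(offset):
    assert isinstance(offset, (YearBegin, YearEnd))

# Option (2): pytest generates one test per class; Hypothesis only draws n for each.
@pytest.mark.parametrize('cls', [YearBegin, YearEnd])
@given(n=st.integers(-5, 5))
def test_option_two(cls, n):
    assert isinstance(cls(n), (YearBegin, YearEnd))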

QuarterBegin, QuarterEnd, BQuarterBegin, BQuarterEnd,
YearBegin, YearEnd, BYearBegin, BYearEnd]

# ----------------------------------------------------------------

Contributor:

This is the problem I had with the original PR: there is a lot of code to create the things that need testing (e.g. the case-generating statements).

Putting all of this inside a specific test file makes it very hard to re-use / document. Is there a standard location, like a conftest.py, where these typically live? We can create infrastructure to do this, but we a) want to share as much of this code as possible, and b) put it in standard, well-defined, easy-to-discover locations.

Contributor (Author):

  • Global setup using register_type_strategy could easily be moved to conftest.py (heuristic: "equivalent to defining fixtures"), with a comment at each end explaining where the other part is (a sketch of this registration follows after this list).

  • For the gen_* variables, I'd actually leave them right where they are - they look fairly specialised to these tests, and strategies would be idiomatically constructed within the call to @given if they were shorter or unique.
    Common bits can be extracted out as they get identified, but TBH we'd love to supply anything useful downstream in hypothesis.extra.pandas so some of that will hopefully vanish again 😉
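
As a concrete illustration of the first point, registering a strategy for one offset type in conftest.py might look like this (a sketch using MonthBegin as an arbitrary example; the real setup would cover more types and a wider range of n):

# conftest.py - roughly "equivalent to defining fixtures": teach Hypothesis how to
# build instances of a type so tests can request it by type.
from hypothesis import strategies as st
from pandas.tseries.offsets import MonthBegin

st.register_type_strategy(MonthBegin, st.builds(MonthBegin, n=st.integers(-5, 5)))

# In a test file, st.from_type(MonthBegin) - or a MonthBegin annotation inside
# @given(...) - now resolves to the strategy registered above.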

@jbrockmendel (Member) commented:

@Zac-HD Is there anything else from #18761 worth trying to salvage?

@Zac-HD (Contributor, Author) commented Aug 14, 2018

@jbrockmendel - it looks like you had some neat ideas for further tests of e.g. timedeltas, but I thought it would be better for me to stick to the groundwork and leave them for a follow-up PR that could do them justice.

If you or anyone else would like to work on that, I'd be happy to advise on Hypothesis idioms (a standing offer for any open source project).

@Zac-HD (Contributor, Author) commented Aug 15, 2018

@jreback - I've implemented all the changes from your review. Is there anything else I need to do before this can be merged?

@Zac-HD (Contributor, Author) commented Aug 18, 2018

Ping @TomAugspurger & @jreback for a final review. It would be great to iterate on this at the PyCon AU sprints if it can be merged by next weekend, so if there's anything left to do I would really like to hear about it soon!

# ----------------------------------------------------------------
# Global setup for tests using Hypothesis

from hypothesis import strategies as st

Contributor:

This will put a hard dependency on hypothesis for testing. Are we OK with that? After some thought, I think it's fine. It's a well-maintained project, and working around it in the test suite seems silly.

If we're ok with that, then @Zac-HD could you update

  • pandas/util/_tester.py to have a nice message if either pytest or hypothesis is missing (a rough sketch follows after this list)?
  • pandas/ci/check_imports.py to ensure hypothesis is not imported with the main import pandas?
  • doc/source/whatsnew/0.24.0.txt with a small subsection saying hypothesis is required for running the tests (with a link to the hypothesis docs :)
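
A rough sketch of the first bullet (a hypothetical shape for the check; the exact message, version pins, and command handling are assumptions rather than the real file contents):

# Sketch of pandas/util/_tester.py with a friendlier error message.
def test(extra_args=None):
    try:
        import pytest
    except ImportError:
        raise ImportError("Need pytest>=3.6 to run tests")
    try:
        import hypothesis  # noqa: F401 - only checking that it is importable
    except ImportError:
        raise ImportError("Need hypothesis>=3.58 to run tests")
    cmd = ['--pyargs', 'pandas']
    if extra_args:
        cmd = list(extra_args) + cmd
    pytest.main(cmd)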

Contributor (Author):

My read of the reviews so far is that @jreback was in favor of a mandatory dependency (also my recommendation), and you're now in favor too.

I've therefore made the relevant changes and it's all ready to go 🎉

(though one build on Travis has errored out, the tests passed until the timeout)

Contributor:

So, I still think we need to a) remove hypothesis from one build (the same one we have removed moto from is good), and b) use pytest.importorskip('hypothesis'). The reason is not really for our CI; rather, it's so that when a user runs pd.test() it doesn't fail, it just skips those tests.

Contributor (Author):

@jreback there are two problems with making Hypothesis optional for pd.test():

  1. It makes adding further Hypothesis tests - e.g. for serialisation round-trips, timedeltas, or reshaping logic - much harder. They'd have to be in separate files, guard any global setup and configuration, handle import-or-skips, etc.
  2. It forces us to choose between duplicating tests and skipping them at runtime.

That doesn't make it completely unreasonable, but I'd prefer to just have the dependency - and I've been using Pandas for much longer than Hypothesis!

TLDR - what's wrong with putting Hypothesis in the same category as pytest?

Contributor:

I think moto is a bit different, since it's relatively unimportant to mainline pandas, and so is easy to work around.

IMO, hypothesis should be treated the same as pytest.

@TomAugspurger (Contributor) commented:

Merged master. The travis timeout should be fixed now.

@jreback (Contributor) commented Aug 20, 2018

let me have a look

@jreback mentioned this pull request Aug 20, 2018
@jreback added this to the 0.24.0 milestone Aug 20, 2018

@Zac-HD (Contributor, Author) commented Aug 23, 2018

@jreback - Tom's view and mine is that we're best off with Hypothesis as a hard dependency for running the tests. Can you live with this and give an approving review? If not, what problems do you see?

(if possible I'd really like to merge this Friday or Saturday so I can talk about it at PyCon Australia and get some follow-up work done at the sprints)

@jreback (Contributor) commented Aug 23, 2018

It still can be a hard dependency for our testing, but as I said, users' code will immediately fail if they only have pytest. For user code, that's all that we can / should require.

I am not sure why this is a problem - you simply put a skip in the conftest itself if it's not installed.

@TomAugspurger (Contributor) commented:

What's the issue with having it as a required test dependency? Having to write tests around that constraint seems unnecessarily burdensome on contributors / maintainers.

@jreback (Contributor) commented Aug 23, 2018

I guess it's OK to add as a dependency for testing. Can you update install.rst as well?

@Zac-HD force-pushed the hypothesis branch 2 times, most recently from 7c86fac to 835f352 on August 24, 2018 03:56

@Zac-HD (Contributor, Author) commented Aug 24, 2018

@TomAugspurger - I'm quite certain these build issues are unrelated to the pull. Any idea on how to make it green?

@TomAugspurger (Contributor) commented Aug 24, 2018 via email

@Zac-HD (Contributor, Author) commented Aug 24, 2018

🎉 - it's working! @TomAugspurger, if you (or anyone else!) can hit 'merge' by, uh, 4am UTC that would be in time for my talk at PyCon Australia 😅

@TomAugspurger merged commit fa47b8d into pandas-dev:master on Aug 25, 2018

@TomAugspurger (Contributor) commented:

Thanks!

@h-vetinari (Contributor) commented Aug 27, 2018

After rebasing #22236, I'm getting a failure related to this PR in the circleci/py36_locale job (https://circleci.com/gh/pandas-dev/pandas/18093):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Nano] ____________________________

cls = <class 'pandas.tseries.offsets.Nano'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.09 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(183585162026757523236170973153487462479) to this test or run pytest with --hypothesis-seed=183585162026757523236170973153487462479 to reproduce this failure.

@h-vetinari (Contributor) commented:

And again, this time in another test, but same build job (https://circleci.com/gh/pandas-dev/pandas/18150):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Milli] ___________________________

cls = <class 'pandas.tseries.offsets.Milli'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.14 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(188896409658629737425924686489321258929) to this test or run pytest with --hypothesis-seed=188896409658629737425924686489321258929 to reproduce this failure.

I think this health check is too restrictive. If the server is busy for that single second in which the samples are generated, it might already be triggered... The error message already mentions how to disable it; maybe this should be considered as a follow-up...?

@Zac-HD (Contributor, Author) commented Aug 28, 2018

Ah, looks like a CI clock issue. I'll open a follow-up PR in the next day with the following in conftest.py:

from hypothesis import settings, HealthCheck

# Suppress the "too slow" health check on busy CI workers; pass
# suppress_health_check=HealthCheck.all() instead to disable every health check.
settings.register_profile("ci", suppress_health_check=(HealthCheck.too_slow,))
settings.load_profile("ci")

@topper-123 mentioned this pull request Sep 8, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* BLD: Add Hypothesis to build system

* TST: Add Hypothesis tests for ticks, offsets

These tests are derived from GH18761, by jbrockmendel

Co-authored-by: jbrockmendel <[email protected]>

* DOC: Explain Hypothesis in contributing guide

* TST: remove pointless loop

* TST: Improve integration of Hypothesis

Responding to review from jreback on GH22280.

* Final review fixes