Add initial property-based tests using Hypothesis #22280


Merged: 6 commits merged into pandas-dev:master on Aug 25, 2018

Conversation

@Zac-HD (Contributor) commented Aug 11, 2018

This pull request has no features or bugfixes for Pandas itself (though it does have a whatsnew entry for contributors); instead, it adds support for Hypothesis and a few initial tests that use it. The new section of the contributing guide should explain what this means and why it's desirable. There are basically three contributions here:

  1. Contributor documentation and all the dependency setup to support Hypothesis tests in CI.
  2. A pleasantly concise pair of passing tests in test_ticks.py (their shape is sketched just below this list).
  3. Three failing tests in test_offsets_properties.py based on [WIP] implement tests using hypothesis #18761 by @jbrockmendel. I've made these idiomatic from the Hypothesis side, but while it seems likely that at least test_on_offset_implementations might be finding real bugs, fixing them is beyond me at the moment. I thought it worth including them anyway, with an xfail decorator.
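
For readers new to Hypothesis, the passing tests in test_ticks.py have roughly the shape below. This is a simplified sketch based on the test shown in the CI output further down this thread; the assertion body is illustrative rather than the exact code in the PR.

import pytest
from hypothesis import example, given, strategies as st
from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second

tick_classes = [Hour, Minute, Second, Milli, Micro, Nano]

@pytest.mark.parametrize('cls', tick_classes)
@example(n=2, m=3)
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    # Hypothesis draws many (n, m) pairs; Tick construction should commute with arithmetic.
    assert cls(n) + cls(m) == cls(n + m)
    assert cls(n) - cls(m) == cls(n - m)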

Future work on Hypothesis tests could focus on such fruitful areas as round-tripping data via any of the serialisation formats Pandas offers (I tried to write a demo of this and couldn't get it to pass), or the melt and merge tests that @TomAugspurger mentioned.

Closes #17978, after which more specific issues could be opened with suggested tests.

@Zac-HD force-pushed the hypothesis branch 2 times, most recently from cff98d8 to e13decd on August 11, 2018 14:24

@codecov (bot) commented Aug 11, 2018

Codecov Report

Merging #22280 into master will decrease coverage by <.01%.
The diff coverage is 0%.


@@            Coverage Diff             @@
##           master   #22280      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         169      169              
  Lines       50776    50780       +4     
==========================================
  Hits        46737    46737              
- Misses       4039     4043       +4
Flag        Coverage Δ
#multiple   90.44% <0%> (-0.01%) ⬇️
#single     42.22% <0%> (-0.01%) ⬇️

Impacted Files           Coverage Δ
pandas/util/_tester.py   23.8% <0%> (-5.61%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 55d176d...779b49a.

@jbrockmendel (Member) commented:

This looks really nice, definitely prettier than #18761.

Do the xfailed tests actually fail or is that just being defensive? Ideally we'd like to have strict=True in xfails.

@Zac-HD (Contributor, Author) commented Aug 12, 2018

Thanks! I have an unfair advantage in Hypothesis idioms, but I couldn't have written anything like this without your example to follow on the Pandas side 😄

The tests really do fail - I've added strict=True to the xfails and a note to one which is inconsistent. Hopefully someone who knows more about Pandas internals than I do can either improve the tests or actually fix the bugs!
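
For reference, a strict xfail stacked on a Hypothesis test looks roughly like this (a generic sketch, not one of the actual PR tests; the property shown is a deliberately failing placeholder):

import pytest
from hypothesis import given, strategies as st

@pytest.mark.xfail(strict=True, reason="suspected bug under investigation")
@given(n=st.integers())
def test_suspected_bug(n):
    # A placeholder property that always fails; with strict=True, the test run
    # errors out if it ever starts passing unexpectedly.
    assert n != n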

@TomAugspurger (Contributor) commented:

Agreed this looks quite nice, thanks @Zac-HD.

For better or worse, our test dependencies are all optional other than pytest, and I think hypothesis should follow that pattern. If others agree with that policy, I suppose the easiest way to do that would be to keep tests using hypothesis in a separate file and use pytest.importorskip('hypothesis'), or add a helper in pandas.util.testing (a sketch of the pattern follows below).

I'll look at those xfailing tests later.
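
For illustration, the import-or-skip pattern at the top of a Hypothesis-only test file might look like this (a minimal sketch; the trivial test and any eventual helper in pandas.util.testing are assumptions, not code from this PR):

import pytest

# Skip every test in this module when hypothesis is not installed.
hypothesis = pytest.importorskip('hypothesis')
st = pytest.importorskip('hypothesis.strategies')

@hypothesis.given(st.integers())
def test_some_property(x):
    assert isinstance(x, int)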

@Zac-HD (Contributor, Author) commented Aug 13, 2018

In principle I don't have any particular objections to making Hypothesis optional and keeping it in particular files - that's how it works for Xarray, and creating a new sub-directory tests/hypothesis/ isn't hard.

The only catch is that they should be executed as part of every test run in CI! Otherwise you miss all the platform-specific issues that it could turn up, including the thing where test_on_offset_implementations has to be marked strict=False because it only fails on some operating systems.

If you feel that property-based tests are going to be valuable - and there are obvious applications to testing e.g. round-tripping serialization that I couldn't get to pass - personally I'd just make Hypothesis a lightweight required dependency for development. Happy to let you decide either way though!

@gfyoung added the "Testing" label (pandas testing functions or related to the test suite) on Aug 13, 2018

@gfyoung (Member) commented Aug 13, 2018

@TomAugspurger: I think hypothesis could be quite valuable for testing purposes. However, I would play it conservatively at first and make this optional, as you suggested. If it proves itself to be very important for testing, then we could consider making it required later on.

@Zac-HD (Contributor, Author) commented Aug 13, 2018

OK, optional it is then! Do I just list the dependency in ci/requirements-optional-conda.txt? Or is there something more involved?

And just to make this more fun, it seems like the conda-forge package doesn't work for Python3 under OSX... cc @jochym for help please 😨

@TomAugspurger (Contributor) commented:

@Zac-HD namespace collision I think :/ Seems to be hypothesis-python, at least on conda-forge.

Our CI setup is a bit messy, but I think

  1. Include hypothesis-python in ci/environment-dev.yaml
  2. Include a rename in scripts/convert_deps.py from hypothesis-python to hypothesis
  3. Run python scripts/convert_deps.py and stage / commit the files
  4. Update some of the CI files to include hypothesis-python
    a. circle-27-compat.yaml
    b. circle-36-locale.yaml
    c. travis-27.yaml
    d. travis-35-osx.yaml
    e. travis-37.yaml
    f. appveyor-27.yaml
    g. appveyor-36.yaml

@Zac-HD (Contributor, Author) commented Aug 13, 2018

Hmm, I've just checked and I don't think the name is the issue after all (and I can't find a package called hypothesis-python on conda-forge at all).

And it's currently listed in ci/environment-dev.yaml - do you mean moving it to ci/requirements-optional-conda.txt?

@TomAugspurger (Contributor) commented:

> Hmm, I've just checked and I don't think the name is the issue after all

Ah, sorry, I misread the travis output. Ignore that.

> And it's currently listed in ci/environment-dev.yaml

That's perfect. Could you also run python scripts/convert_deps.py and include the output (for our contributors using pip)?

import pandas as pd

from pandas.tseries.offsets import (
Hour, Minute, Second, Milli, Micro, Nano,

Contributor:

These already exist as fixtures; I'd like to avoid specifically importing things like this, and use the fixture / generating fixture instead.

Contributor (Author):

This is actually a non-trivial trade-off! Because we can't call fixtures outside of a test, we'll still need to import these types to register the strategy for each of them in conftest.py (see next comment).

In the test file, we have two options:

  1. Import them, and let Hypothesis decide which to try.
  2. Use a fixture which creates a test for each, and have Hypothesis try cases for each independently.

(1) is faster at runtime because Hypothesis runs one test instead of n, while (2) is more explicit if a little less elegant (both options are sketched below). Your call.
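
A rough sketch of what the two options look like in practice (hypothetical test names and bodies; only YearBegin/YearEnd shown for brevity, and parametrize stands in for a per-class fixture):

import pytest
from hypothesis import given, strategies as st
from pandas.tseries.offsets import YearBegin, YearEnd

# Option (1): a single test; Hypothesis samples the offset class as part of the input.
@given(offset=st.one_of([st.builds(cls, n=st.integers(-5, 5))
                         for cls in (YearBegin, YearEnd)]))
def test_option_one(offset):
    assert isinstance(offset, (YearBegin, YearEnd))

# Option (2): pytest generates one test per class; Hypothesis only draws n for each.
@pytest.mark.parametrize('cls', [YearBegin, YearEnd])
@given(n=st.integers(-5, 5))
def test_option_two(cls, n):
    assert isinstance(cls(n), (YearBegin, YearEnd))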

QuarterBegin, QuarterEnd, BQuarterBegin, BQuarterEnd,
YearBegin, YearEnd, BYearBegin, BYearEnd]

# ----------------------------------------------------------------

Contributor:

This is the problem I had with the original PR: there is a lot of code to create the things that need testing (e.g. the case-generating statements).

Putting all of this inside a specific test file makes it very hard to re-use / document. Is there a standard location, like a conftest.py, where these typically live? We can create infrastructure to do this, but we a) want to share as much of this code as possible, and b) put it in standard, well-defined, easy-to-discover locations.

Contributor (Author):

  • Global setup using register_type_strategy could easily be moved to conftest.py (heuristic: "equivalent to defining fixtures"), with a comment at each end explaining where the other part is (a sketch of this registration follows after this list).

  • For the gen_* variables, I'd actually leave them right where they are - they look fairly specialised to these tests, and strategies would be idiomatically constructed within the call to @given if they were shorter or unique.
    Common bits can be extracted out as they get identified, but TBH we'd love to supply anything useful downstream in hypothesis.extra.pandas so some of that will hopefully vanish again 😉
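
As a concrete illustration of the first point, registering a strategy for one offset type in conftest.py might look like this (a sketch using MonthBegin as an arbitrary example; the real setup would cover more types and a wider range of n):

# conftest.py - roughly "equivalent to defining fixtures": teach Hypothesis how to
# build instances of a type so tests can request it by type.
from hypothesis import strategies as st
from pandas.tseries.offsets import MonthBegin

st.register_type_strategy(MonthBegin, st.builds(MonthBegin, n=st.integers(-5, 5)))

# In a test file, st.from_type(MonthBegin) - or a MonthBegin annotation inside
# @given(...) - now resolves to the strategy registered above.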

@jbrockmendel (Member) commented:

@Zac-HD Is there anything else from #18761 worth trying to salvage?

@Zac-HD (Contributor, Author) commented Aug 14, 2018

@jbrockmendel - it looks like you had some neat ideas for further tests of e.g. timedeltas, but I thought it would be better for me to stick to the groundwork and leave them for a follow-up PR that could do them justice.

If you or anyone else would like to work on that, I'd be happy to advise on Hypothesis idioms (a standing offer for any open source project).

@Zac-HD (Contributor, Author) commented Aug 15, 2018

@jreback - I've implemented all the changes from your review. Is there anything else I need to do before this can be merged?

@Zac-HD (Contributor, Author) commented Aug 18, 2018

Ping @TomAugspurger & @jreback for a final review. It would be great to iterate on this at the PyCon AU sprints if it can be merged by next weekend, so if there's anything left to do I would really like to hear about it soon!

# ----------------------------------------------------------------
# Global setup for tests using Hypothesis

from hypothesis import strategies as st

Contributor:

This will put a hard dependency on hypothesis for testing. Are we OK with that? After some thought, I think it's fine. It's a well-maintained project, and working around it in the test suite seems silly.

If we're ok with that, then @Zac-HD could you update

  • pandas/util/_tester.py to have a nice message if either pytest or hypothesis is missing (a rough sketch follows after this list)?
  • pandas/ci/check_imports.py to ensure hypothesis is not imported with the main import pandas?
  • doc/source/whatsnew/0.24.0.txt with a small subsection saying hypothesis is required for running the tests (with a link to the hypothesis docs :)
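
A rough sketch of the first bullet (a hypothetical shape for the check; the exact message, version pins, and command handling are assumptions rather than the real file contents):

# Sketch of pandas/util/_tester.py with a friendlier error message.
def test(extra_args=None):
    try:
        import pytest
    except ImportError:
        raise ImportError("Need pytest>=3.6 to run tests")
    try:
        import hypothesis  # noqa: F401 - only checking that it is importable
    except ImportError:
        raise ImportError("Need hypothesis>=3.58 to run tests")
    cmd = ['--pyargs', 'pandas']
    if extra_args:
        cmd = list(extra_args) + cmd
    pytest.main(cmd)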

Contributor (Author):

My read of the reviews so far is that @jreback was in favor of a mandatory dependency (also my recommendation), and you're now in favor too.

I've therefore made the relevant changes and it's all ready to go 🎉

(though one build on Travis has errored out, the tests passed until the timeout)

Contributor:

So, I still think we need to a) remove hypothesis from one build (the same one we have removed moto from is good), and b) use pytest.importorskip('hypothesis'). The reason is not really for our CI; rather, it's so that when a user runs pd.test() it doesn't fail, it just skips those tests.

Contributor (Author):

@jreback there are two problems with making Hypothesis optional for pd.test():

  1. It makes adding further Hypothesis tests - e.g. for serialisation round-trips, timedeltas, or reshaping logic - much harder. They'd have to be in separate files, guard any global setup and configuration, handle import-or-skips, etc.
  2. It forces us to choose between duplicating tests and skipping them at runtime.

That doesn't make it completely unreasonable, but I'd prefer to just have the dependency - and I've been using Pandas for much longer than Hypothesis!

TLDR - what's wrong with putting Hypothesis in the same category as pytest?

Contributor:

I think moto is a bit different, since it's relatively unimportant to mainline pandas, and so is easy to work around.

IMO, hypothesis should be treated the same as pytest.

@TomAugspurger (Contributor) commented:

Merged master. The travis timeout should be fixed now.

@jreback (Contributor) commented Aug 20, 2018

let me have a look

@jreback mentioned this pull request Aug 20, 2018
@jreback added this to the 0.24.0 milestone Aug 20, 2018

@Zac-HD (Contributor, Author) commented Aug 23, 2018

@jreback - Tom's view and mine is that we're best off with Hypothesis as a hard dependency for running the tests. Can you live with this and give an approving review? If not, what problems do you see?

(if possible I'd really like to merge this Friday or Saturday so I can talk about it at PyCon Australia and get some follow-up work done at the sprints)

@jreback (Contributor) commented Aug 23, 2018

It still can be a hard dependency for our testing, but as I said, users' code will immediately fail if they only have pytest. For user code, that's all that we can / should require.

I am not sure why this is a problem - you simply put a skip in the conftest itself if it's not installed.

@TomAugspurger (Contributor) commented:

What's the issue with having it as a required test dependency? Having to write tests around that constraint seems unnecessarily burdensome on contributors / maintainers.

@jreback (Contributor) commented Aug 23, 2018

I guess it's OK to add as a dependency for testing. Can you update install.rst as well?

@Zac-HD force-pushed the hypothesis branch 2 times, most recently from 7c86fac to 835f352 on August 24, 2018 03:56

@Zac-HD (Contributor, Author) commented Aug 24, 2018

@TomAugspurger - I'm quite certain these build issues are unrelated to the pull. Any idea on how to make it green?

@TomAugspurger (Contributor) commented Aug 24, 2018 via email

@Zac-HD (Contributor, Author) commented Aug 24, 2018

🎉 - it's working! @TomAugspurger, if you (or anyone else!) can hit 'merge' by, uh, 4am UTC that would be in time for my talk at PyCon Australia 😅

@TomAugspurger merged commit fa47b8d into pandas-dev:master on Aug 25, 2018

@TomAugspurger (Contributor) commented:

Thanks!

@h-vetinari (Contributor) commented Aug 27, 2018

After rebasing #22236, I'm getting a failure related to this PR in the circleci/py36_locale job (https://circleci.com/gh/pandas-dev/pandas/18093):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Nano] ____________________________

cls = <class 'pandas.tseries.offsets.Nano'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.09 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(183585162026757523236170973153487462479) to this test or run pytest with --hypothesis-seed=183585162026757523236170973153487462479 to reproduce this failure.

@h-vetinari (Contributor) commented:

And again, this time in another test, but same build job (https://circleci.com/gh/pandas-dev/pandas/18150):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Milli] ___________________________

cls = <class 'pandas.tseries.offsets.Milli'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.14 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(188896409658629737425924686489321258929) to this test or run pytest with --hypothesis-seed=188896409658629737425924686489321258929 to reproduce this failure.

I think this health check is too restrictive. If the server is busy for that single second in which the samples are generated, it might already be triggered... The error message already mentions how to disable it; maybe this should be considered as a follow-up...?

@Zac-HD (Contributor, Author) commented Aug 28, 2018

Ah, looks like a CI clock issue. I'll open a follow-up PR in the next day with the following in conftest.py:

from hypothesis import settings, HealthCheck

# Suppress the "too slow" health check on busy CI workers; pass
# suppress_health_check=HealthCheck.all() instead to disable every health check.
settings.register_profile("ci", suppress_health_check=(HealthCheck.too_slow,))
settings.load_profile("ci")

@topper-123 mentioned this pull request Sep 8, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* BLD: Add Hypothesis to build system

* TST: Add Hypothesis tests for ticks, offsets

These tests are derived from GH18761, by jbrockmendel

Co-authored-by: jbrockmendel <[email protected]>

* DOC: Explain Hypothesis in contributing guide

* TST: remove pointless loop

* TST: Improve integration of Hypothesis

Responding to review from jreback on GH22280.

* Final review fixes