
TST: Hypothesis may draw a date outside of date_range's range #24242


Closed
topper-123 opened this issue Dec 11, 2018 · 3 comments · Fixed by #24307
Labels: Testing (pandas testing functions or related to the test suite)

@topper-123
Contributor

topper-123 commented Dec 11, 2018

I ran into an error from hypothesis when running the test suite. The bug can be reproduced (I think; I've not used hypothesis before, so maybe this seed is machine-dependent) by running:

pytest pandas/tests/frame/test_apply.py::TestDataFrameAggregate::test_frequency_is_original --hypothesis-seed=316520601087970019200180394352582921839

The error message is this:

2018-12-06T00:54:48.7685364Z 
2018-12-06T00:54:48.7686382Z =================================== FAILURES ===================================
2018-12-06T00:54:48.7688390Z ______________ TestDataFrameAggregate.test_frequency_is_original _______________
2018-12-06T00:54:48.7689359Z [gw0] linux -- Python 3.7.0 /home/vsts/miniconda3/envs/pandas-dev/bin/python
2018-12-06T00:54:48.7689655Z 
2018-12-06T00:54:48.7689936Z self = <pandas.tests.frame.test_apply.TestDataFrameAggregate object at 0x7ff32a5c3320>
2018-12-06T00:54:48.7690172Z 
2018-12-06T00:54:48.7690377Z     @given(index=indices(max_length=5), num_columns=integers(0, 5))
2018-12-06T00:54:48.7690609Z >   @settings(deadline=1000)
2018-12-06T00:54:48.7694033Z     def test_frequency_is_original(self, index, num_columns):
2018-12-06T00:54:48.7694602Z 
2018-12-06T00:54:48.7694893Z pandas/tests/frame/test_apply.py:1160: 
2018-12-06T00:54:48.7695224Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2018-12-06T00:54:48.7695479Z pandas/tests/frame/test_apply.py:836: in indices
2018-12-06T00:54:48.7695755Z     dr = date_range(date, periods=periods, freq=freq)
2018-12-06T00:54:48.7696003Z pandas/core/indexes/datetimes.py:1479: in date_range
2018-12-06T00:54:48.7696239Z     closed=closed, **kwargs)
2018-12-06T00:54:48.7696669Z pandas/core/arrays/datetimes.py:293: in _generate_range
2018-12-06T00:54:48.7696879Z     index = _generate_regular_range(cls, start, end, periods, freq)
2018-12-06T00:54:48.7697124Z pandas/core/arrays/datetimes.py:1716: in _generate_regular_range
2018-12-06T00:54:48.7697338Z     values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7697545Z pandas/core/arrays/datetimes.py:1716: in <listcomp>
2018-12-06T00:54:48.7697786Z     values = np.array([x.value for x in xdr], dtype=np.int64)
2018-12-06T00:54:48.7698865Z pandas/tseries/offsets.py:2508: in generate_range
2018-12-06T00:54:48.7699470Z     end = start + (periods - 1) * offset
2018-12-06T00:54:48.7700435Z pandas/_libs/tslibs/offsets.pyx:489: in pandas._libs.tslibs.offsets.BaseOffset.__radd__
2018-12-06T00:54:48.7700701Z     return self.__add__(other)
2018-12-06T00:54:48.7700985Z pandas/_libs/tslibs/offsets.pyx:362: in pandas._libs.tslibs.offsets._BaseOffset.__add__
2018-12-06T00:54:48.7701242Z     return self.apply(other)
2018-12-06T00:54:48.7701830Z pandas/tseries/offsets.py:69: in wrapper
2018-12-06T00:54:48.7702065Z     result = func(self, other)
2018-12-06T00:54:48.7702488Z pandas/tseries/offsets.py:527: in apply
2018-12-06T00:54:48.7702884Z     result = other + timedelta(days=7 * weeks + days)
2018-12-06T00:54:48.7703610Z pandas/_libs/tslibs/timestamps.pyx:355: in pandas._libs.tslibs.timestamps._Timestamp.__add__
2018-12-06T00:54:48.7703892Z     result = Timestamp(self.value + nanos,
2018-12-06T00:54:48.7704154Z pandas/_libs/tslibs/timestamps.pyx:736: in pandas._libs.tslibs.timestamps.Timestamp.__new__
2018-12-06T00:54:48.7704430Z     ts = convert_to_tsobject(ts_input, tz, unit, 0, 0, nanosecond or 0)
2018-12-06T00:54:48.7704699Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2018-12-06T00:54:48.7704909Z 
2018-12-06T00:54:48.7705172Z >   obj.value = ts
2018-12-06T00:54:48.7705407Z E   OverflowError: Python int too large to convert to C long
2018-12-06T00:54:48.7705613Z 
2018-12-06T00:54:48.7705877Z pandas/_libs/tslibs/conversion.pyx:297: OverflowError
2018-12-06T00:54:48.7706477Z ---------------------------------- Hypothesis ----------------------------------
2018-12-06T00:54:48.7707486Z You can add @seed(316520601087970019200180394352582921839) to this test or run pytest with --hypothesis-seed=316520601087970019200180394352582921839 to reproduce this failure.
2018-12-06T00:54:48.8610169Z

Reproducing the error

The error stems from the indices function in pandas/tests/frame/test_apply.py:

@composite
def indices(draw, max_length=5):
    date = draw(
        dates(
            min_value=Timestamp.min.ceil("D").to_pydatetime().date(),
            max_value=Timestamp.max.floor("D").to_pydatetime().date(),
        ).map(Timestamp)
    )
    periods = draw(integers(0, max_length))
    freq = draw(sampled_from(list("BDHTS")))
    dr = date_range(date, periods=periods, freq=freq)
    return pd.DatetimeIndex(list(dr))

The function above is used by hypothesis to generate test inputs. Its date_range call fails when date = Timestamp.max.floor("D").to_pydatetime().date() and freq is 'B' or 'D'.

For example:

>>> date = pd.Timestamp.max.floor("D").to_pydatetime().date()  # datetime.date(2262, 4, 11)
>>> freq = 'B'
>>> pd.date_range(date, periods=1, freq=freq)
OverflowError: int too big to convert
>>> freq = 'D'
>>> pd.date_range(date, periods=1, freq=freq)
OutOfBoundsDatetime: Cannot generate range with start=9223286400000000000 and periods=1

You'll notice that the error types differ between the two cases. Presumably the first example should also have raised an OutOfBoundsDatetime.
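To make the boundary concrete, here is a rough sketch of the arithmetic using only the standard library. It assumes timestamps are stored as signed 64-bit nanosecond counts since the Unix epoch, which is consistent with the start=9223286400000000000 value in the message above. The helper names (midnight_ns, whole_days_of_headroom) are made up for illustration and are not pandas functions, and this does not attempt to reproduce what date_range does internally.

```python
from datetime import date

INT64_MAX = 2**63 - 1           # largest value a signed 64-bit integer can hold
NS_PER_DAY = 86_400 * 10**9     # nanoseconds in one day
EPOCH = date(1970, 1, 1)


def midnight_ns(d):
    """Nanoseconds from the Unix epoch to midnight at the start of day d."""
    return (d - EPOCH).days * NS_PER_DAY


def whole_days_of_headroom(d):
    """Whole days that can be added to midnight of d without exceeding INT64_MAX."""
    return (INT64_MAX - midnight_ns(d)) // NS_PER_DAY


# The date drawn in the failing example: Timestamp.max floored to the day.
last_day = date(2262, 4, 11)
start_ns = midnight_ns(last_day)

print(start_ns)                                   # 9223286400000000000
print(start_ns + NS_PER_DAY > INT64_MAX)          # True: one more day overflows
print(whole_days_of_headroom(last_day))           # 0
print(whole_days_of_headroom(date(2262, 4, 6)))   # 5
```

So a start date of 2262-04-11 leaves less than one day of room, and any step of a day or more from there cannot be represented. One possible way to keep the strategy in range would be to lower max_value by a margin that covers the longest range it can generate, but that is only a suggestion and not necessarily how this was resolved.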

@mroeschke
Member

IIRC we were just going to replace this hypothesis test with a simpler test (see the comment in #23849).

However, it looks like you also discovered another issue. I agree that overflowing dates with 'D' or 'B' should raise the same error. Could you open a new issue for that?

mroeschke added the Testing label (pandas testing functions or related to the test suite) on Dec 12, 2018
@topper-123
Contributor Author

OK, done. I'll keep this one open until the issue raised in the comment in #23849 is closed.

@alimcmaster1
Member

I will submit a PR for the above-mentioned #23849 today (apologies for the delay).
