Skip to content

Add more overflow tests for timedelta64 operations #46854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 33 commits into from
Closed
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
f0b34b0
add tests for existing behavior
patrickmckenna Apr 22, 2022
287ca88
note spurious overflow for single elem td64 series
patrickmckenna Apr 23, 2022
992ca95
finer-grained testing for td64 sum overflow errors
patrickmckenna Apr 23, 2022
1ea03e6
styling, ex msg fixes
patrickmckenna Apr 24, 2022
2868200
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 24, 2022
eb1f61b
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 24, 2022
ac91c58
consolidate, parameterize td64 addition overflow tests
patrickmckenna Apr 25, 2022
b8e4a51
add scalar multiplication tests
patrickmckenna Apr 25, 2022
4c72f1e
add tests for scalar multiplication
patrickmckenna Apr 25, 2022
5ef0a48
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 25, 2022
aeef81c
mypy, win38 fixes
patrickmckenna Apr 25, 2022
438339d
use box_expected where possible
patrickmckenna Apr 25, 2022
e86d0df
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 25, 2022
bd62e08
consolidate all new, some old td64 overflow tests in new module
patrickmckenna Apr 26, 2022
1424dee
remove hypothesis overuse
patrickmckenna Apr 26, 2022
3369bbc
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 27, 2022
6c3b482
address PR code style feedback
patrickmckenna Apr 27, 2022
c14d0c8
put new tests back in pre-existing modules
patrickmckenna Apr 27, 2022
f2e0ba4
dedupe test setup helpers
patrickmckenna Apr 27, 2022
3f89526
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 27, 2022
e03a902
obey the linting gods
patrickmckenna Apr 28, 2022
6ca60f5
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna Apr 28, 2022
15ef834
adjust for platform/env-specific overflow behavior
patrickmckenna Apr 28, 2022
d4d2160
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna May 3, 2022
13d198e
use newer type hint syntax
patrickmckenna May 3, 2022
b37aa4d
update tests to reflect recent changes
patrickmckenna May 3, 2022
cd59647
remove scalar-box tests, use existing fixtures
patrickmckenna May 3, 2022
05f4137
DRY up some td64 tests
patrickmckenna May 3, 2022
7a6a8cf
platform-specific fixes
patrickmckenna May 3, 2022
7ce2082
Merge branch 'main' into td64-sums
patrickmckenna May 11, 2022
24fbc09
Merge branch 'main' into td64-sums
patrickmckenna May 12, 2022
d6069d3
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna May 13, 2022
2f24ec7
Merge remote-tracking branch 'upstream/main' into td64-sums
patrickmckenna May 15, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
236 changes: 212 additions & 24 deletions pandas/tests/arithmetic/test_timedelta64.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,20 @@
datetime,
timedelta,
)
from itertools import (
chain,
combinations_with_replacement,
product,
)
from operator import attrgetter
from typing import (
NamedTuple,
Type,
Union,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use type and |?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish, that's a 3.10 feature.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if we do from __future__ import annotations? im sure we use pipes elsewhere

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I'd tried that and switching to pipes, and it was failing for type aliases like TD64_BOX_TYPE = TimedeltaArray | TimedeltaIndex | Series | DataFrame. But TIL: pipes will work for function annotations (with that import).

type does work. I'll switch to that, and replace the aliases with pipes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I'd tried that and switching to pipes, and it was failing for type aliases like TD64_BOX_TYPE = TimedeltaArray | TimedeltaIndex | Series | DataFrame. But TIL: pipes will work for function annotations (with that import).

type does work. I'll switch to that, and replace the aliases with pipes.

)

from hypothesis import given
import hypothesis.strategies as st
import numpy as np
import pytest

Expand All @@ -31,12 +44,87 @@
Int64Index,
UInt64Index,
)
from pandas.core.arrays import (
DatetimeArray,
TimedeltaArray,
)
from pandas.tests.arithmetic.common import (
assert_invalid_addsub_type,
assert_invalid_comparison,
get_upcast_box,
)

timedelta_types = (Timedelta, TimedeltaArray, TimedeltaIndex, Series, DataFrame)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the newly-added overflow tests are parameterized over Timedelta scalar and container types, on the assumption that similar exceptions should be raised for all. (The fact that they're not can be confusing, as mentioned in #43178.) Is that assumption accurate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that assumption is accurate

id prefer to keep the Timedelta scalar tests in tests.scalar.timedelta, then use the box_with_array fixture for the non-scalar tests. part of the virtue of this is that when writing the non-scalar tests, we assume that the scalar behavior is correct and tested.

timestamp_types = (Timestamp, DatetimeArray, DatetimeIndex, Series, DataFrame)
containers = slice(1, None)
get_item_names = lambda t: "-".join(map(attrgetter("__name__"), t))


class BinaryOpTypes(NamedTuple):
"""
The expected operand and result types for a binary operation.
"""

left: Type
right: Type
result: Type

def __str__(self) -> str:
return get_item_names(self)

def __repr__(self) -> str:
return f"BinaryOpTypes({self})"


positive_tds = st.integers(min_value=1, max_value=Timedelta.max.value).map(Timedelta)

xfail_no_overflow_check = pytest.mark.xfail(reason="No overflow check")


@pytest.fixture(
name="add_sub_types",
scope="module",
params=tuple(combinations_with_replacement(timedelta_types, 2)),
ids=get_item_names,
)
def fixture_add_sub_types(request: pytest.FixtureRequest) -> BinaryOpTypes:
"""
Expected types when adding, subtracting Timedeltas.
"""
return_type = max(request.param, key=lambda t: timedelta_types.index(t))
return BinaryOpTypes(*request.param, return_type)


@pytest.fixture(
name="ts_add_sub_types",
scope="module",
params=tuple(product(timedelta_types, timestamp_types)),
ids=get_item_names,
)
def fixture_ts_add_sub_types(request: pytest.FixtureRequest) -> BinaryOpTypes:
"""
Expected types when adding, subtracting Timedeltas and Timestamps.
"""
type_hierarchy = {
name: i
for i, name in chain(enumerate(timedelta_types), enumerate(timestamp_types))
}
return_type = timestamp_types[max(type_hierarchy[t] for t in request.param)]

return BinaryOpTypes(*request.param, return_type)


def wrap_value(value: Union[Timestamp, Timedelta], type_):
"""
Return value wrapped in a container of given type_, or as-is if type_ is a scalar.
"""
if type_ in (Timedelta, Timestamp):
return value
elif type_ is DataFrame:
return Series(value).to_frame()
else:
return type_(pd.array([value]))


def assert_dtype(obj, expected_dtype):
"""
Expand Down Expand Up @@ -275,6 +363,130 @@ def test_comparisons_coverage(self):
# Timedelta64[ns] dtype Arithmetic Operations


@given(positive_td=positive_tds)
def test_add_raises_expected_error_if_result_would_overflow(
add_sub_types: BinaryOpTypes,
positive_td: Timedelta,
):
left = wrap_value(Timedelta.max, add_sub_types.left)
right = wrap_value(positive_td, add_sub_types.right)

if add_sub_types.result is Timedelta:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be raising a OutOfBoundsTimedelta in all cases? Ref: #34448

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably, yes

msg = "|".join(
[
"int too big to convert",
"Python int too large to convert to C long",
]
)
else:
msg = "Overflow in int64 addition"

with pytest.raises(OverflowError, match=msg):
left + right

with pytest.raises(OverflowError, match=msg):
right + left


@xfail_no_overflow_check
@given(positive_td=positive_tds)
def test_sub_raises_expected_error_if_result_would_overflow(
add_sub_types: BinaryOpTypes,
positive_td: Timedelta,
):
left = wrap_value(Timedelta.min, add_sub_types.left)
right = wrap_value(positive_td, add_sub_types.right)

msg = "Overflow in int64 addition"
with pytest.raises(OverflowError, match=msg):
left - right

with pytest.raises(OverflowError, match=msg):
(-1 * right) - abs(left)


@given(td_value=positive_tds)
def test_add_timestamp_raises_expected_error_if_result_would_overflow(
ts_add_sub_types: BinaryOpTypes,
td_value: Timedelta,
):
left = wrap_value(td_value, ts_add_sub_types.left)
right = wrap_value(Timestamp.max, ts_add_sub_types.right)

if ts_add_sub_types.result is Timestamp:
ex = OutOfBoundsDatetime
msg = "Out of bounds nanosecond timestamp"
else:
ex = OverflowError
msg = "Overflow in int64 addition"

with pytest.raises(ex, match=msg):
left + right

with pytest.raises(ex, match=msg):
right + left


@xfail_no_overflow_check
@given(td_value=positive_tds)
def test_sub_timestamp_raises_expected_error_if_result_would_overflow(
ts_add_sub_types: BinaryOpTypes,
td_value: Timedelta,
):
right = wrap_value(td_value, ts_add_sub_types[0])
left = wrap_value(Timestamp.min, ts_add_sub_types[1])

if ts_add_sub_types.result is Timestamp:
ex = OutOfBoundsDatetime
msg = "Out of bounds nanosecond timestamp"
else:
ex = OverflowError
msg = "Overflow in int64 addition"

with pytest.raises(ex, match=msg):
left - right


@given(value=st.floats().filter(lambda f: abs(f) > 1))
def test_scalar_multiplication_raises_expected_error_if_result_would_overflow(
value: float,
):
td = Timedelta.max

msg = "|".join(
[
"cannot convert float infinity to integer",
"Python int too large to convert to C long",
]
)
with pytest.raises(OverflowError, match=msg):
td * value

with pytest.raises(OverflowError, match=msg):
value * td


@xfail_no_overflow_check
@given(value=st.floats().filter(lambda f: abs(f) > 1))
@pytest.mark.parametrize(
argnames="td_type",
argvalues=timedelta_types[containers],
ids=attrgetter("__name__"),
)
def test_container_scalar_multiplication_raises_expected_error_if_result_would_overflow(
value: float,
td_type: Type,
):
td = wrap_value(Timedelta.max, td_type)

msg = "Overflow in int64 addition"
with pytest.raises(OverflowError, match=msg):
td * value

with pytest.raises(OverflowError, match=msg):
value * td


class TestTimedelta64ArithmeticUnsorted:
# Tests moved from type-specific test files but not
# yet sorted/parametrized/de-duplicated
Expand Down Expand Up @@ -687,30 +899,6 @@ def test_tdarr_add_timestamp_nat_masking(self, box_with_array, str_ts):
assert res[1] is NaT

def test_tdi_add_overflow(self):
# See GH#14068
# preliminary test scalar analogue of vectorized tests below
# TODO: Make raised error message more informative and test
with pytest.raises(OutOfBoundsDatetime, match="10155196800000000000"):
pd.to_timedelta(106580, "D") + Timestamp("2000")
with pytest.raises(OutOfBoundsDatetime, match="10155196800000000000"):
Timestamp("2000") + pd.to_timedelta(106580, "D")

_NaT = NaT.value + 1
msg = "Overflow in int64 addition"
with pytest.raises(OverflowError, match=msg):
pd.to_timedelta([106580], "D") + Timestamp("2000")
with pytest.raises(OverflowError, match=msg):
Timestamp("2000") + pd.to_timedelta([106580], "D")
with pytest.raises(OverflowError, match=msg):
pd.to_timedelta([_NaT]) - Timedelta("1 days")
with pytest.raises(OverflowError, match=msg):
pd.to_timedelta(["5 days", _NaT]) - Timedelta("1 days")
with pytest.raises(OverflowError, match=msg):
(
pd.to_timedelta([_NaT, "5 days", "1 hours"])
- pd.to_timedelta(["7 seconds", _NaT, "4 hours"])
)

# These should not overflow!
exp = TimedeltaIndex([NaT])
result = pd.to_timedelta([NaT]) - Timedelta("1 days")
Expand Down
80 changes: 62 additions & 18 deletions pandas/tests/series/test_reductions.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
from hypothesis import given
import hypothesis.strategies as st
import numpy as np
import pytest

Expand Down Expand Up @@ -51,29 +53,71 @@ def test_td64_sum_empty(skipna):
assert result == pd.Timedelta(0)


def test_td64_summation_overflow():
# GH#9442
ser = Series(pd.date_range("20130101", periods=100000, freq="H"))
ser[0] += pd.Timedelta("1s 1ms")
@given(
st.integers(
min_value=0,
max_value=10 ** (np.finfo(np.float64).precision),
).map(pd.Timedelta)
)
def test_td64_summation_retains_ns_precision_over_expected_range(value: pd.Timedelta):
result = Series(value).sum()

# mean
result = (ser - ser.min()).mean()
expected = pd.Timedelta((pd.TimedeltaIndex(ser - ser.min()).asi8 / len(ser)).sum())
assert result == value

# the computation is converted to float so
# might be some loss of precision
assert np.allclose(result.value / 1000, expected.value / 1000)

# sum
msg = "overflow in timedelta operation"
with pytest.raises(ValueError, match=msg):
(ser - ser.min()).sum()
@given(
st.integers(
min_value=10 ** (np.finfo(np.float64).precision),
max_value=pd.Timedelta.max.value - 2**9,
)
.filter(lambda i: int(np.float64(i)) != i)
.map(pd.Timedelta)
)
def test_td64_summation_loses_ns_precision_if_float_conversion_rounds(
value: pd.Timedelta,
):
result = Series(value).sum()

assert result != value


@given(
st.integers(
min_value=pd.Timedelta.max.value - 2**9 + 1,
max_value=pd.Timedelta.max.value,
).map(pd.Timedelta)
)
def test_td64_summation_raises_spurious_overflow_error_for_single_elem_series(
value: pd.Timedelta,
):
s = Series(value)

s1 = ser[0:10000]
msg = "int too big to convert|Python int too large to convert to C long"
with pytest.raises(OverflowError, match=msg):
s.sum()


@given(st.integers(min_value=1, max_value=2**10).map(pd.Timedelta))
def test_td64_summation_raises_overflow_error_for_small_overflows(value: pd.Timedelta):
s = Series([pd.Timedelta.max, value])

msg = "int too big to convert|Python int too large to convert to C long"
with pytest.raises(OverflowError, match=msg):
s.sum()


@given(
st.integers(
min_value=2**10 + 1,
max_value=pd.Timedelta.max.value,
).map(pd.Timedelta)
)
def test_td64_summation_raises_value_error_for_most_overflows(value: pd.Timedelta):
s = Series([pd.Timedelta.max, value])

msg = "overflow in timedelta operation"
with pytest.raises(ValueError, match=msg):
(s1 - s1.min()).sum()
s2 = ser[0:1000]
(s2 - s2.min()).sum()
s.sum()


def test_prod_numpy16_bug():
Expand Down