Skip to content

implement Timedelta mod, divmod, rmod, rdivmod, fix and test scalar methods #19365

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 19 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -419,6 +419,7 @@ Datetimelike
- Bug in ``.astype()`` to non-ns timedelta units would hold the incorrect dtype (:issue:`19176`, :issue:`19223`, :issue:`12425`)
- Bug in subtracting :class:`Series` from ``NaT`` incorrectly returning ``NaT`` (:issue:`19158`)
- Bug in :func:`Series.truncate` which raises ``TypeError`` with a monotonic ``PeriodIndex`` (:issue:`17717`)
- Bug :func:`Timedelta.__mod__`, :func:`Timedelta.__divmod__` where operating with timedelta-like or numeric arguments would incorrectly raise ``TypeError`` (:issue:`19365`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this in to a sub-section and give a couple of examples. also add a small example in the timedelta docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it in api changes section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this in to a sub-section and give a couple of examples.

Sure. For the ipython code blocks, do I copy/paste the traceback?


Timezones
^^^^^^^^^
Expand Down
47 changes: 43 additions & 4 deletions pandas/_libs/tslibs/timedeltas.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -492,7 +492,14 @@ def _binary_op_method_timedeltalike(op, name):
if other.dtype.kind not in ['m', 'M']:
# raise rathering than letting numpy return wrong answer
return NotImplemented
return op(self.to_timedelta64(), other)
result = op(self.to_timedelta64(), other)
if other.ndim == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be cleaner to do elif hasattr(other, 'dtype') and other.ndim > 0, and then let the scalar numpy ones take the route of Timedelta below ? (the other = Timedelta(other) some lines below will work for a np.timedelta64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be about a wash because we also need to handle the case where other is a 0-dim datetime64 array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that already catched by the is_datetime64_object(other) above? (not sure what the method exactly does, but from the name I would expect that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And from this it seems it is indeed already working correctly:

In [163]: pd.Timedelta(1) + np.datetime64(1000, 'ns')
Out[163]: Timestamp('1970-01-01 00:00:00.000001001')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah there's some ambiguity there. The 0-dim case to watch out for is (under master, fixed in the PR):

>>> pd.Timedelta(1) + np.array('2016-01-02 03:04:05', dtype='datetime64[ns]')
numpy.datetime64('2016-01-02T03:04:05.000000001')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, but not sure that should be fixed?

I could see a reasonable argument that the zero-dim array op should return a zero-dim array (i.e. not a Timedelta like the PR changes it to), but given that it returns a scalar, I think the consistent thing to do is always return a Timestamp/Timedelta.

not sure we should add that much extra lines to just deal with numpy scalars vs numpy 0-dim arrays

As long as we a) aren't making spaghetti and b) are testing these new corner cases, I don't see much downside to fixing these. If there were some kind of LOC budget I'd agree with you that this would be a pretty low priority. (and if it will get this merged I'll revert this part, since this is blocking fixes to Index bugs)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. (and if it will get this merged I'll revert this part, since this is blocking fixes to Index bugs)

Can you explain why this part would be reverted again, but is needed to be first merged?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor use of pronouns on my part. "This part" refers to the zero-dimensional arrays, and is not the bug that originally motivated this PR, but was found in the process of writing tests for this PR. "this is blocking" referred to the divmod/mod and 1-d array ops, which I need to have here before I can fix e.g. pd.Index([0, 1, 2]) * pd.Timedelta(days=1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any test you added that would catch this 0-dim array case. Would it be ok to just leave that out of this PR, and only fix the numpy timedelta64 scalar case?

We can discuss how many lines of code are added and whether that is worth it, but checking the result type and the potentially converting the result, does add to the code complexity. I would rather add a is_timedelta64_object(other) check to explicitly use the normal timedelta path for those.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm amenable to this suggestion, will update in a bit.

FYI is_timedelta64_object(obj) is equivalent to isinstance(obj, np.timedelta64). It will not catch arrays with timedelta64-dtype.

if other.dtype.kind == 'm':
return Timedelta(result)
if other.dtype.kind == 'M':
from ..tslib import Timestamp
return Timestamp(result)
return result

elif not _validate_ops_compat(other):
return NotImplemented
Expand Down Expand Up @@ -1046,7 +1053,10 @@ class Timedelta(_Timedelta):
def __mul__(self, other):
if hasattr(other, 'dtype'):
# ndarray-like
return other * self.to_timedelta64()
result = other * self.to_timedelta64()
if other.ndim == 0:
return Timedelta(result)
return result

elif other is NaT:
return NaT
Expand All @@ -1061,7 +1071,10 @@ class Timedelta(_Timedelta):

def __truediv__(self, other):
if hasattr(other, 'dtype'):
return self.to_timedelta64() / other
result = self.to_timedelta64() / other
if other.ndim == 0 and result.dtype.kind == 'm':
return Timedelta(result)
return result

elif is_integer_object(other) or is_float_object(other):
# integers or floats
Expand All @@ -1077,7 +1090,10 @@ class Timedelta(_Timedelta):

def __rtruediv__(self, other):
if hasattr(other, 'dtype'):
return other / self.to_timedelta64()
result = other / self.to_timedelta64()
if other.ndim == 0 and result.dtype.kind == 'm':
return Timedelta(result)
return result

elif not _validate_ops_compat(other):
return NotImplemented
Expand All @@ -1096,6 +1112,9 @@ class Timedelta(_Timedelta):
# just defer
if hasattr(other, '_typ'):
# Series, DataFrame, ...
if other._typ == 'dateoffset' and hasattr(other, 'delta'):
# Tick offset
return self // other.delta
return NotImplemented

if hasattr(other, 'dtype'):
Expand Down Expand Up @@ -1128,6 +1147,9 @@ class Timedelta(_Timedelta):
# just defer
if hasattr(other, '_typ'):
# Series, DataFrame, ...
if other._typ == 'dateoffset' and hasattr(other, 'delta'):
# Tick offset
return other.delta // self
return NotImplemented

if hasattr(other, 'dtype'):
Expand All @@ -1149,6 +1171,23 @@ class Timedelta(_Timedelta):
return np.nan
return other.value // self.value

def __mod__(self, other):
# Naive implementation, room for optimization
return self.__divmod__(other)[1]

def __rmod__(self, other):
# Naive implementation, room for optimization
return self.__rdivmod__(other)[1]

def __divmod__(self, other):
# Naive implementation, room for optimization
div = self // other
return div, self - div * other

def __rdivmod__(self, other):
div = other // self
return div, other - div * self


cdef _floordiv(int64_t value, right):
return value // right
Expand Down
184 changes: 183 additions & 1 deletion pandas/tests/scalar/test_timedelta.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
import pytest

import numpy as np
from datetime import timedelta
from datetime import datetime, timedelta

import pandas as pd
import pandas.util.testing as tm
Expand Down Expand Up @@ -128,6 +128,59 @@ def test_unary_ops(self):
assert abs(-td) == td
assert abs(-td) == Timedelta('10d')

def test_mul(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference your PR number under all of these added tests.

# GH#19365
td = Timedelta(minutes=3)

result = td * 2
assert result == Timedelta(minutes=6)

result = td * np.int64(1)
assert isinstance(result, Timedelta)
assert result == td

result = td * 1.5
assert result == Timedelta(minutes=4, seconds=30)

result = td * np.array([3, 4], dtype='int64')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand why this is, but I don't really like it. It would be much more friendly to return a TDI, maybe too magical though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah I considered that too, came to the same conclusion. It also would be troublesome if the other array wasn't 1D.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it easy to just return a TDI? this is very unituitive

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you haven't answered my question here. this needs to return a TDI or raise TypeError. return a m8 array is simply not useful.

expected = np.array([9, 12], dtype='m8[m]').astype('m8[ns]')
tm.assert_numpy_array_equal(result, expected)

with pytest.raises(TypeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the error message?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment on why this is prohibiited, any other ops not allowed?

td * pd.Timestamp(2016, 1, 2)

def test_add_datetimelike(self):
# GH#19365
td = Timedelta(10, unit='d')

result = td + datetime(2016, 1, 1)
assert result == pd.Timestamp(2016, 1, 11)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I KNOW we have tests for this, is there a reason you are duplicating? (rather than moving those to here)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_ops_scalar. I have not problem adding tests, but please don't duplicate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I KNOW we have tests for this, is there a reason you are duplicating? (rather than moving those to here)?

ATM there is NO test for Timedelta(10, unit='d') + np.timedelta64(-4, 'D') (try it; it returns a timedelta64). More generally, there are no add/sub tests in test_timedelta.

result = td + pd.Timestamp('2018-01-12 18:09')
assert result == pd.Timestamp('2018-01-22 18:09')

result = td + np.datetime64('2018-01-12')
assert result == pd.Timestamp('2018-01-22')

def test_add_timedeltalike(self):
td = Timedelta(10, unit='d')

result = td + Timedelta(days=10)
assert isinstance(result, Timedelta)
assert result == Timedelta(days=20)

result = td + timedelta(days=9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test for radd as well here

assert isinstance(result, Timedelta)
assert result == Timedelta(days=19)

result = td + pd.offsets.Hour(6)
assert isinstance(result, Timedelta)
assert result == Timedelta(days=10, hours=6)

result = td + np.timedelta64(-4, 'D')
assert isinstance(result, Timedelta)
assert result == Timedelta(days=6)

def test_binary_ops_nat(self):
td = Timedelta(10, unit='d')

Expand All @@ -138,6 +191,10 @@ def test_binary_ops_nat(self):
assert (td // pd.NaT) is np.nan
assert (td // np.timedelta64('NaT')) is np.nan

# GH#19365
assert td - np.timedelta64('NaT') is pd.NaT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test of the reverse?

assert td + np.timedelta64('NaT') is pd.NaT

def test_binary_ops_integers(self):
td = Timedelta(10, unit='d')

Expand Down Expand Up @@ -173,6 +230,10 @@ def test_floordiv(self):
assert -td // scalar.to_pytimedelta() == -2
assert (2 * td) // scalar.to_timedelta64() == 2

# GH#19365
assert td // pd.offsets.Hour(1) == 3
assert td // pd.offsets.Minute(2) == 92

assert td // np.nan is pd.NaT
assert np.isnan(td // pd.NaT)
assert np.isnan(td // np.timedelta64('NaT'))
Expand Down Expand Up @@ -218,6 +279,8 @@ def test_rfloordiv(self):
assert (-td).__rfloordiv__(scalar.to_pytimedelta()) == -2
assert (2 * td).__rfloordiv__(scalar.to_timedelta64()) == 0

assert pd.offsets.Hour(1) // Timedelta(minutes=25) == 2

assert np.isnan(td.__rfloordiv__(pd.NaT))
assert np.isnan(td.__rfloordiv__(np.timedelta64('NaT')))

Expand Down Expand Up @@ -255,6 +318,125 @@ def test_rfloordiv(self):
with pytest.raises(TypeError):
ser // td

def test_mod(self):
# GH#19365
td = Timedelta(hours=37)

# Timedelta-like others
result = td % Timedelta(hours=6)
assert isinstance(result, Timedelta)
assert result == Timedelta(hours=1)

result = td % timedelta(minutes=60)
assert isinstance(result, Timedelta)
assert result == Timedelta(0)

result = td % pd.offsets.Hour(5)
assert isinstance(result, Timedelta)
assert result == Timedelta(hours=2)

result = td % np.timedelta64(2, 'h')
assert isinstance(result, Timedelta)
assert result == Timedelta(hours=1)

result = td % NaT
assert result is NaT

result = td % np.timedelta64('NaT')
assert result is NaT

# Numeric Others
result = td % 2
assert isinstance(result, Timedelta)
assert result == Timedelta(0)

result = td % 1e12
assert isinstance(result, Timedelta)
assert result == Timedelta(minutes=3, seconds=20)

result = td % int(1e12)
assert isinstance(result, Timedelta)
assert result == Timedelta(minutes=3, seconds=20)

# Invalid Others
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make the error conditions separate and test all of the reverse failures e.g.
np.array([1, 2]) % td; I think pretty much every div/mod/divmod op should fail for the reverse (int/float op with td)

with pytest.raises(TypeError):
td % pd.Timestamp('2018-01-22')

# Array-like others
result = td % np.array([6, 5], dtype='timedelta64[h]')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

expected = np.array([1, 2], dtype='timedelta64[h]').astype('m8[ns]')
tm.assert_numpy_array_equal(result, expected)

result = td % pd.TimedeltaIndex(['6H', '5H'])
expected = pd.TimedeltaIndex(['1H', '2H'])
tm.assert_index_equal(result, expected)

result = td % np.array([2, int(1e12)], dtype='i8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here as well

expected = np.array([0, Timedelta(minutes=3, seconds=20).value],
dtype='m8[ns]')
tm.assert_numpy_array_equal(result, expected)

def test_rmod(self):
# GH#19365
td = Timedelta(minutes=3)

result = timedelta(minutes=4) % td
assert isinstance(result, Timedelta)
assert result == Timedelta(minutes=1)

result = np.timedelta64(5, 'm') % td
assert isinstance(result, Timedelta)
assert result == Timedelta(minutes=2)

result = np.array([5, 6], dtype='m8[m]') % td
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above. is there a reason you are not testing TDI? (generally whenever you are testing array-like)? The array-like numerics should be with the TDI tests (if they are not in this module).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ties in to your comments about duplication. In general when we have A.op(B) and B.rop(A) we can either put this test in test_A, test_B, or both. My preference is to order these the same way as NotImplemented behavior goes. Timedelta doesn't know anything about TDI, so test_timedelta doesn't either. (By analogy, Series doesn't know anything about DataFrame...)

expected = np.array([2, 0], dtype='m8[m]').astype('m8[ns]')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

tm.assert_numpy_array_equal(result, expected)

def test_divmod(self):
# GH#19365
td = Timedelta(days=2, hours=6)

result = divmod(td, timedelta(days=1))
assert result[0] == 2
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(hours=6)

result = divmod(td, pd.offsets.Hour(-4))
assert result[0] == -14
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(hours=-2)

result = divmod(td, 54)
assert result[0] == Timedelta(hours=1)
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(0)

result = divmod(td, 53 * 3600 * 1e9)
assert result[0] == Timedelta(1, unit='ns')
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(hours=1)

assert result
result = divmod(td, np.nan)
assert result[0] is pd.NaT
assert result[1] is pd.NaT

result = divmod(td, pd.NaT)
assert np.isnan(result[0])
assert result[1] is pd.NaT

def test_rdivmod(self):
# GH#19365
result = divmod(timedelta(days=2, hours=6), Timedelta(days=1))
assert result[0] == 2
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(hours=6)

result = divmod(pd.offsets.Hour(54), Timedelta(hours=-4))
assert result[0] == -14
assert isinstance(result[1], Timedelta)
assert result[1] == Timedelta(hours=-2)


class TestTimedeltaComparison(object):
def test_comparison_object_array(self):
Expand Down