-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
implement truediv, rtruediv directly in TimedeltaArray; tests #23829
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
6ec1f08
4c2cc59
bd2ee96
3275dd9
79901f5
adea273
da9f743
ba9e490
10bb49b
8f276ae
7d56da9
6097789
2037be8
ffedf35
2fc44aa
cd4ff57
641ad20
e0d696f
7d9e677
dfc7af4
55cad6b
d21ae78
d72bf90
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,7 +7,7 @@ | |
|
||
import numpy as np | ||
|
||
from pandas._libs import tslibs | ||
from pandas._libs import lib, tslibs | ||
from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT | ||
from pandas._libs.tslibs.fields import get_timedelta_field | ||
from pandas._libs.tslibs.timedeltas import ( | ||
|
@@ -165,6 +165,7 @@ def _simple_new(cls, values, freq=None, dtype=_TD_DTYPE): | |
def __new__(cls, values, freq=None, dtype=_TD_DTYPE): | ||
|
||
freq, freq_infer = dtl.maybe_infer_freq(freq) | ||
values, inferred_freq = sequence_to_td64ns(values) | ||
|
||
values = np.array(values, copy=False) | ||
if values.dtype == np.object_: | ||
|
@@ -324,12 +325,102 @@ def _evaluate_with_timedelta_like(self, other, op): | |
|
||
__mul__ = _wrap_tdi_op(operator.mul) | ||
__rmul__ = __mul__ | ||
__truediv__ = _wrap_tdi_op(operator.truediv) | ||
__floordiv__ = _wrap_tdi_op(operator.floordiv) | ||
__rfloordiv__ = _wrap_tdi_op(ops.rfloordiv) | ||
|
||
def __truediv__(self, other): | ||
# timedelta / X is well-defined for timedelta-like or numeric X | ||
other = lib.item_from_zerodim(other) | ||
|
||
if isinstance(other, (ABCSeries, ABCDataFrame, ABCIndexClass)): | ||
return NotImplemented | ||
|
||
if isinstance(other, (timedelta, np.timedelta64, Tick)): | ||
other = Timedelta(other) | ||
if other is NaT: | ||
# specifically timedelta64-NaT | ||
result = np.empty(self.shape, dtype=np.float64) | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
result.fill(np.nan) | ||
return result | ||
|
||
# otherwise, dispatch to Timedelta implementation | ||
return self._data / other | ||
|
||
elif lib.is_scalar(other): | ||
# assume it is numeric | ||
result = self._data / other | ||
freq = None | ||
if self.freq is not None: | ||
# Tick division is not implemented, so operate on Timedelta | ||
freq = self.freq.delta / other | ||
return type(self)(result, freq=freq) | ||
|
||
if not hasattr(other, "dtype"): | ||
# e.g. list, tuple | ||
other = np.array(other) | ||
|
||
if len(other) != len(self): | ||
raise ValueError("Cannot divide vectors with unequal lengths") | ||
|
||
elif is_timedelta64_dtype(other): | ||
# let numpy handle it | ||
return self._data / other | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to ensure There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but that is another level of redirection, while we know this will happen and can directly do the correct thing here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, since we exclude Series and Index at the start.
I guess we could replace There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the (hopefully not too distant future), TimedeltaArray will no longer be an Index. In this case we would want to explicitly grab the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And what's the return type here? Does this need to be wrapped in a a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
float-dtyped ndarray |
||
|
||
elif is_object_dtype(other): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to allow this? (I first wanted to say: can't we dispatch that to numpy, thinking that numpy object dtype would handle that, but they raise a TypeError) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can see why this would be first on the chopping block if we had to support fewer cases. Is there a compelling reason not to handle this case? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can also turn around the question :) Is there a compelling reason to do handle this case? It's just an extra case to support. And eg, we could discuss whether this should return object dtype data or timedelta, as you are inferring now? Looking at Series behaviour with int64 and object integers, it actually returns object. For datetimes it now raises. So at least, our support is at the moment not very consistent. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Because a selling point of pandas is that things Just Work? Because the code and tests are already written, so the marginal cost is \approx zero?
Fair enough. If a goal is to make things more consistent (which I'm +1 on BTW) then we're probably not going to go around and start breaking the places where it currently is supported. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i agree with @jbrockmendel here |
||
result = [self[n] / other[n] for n in range(len(self))] | ||
result = np.array(result) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. actually this is really close to what soft_convert_objects does. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That isn't clear to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you are essentially re-implementing it. i would rather not do that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not clear on what you have in mind. Something like:
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough, will change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed. this ends up changing the behavior of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is changed here? shouldn't is_object_type result in a TypeError or a NotImplemented? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are two independent questions that have been asked about the object-dtype case:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok this is fine, you are returning object dtype (which is consistent with how we do for Series now) |
||
return result | ||
|
||
else: | ||
result = self._data / other | ||
return type(self)(result) | ||
|
||
def __rtruediv__(self, other): | ||
# X / timedelta is defined only for timedelta-like X | ||
other = lib.item_from_zerodim(other) | ||
|
||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
if isinstance(other, (ABCSeries, ABCDataFrame, ABCIndexClass)): | ||
return NotImplemented | ||
|
||
if isinstance(other, (timedelta, np.timedelta64, Tick)): | ||
other = Timedelta(other) | ||
if other is NaT: | ||
# specifically timedelta64-NaT | ||
result = np.empty(self.shape, dtype=np.float64) | ||
result.fill(np.nan) | ||
return result | ||
|
||
# otherwise, dispatch to Timedelta implementation | ||
return other / self._data | ||
|
||
elif lib.is_scalar(other): | ||
raise TypeError("Cannot divide {typ} by {cls}" | ||
.format(typ=type(other).__name__, | ||
cls=type(self).__name__)) | ||
|
||
if not hasattr(other, "dtype"): | ||
# e.g. list, tuple | ||
other = np.array(other) | ||
|
||
if len(other) != len(self): | ||
raise ValueError("Cannot divide vectors with unequal lengths") | ||
|
||
elif is_timedelta64_dtype(other): | ||
# let numpy handle it | ||
return other / self._data | ||
|
||
elif is_object_dtype(other): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. or can raise NotImplemented here? does that work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that might be fragile; might depend on having There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same comment as above There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok, can you add a comment here (and above), were we do the operation but do not infer the output type (just for posterity), otherwise this PR lgtm. ping on green. |
||
result = [other[n] / self[n] for n in range(len(self))] | ||
return np.array(result) | ||
|
||
else: | ||
raise TypeError("Cannot divide {dtype} data by {cls}" | ||
.format(dtype=other.dtype, | ||
cls=type(self).__name__)) | ||
|
||
if compat.PY2: | ||
__div__ = __truediv__ | ||
__rdiv__ = __rtruediv__ | ||
|
||
# Note: TimedeltaIndex overrides this in call to cls._add_numeric_methods | ||
def __neg__(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1115,14 +1115,33 @@ def test_tdi_rmul_arraylike(self, other, box_with_array): | |
tm.assert_equal(commute, expected) | ||
|
||
# ------------------------------------------------------------------ | ||
# __div__ | ||
# __div__, __rdiv__ | ||
|
||
def test_td64arr_div_nat_invalid(self, box_with_array): | ||
# don't allow division by NaT (maybe could in the future) | ||
rng = timedelta_range('1 days', '10 days', name='foo') | ||
rng = tm.box_expected(rng, box_with_array) | ||
with pytest.raises(TypeError): | ||
|
||
with pytest.raises(TypeError, match='true_divide cannot use operands'): | ||
rng / pd.NaT | ||
with pytest.raises(TypeError, match='Cannot divide NaTType by'): | ||
pd.NaT / rng | ||
|
||
def test_td64arr_div_td64nat(self, box_with_array): | ||
# GH#23829 | ||
rng = timedelta_range('1 days', '10 days',) | ||
rng = tm.box_expected(rng, box_with_array) | ||
|
||
other = np.timedelta64('NaT') | ||
|
||
expected = np.array([np.nan] * 10) | ||
expected = tm.box_expected(expected, box_with_array) | ||
|
||
result = rng / other | ||
tm.assert_equal(result, expected) | ||
|
||
result = other / rng | ||
tm.assert_equal(result, expected) | ||
|
||
def test_td64arr_div_int(self, box_with_array): | ||
idx = TimedeltaIndex(np.arange(5, dtype='int64')) | ||
|
@@ -1131,7 +1150,11 @@ def test_td64arr_div_int(self, box_with_array): | |
result = idx / 1 | ||
tm.assert_equal(result, idx) | ||
|
||
def test_tdi_div_tdlike_scalar(self, two_hours, box_with_array): | ||
with pytest.raises(TypeError, match='Cannot divide'): | ||
# GH#23829 | ||
1 / idx | ||
jbrockmendel marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
def test_td64arr_div_tdlike_scalar(self, two_hours, box_with_array): | ||
# GH#20088, GH#22163 ensure DataFrame returns correct dtype | ||
rng = timedelta_range('1 days', '10 days', name='foo') | ||
expected = pd.Float64Index((np.arange(10) + 1) * 12, name='foo') | ||
|
@@ -1142,7 +1165,12 @@ def test_tdi_div_tdlike_scalar(self, two_hours, box_with_array): | |
result = rng / two_hours | ||
tm.assert_equal(result, expected) | ||
|
||
def test_tdi_div_tdlike_scalar_with_nat(self, two_hours, box_with_array): | ||
result = two_hours / rng | ||
expected = 1 / expected | ||
tm.assert_equal(result, expected) | ||
|
||
def test_td64arr_div_tdlike_scalar_with_nat(self, two_hours, | ||
box_with_array): | ||
rng = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo') | ||
expected = pd.Float64Index([12, np.nan, 24], name='foo') | ||
|
||
|
@@ -1152,6 +1180,58 @@ def test_tdi_div_tdlike_scalar_with_nat(self, two_hours, box_with_array): | |
result = rng / two_hours | ||
tm.assert_equal(result, expected) | ||
|
||
result = two_hours / rng | ||
expected = 1 / expected | ||
tm.assert_equal(result, expected) | ||
|
||
def test_td64arr_div_td64_ndarray(self, box_with_array): | ||
# GH#22631 | ||
rng = TimedeltaIndex(['1 days', pd.NaT, '2 days']) | ||
expected = pd.Float64Index([12, np.nan, 24]) | ||
gfyoung marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
rng = tm.box_expected(rng, box_with_array) | ||
expected = tm.box_expected(expected, box_with_array) | ||
|
||
other = np.array([2, 4, 2], dtype='m8[h]') | ||
result = rng / other | ||
tm.assert_equal(result, expected) | ||
|
||
result = rng / tm.box_expected(other, box_with_array) | ||
tm.assert_equal(result, expected) | ||
|
||
result = rng / other.astype(object) | ||
tm.assert_equal(result, expected) | ||
|
||
result = rng / list(other) | ||
tm.assert_equal(result, expected) | ||
|
||
# reversed op | ||
expected = 1 / expected | ||
result = other / rng | ||
tm.assert_equal(result, expected) | ||
|
||
result = tm.box_expected(other, box_with_array) / rng | ||
tm.assert_equal(result, expected) | ||
|
||
result = other.astype(object) / rng | ||
tm.assert_equal(result, expected) | ||
|
||
result = list(other) / rng | ||
tm.assert_equal(result, expected) | ||
|
||
def test_tdarr_div_length_mismatch(self, box_with_array): | ||
rng = TimedeltaIndex(['1 days', pd.NaT, '2 days']) | ||
mismatched = [1, 2, 3, 4] | ||
|
||
rng = tm.box_expected(rng, box_with_array) | ||
for obj in [mismatched, mismatched[:2]]: | ||
# one shorter, one longer | ||
for other in [obj, np.array(obj), pd.Index(obj)]: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to parameterize this? Potentially via flags to indicate what transformation to perform on Definitely the outer-loop can be parameterized (just use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's possible, but I'm trying to push back a little bit against over-parametrization. The pytest setup cost is non-trivial |
||
with pytest.raises(ValueError): | ||
rng / other | ||
with pytest.raises(ValueError): | ||
other / rng | ||
|
||
# ------------------------------------------------------------------ | ||
# __floordiv__, __rfloordiv__ | ||
|
||
|
@@ -1203,6 +1283,10 @@ def test_td64arr_floordiv_int(self, box_with_array): | |
result = idx // 1 | ||
tm.assert_equal(result, idx) | ||
|
||
pattern = 'floor_divide cannot use operands' | ||
with pytest.raises(TypeError, match=pattern): | ||
1 // idx | ||
|
||
def test_td64arr_floordiv_tdlike_scalar(self, two_hours, box_with_array): | ||
tdi = timedelta_range('1 days', '10 days', name='foo') | ||
expected = pd.Int64Index((np.arange(10) + 1) * 12, name='foo') | ||
|
@@ -1309,6 +1393,9 @@ def test_td64arr_div_numeric_scalar(self, box_with_array, two): | |
result = tdser / two | ||
tm.assert_equal(result, expected) | ||
|
||
with pytest.raises(TypeError, match='Cannot divide'): | ||
two / tdser | ||
|
||
@pytest.mark.parametrize('dtype', ['int64', 'int32', 'int16', | ||
'uint64', 'uint32', 'uint16', 'uint8', | ||
'float64', 'float32', 'float16']) | ||
|
@@ -1358,9 +1445,26 @@ def test_td64arr_div_numeric_array(self, box_with_array, vector, dtype): | |
result = tdser / vector | ||
tm.assert_equal(result, expected) | ||
|
||
with pytest.raises(TypeError): | ||
pattern = ('true_divide cannot use operands|' | ||
'cannot perform __div__|' | ||
'cannot perform __truediv__|' | ||
'unsupported operand|' | ||
'Cannot divide') | ||
with pytest.raises(TypeError, match=pattern): | ||
vector / tdser | ||
|
||
if (not isinstance(vector, pd.Index) and | ||
box_with_array is not pd.DataFrame): | ||
# Index.__rdiv__ won't try to operate elementwise, just raises | ||
# DataFrame casts just-NaT object column to datetime64 | ||
result = tdser / vector.astype(object) | ||
expected = [tdser[n] / vector[n] for n in range(len(tdser))] | ||
expected = tm.box_expected(expected, xbox) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wouldn't this converted the expected list to an DatetimeArray, while the expected result is on object ndarray? (so it's not really testing that?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No. In the case where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, OK, all a bit opaque .. (I checked what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yah, I'm hoping to simplify some of it, and ideally even get rid of box_expected, but it'll be a while before thats feasible. |
||
tm.assert_equal(result, expected) | ||
|
||
with pytest.raises(TypeError, match=pattern): | ||
vector.astype(object) / tdser | ||
|
||
@pytest.mark.parametrize('names', [(None, None, None), | ||
('Egon', 'Venkman', None), | ||
('NCC1701D', 'NCC1701D', 'NCC1701D')]) | ||
|
@@ -1391,20 +1495,25 @@ def test_td64arr_mul_int_series(self, box_df_fail, names): | |
@pytest.mark.parametrize('names', [(None, None, None), | ||
('Egon', 'Venkman', None), | ||
('NCC1701D', 'NCC1701D', 'NCC1701D')]) | ||
def test_float_series_rdiv_td64arr(self, box, names): | ||
def test_float_series_rdiv_td64arr(self, box_with_array, names): | ||
# GH#19042 test for correct name attachment | ||
# TODO: the direct operation TimedeltaIndex / Series still | ||
# needs to be fixed. | ||
box = box_with_array | ||
tdi = TimedeltaIndex(['0days', '1day', '2days', '3days', '4days'], | ||
name=names[0]) | ||
ser = Series([1.5, 3, 4.5, 6, 7.5], dtype=np.float64, name=names[1]) | ||
|
||
xname = names[2] if box is not tm.to_array else names[1] | ||
expected = Series([tdi[n] / ser[n] for n in range(len(ser))], | ||
dtype='timedelta64[ns]', | ||
name=names[2]) | ||
name=xname) | ||
|
||
xbox = box | ||
if box in [pd.Index, tm.to_array] and type(ser) is Series: | ||
xbox = Series | ||
|
||
tdi = tm.box_expected(tdi, box) | ||
xbox = Series if (box is pd.Index and type(ser) is Series) else box | ||
expected = tm.box_expected(expected, xbox) | ||
|
||
result = ser.__rdiv__(tdi) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you need the refernces issue number here. do you need to mention the other issue that is refernces in the top or PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do