-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG/TST: timedelta-like with Index/Series/DataFrame ops #23320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jbrockmendel! Thanks for submitting the PR.
|
Might be out of the loops, but ops with |
Codecov Report
@@ Coverage Diff @@
## master #23320 +/- ##
==========================================
+ Coverage 92.18% 92.18% +<.01%
==========================================
Files 161 161
Lines 51184 51190 +6
==========================================
+ Hits 47185 47191 +6
Misses 3999 3999
Continue to review full report at Codecov.
|
Yes. The "broken" case is one where we don't override numpy's behavior, which is only with a Integer Series/DataFrame operating specifically with np.timedelta64 scalar. |
pandas/core/ops.py
Outdated
@@ -125,11 +127,18 @@ def maybe_upcast_for_op(obj): | |||
Be careful to call this *after* determining the `name` attribute to be | |||
attached to the result of the arithmetic operation. | |||
""" | |||
if type(obj) is datetime.timedelta: | |||
if type(obj) is datetime.timedelta or isinstance(obj, Tick): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put these in the same isinstance check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. The cost is that sometimes we'll catch pd.Timedelta
here, but that's pretty cheap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k, otherwise this lgtm. ping on green.
@@ -150,14 +150,6 @@ def test_numeric_arr_mul_tdscalar(self, scalar_td, numeric_idx, box): | |||
def test_numeric_arr_rdiv_tdscalar(self, three_days, numeric_idx, box): | |||
index = numeric_idx[1:3] | |||
|
|||
broken = (isinstance(three_days, np.timedelta64) and | |||
three_days.dtype != 'm8[ns]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually had this in a test ...hahha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, this is to xfail the broken stuff this PR fixes
@@ -1243,7 +1243,6 @@ def test_binop_other(self, op, value, dtype): | |||
(operator.mul, '<M8[ns]'), | |||
(operator.add, '<M8[ns]'), | |||
(operator.pow, '<m8[ns]'), | |||
(operator.mod, '<m8[ns]'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is testing basically-wrong behavior.
does any of this need a whatsnew? |
The fixed bugs are sufficiently small that no one has opened an Issue. I can write a couple lines, but it wouldn't be a grave omission if skipped. |
@jbrockmendel About the offset behaviour you mention in the first post:
Why not keep it as offsets? I find this conversion to timedelta's a bit strange (if somebody really wants this, they can use a timedelta to start with) Also, the same operation with a scalar also raises:
shouldn't that be consistent? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't necessarily request a change, just a discussion, see my comment)
Tick is basically redundant with Timedelta, so basically all that is accomplished by returning a Series of Ticks is getting object-dtype performance. |
Yes, the Tick methods needs some fleshing-out. |
Yes, but if you care about that, you can use Timedelta's ? Eg, now when working with Periods, you can get an object array/series of offsets:
Since you end up with that here, why not when doing arithmetic with offsets? |
For "A" freq that is unavoidable, but for "D" and lower I think we should change that to return timedelta64. (separate Issue/PR)
Immediate reason: because that is the easiest way to fix the extant bug(s). |
Another thing, in the OP you posted:
But will then also the following change?
and
I think those operations should be consistent throughout the cases (whether left/right operands are scalars, ndarray or Series) |
A good "why" IMO is consistency with the scalar operations? |
This one I fully agree on; that is a downside to the change in this PR. I'd rather address it in an upcoming pass (I want to clear out my datetimelike PRs before jumping into DatetimeArray) but understand the opposite viewpoint. |
Yes, I don't think we should introduce the inconsistency here in this PR, if we would later decide we actually want to keep it consistent. |
No. I'll revert it, both because you make a reasonable point and because I can't keep up with your commenting speed. |
Reverted. |
Rebased. This is the last one I'd like to close out before making a big push on DatetimeArray |
lgtm. over to you @jorisvandenbossche |
Thanks @jbrockmendel Do you want to open an issue to further discuss the Offset arithmetic? |
# nanoseconds, or else we get undesired behavior like | ||
# np.timedelta64(3, 'D') / 2 == np.timedelta64(1, 'D') | ||
# The isna check is to avoid casting timedelta64("NaT"), which would | ||
# return NaT and incorrectly be treated as a datetime-NaT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding this np.timedelta('nat')
special case here, isn't this pointing to an inherent problem with not having a timedelta NaT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In the past when I've brought up the idea of having something like pd.NaTD
the idea has gone nowhere. I'd be on board if you want to revive the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was testing a few things on master, this seems a bug:
In [9]: pd.Series([pd.Timedelta('1s')]) + np.timedelta64('nat')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-f56b47937d6f> in <module>()
----> 1 pd.Series([pd.Timedelta('1s')]) + np.timedelta64('nat')
~/scipy/pandas/pandas/core/ops.py in wrapper(left, right)
1408
1409 elif is_timedelta64_dtype(left):
-> 1410 result = dispatch_to_index_op(op, left, right, pd.TimedeltaIndex)
1411 return construct_result(left, result,
1412 index=left.index, name=res_name,
~/scipy/pandas/pandas/core/ops.py in dispatch_to_index_op(op, left, right, index_class)
1040 left_idx = left_idx._shallow_copy(freq=None)
1041 try:
-> 1042 result = op(left_idx, right)
1043 except NullFrequencyError:
1044 # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError
~/scipy/pandas/pandas/core/indexes/datetimelike.py in __add__(self, other)
578 def __add__(self, other):
579 # dispatch to ExtensionArray implementation
--> 580 result = super(cls, self).__add__(other)
581 return wrap_arithmetic_op(self, other, result)
582
~/scipy/pandas/pandas/core/arrays/datetimelike.py in __add__(self, other)
626 result = self._add_nat()
627 elif isinstance(other, (Tick, timedelta, np.timedelta64)):
--> 628 result = self._add_delta(other)
629 elif isinstance(other, DateOffset):
630 # specifically _not_ a Tick
~/scipy/pandas/pandas/core/arrays/timedeltas.py in _add_delta(self, delta)
204 result : TimedeltaArray
205 """
--> 206 new_values = dtl.DatetimeLikeArrayMixin._add_delta(self, delta)
207 return type(self)(new_values, freq='infer')
208
~/scipy/pandas/pandas/core/arrays/datetimelike.py in _add_delta(self, other)
371 """
372 if isinstance(other, (Tick, timedelta, np.timedelta64)):
--> 373 new_values = self._add_timedeltalike_scalar(other)
374 elif is_timedelta64_dtype(other):
375 # ndarray[timedelta64] or TimedeltaArray/index
~/scipy/pandas/pandas/core/arrays/datetimelike.py in _add_timedeltalike_scalar(self, other)
383 return the i8 result view
384 """
--> 385 inc = delta_to_nanoseconds(other)
386 new_values = checked_add_with_arr(self.asi8, inc,
387 arr_mask=self._isnan).view('i8')
~/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()
~/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()
TypeError: an integer is required
But if I remove here the not isna
check, it works correctly (since s + pd.NaT
works correctly and preserves the timedelta dtype). But I suppose there are other code paths don't work with the pd.NaT
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll get a PR to fix+test this going sometime today.
'radd', 'rsub']: | ||
raise TypeError("Operation {opname} between {cls} and {other} " | ||
"is invalid".format(opname=op.__name__, | ||
cls=type(self).__name__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we should use here the name of the dtype instead of the class, as this will bubble up for Series as well
In [3]: ser + tdnat
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-4ad1d16388f8> in <module>()
----> 1 ser + tdnat
~/scipy/pandas/pandas/core/ops.py in wrapper(left, right)
1419 # that may incorrectly raise TypeError when we
1420 # should get NullFrequencyError
-> 1421 result = op(pd.Index(left), right)
1422 return construct_result(left, result,
1423 index=left.index, name=res_name,
~/scipy/pandas/pandas/core/indexes/base.py in index_arithmetic_method(self, other)
137 # handle time-based others
138 if isinstance(other, (ABCDateOffset, np.timedelta64, timedelta)):
--> 139 return self._evaluate_with_timedelta_like(other, op)
140 elif isinstance(other, (datetime, np.datetime64)):
141 return self._evaluate_with_datetime_like(other, op)
~/scipy/pandas/pandas/core/indexes/base.py in _evaluate_with_timedelta_like(self, other, op)
4708 "is invalid".format(opname=op.__name__,
4709 cls=type(self).__name__,
-> 4710 other=type(other).__name__))
4711
4712 other = Timedelta(other)
TypeError: Operation add between Int64Index and timedelta64 is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will update in follow-up PR that addresses the bug above.
Broken off from #23308, with additional fixes for problems found along the way.
One more that isn't a bug in master, but where the new behavior is better: