Skip to content

BUG/TST: timedelta-like with Index/Series/DataFrame ops #23320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Oct 30, 2018

Conversation

jbrockmendel
Copy link
Member

Broken off from #23308, with additional fixes for problems found along the way.

val = np.timedelta64(4*10**9, 'ns')
ser = pd.Series([val]) * 1.5

>>> ser % val   # <-- raises TypeError in master
0   00:00:02
dtype: timedelta64[ns] 
ri = pd.RangeIndex(1)
di = pd.DataFrame(ri)

>>> ri + df  # <-- raises TypeError in master
    0
0   0
td64 = np.timedelta64(4*10**9, 'ns')
tdnat = np.timedelta64("NaT", "D")
ser = pd.Series([0, 1, 2])

>>> ser + td64   # <-- fails to raise in master, now raises TypeError; ditto with tdnat
# ditto if for radd, sub, and rsub

>>> pd.Index(ser) + tdnat  # <-- in master returns all-NaT DatetimeIndex, now raises
# ditto for sub; reversed ops DO raise TypeError in master, but for wrong reason
>>> pd.DataFrame(ser) + tdnat # <-- in msater returns all-NaT timedelta64 DataFrame, now raises
# ditto for radd, sub, and rsub
off = pd.offsets.Minute()
ser = pd.Series([1, 2, 3])

>>> off / ser  # <--  raises TypeError in master
0   00:01:00
1   00:00:30
2  00:00:20
dtype: timedelta64[ns]

One more that isn't a bug in master, but where the new behavior is better:

off = pd.offsets.Minute()
ser = pd.Series([1, 2, 3])

# master
>>> ser * off
0        <Minute>
1   <2 * Minutes>
2   <3 * Minutes>
dtype: object

# PR
>>> ser * off
0   00:01:00
1   00:02:00
2   00:03:00
dtype: timedelta64[ns]

@pep8speaks
Copy link

Hello @jbrockmendel! Thanks for submitting the PR.

@mroeschke
Copy link
Member

Might be out of the loops, but ops with datetime.timedelta and TimedeltaIndex operate as intened too with Index, Series, etc?

@codecov
Copy link

codecov bot commented Oct 25, 2018

Codecov Report

Merging #23320 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23320      +/-   ##
==========================================
+ Coverage   92.18%   92.18%   +<.01%     
==========================================
  Files         161      161              
  Lines       51184    51190       +6     
==========================================
+ Hits        47185    47191       +6     
  Misses       3999     3999
Flag Coverage Δ
#multiple 90.62% <100%> (ø) ⬆️
#single 42.23% <50%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/sparse/frame.py 94.86% <100%> (ø) ⬆️
pandas/core/ops.py 94.25% <100%> (+0.01%) ⬆️
pandas/core/frame.py 97.03% <100%> (ø) ⬆️
pandas/core/indexes/base.py 96.62% <100%> (ø) ⬆️
pandas/core/indexes/range.py 95.79% <100%> (+0.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a2e5994...481bb90. Read the comment docs.

@jbrockmendel
Copy link
Member Author

Might be out of the loops, but ops with datetime.timedelta and TimedeltaIndex operate as intened too with Index, Series, etc?

Yes. The "broken" case is one where we don't override numpy's behavior, which is only with a Integer Series/DataFrame operating specifically with np.timedelta64 scalar.

@@ -125,11 +127,18 @@ def maybe_upcast_for_op(obj):
Be careful to call this *after* determining the `name` attribute to be
attached to the result of the arithmetic operation.
"""
if type(obj) is datetime.timedelta:
if type(obj) is datetime.timedelta or isinstance(obj, Tick):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put these in the same isinstance check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. The cost is that sometimes we'll catch pd.Timedelta here, but that's pretty cheap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k, otherwise this lgtm. ping on green.

@@ -150,14 +150,6 @@ def test_numeric_arr_mul_tdscalar(self, scalar_td, numeric_idx, box):
def test_numeric_arr_rdiv_tdscalar(self, three_days, numeric_idx, box):
index = numeric_idx[1:3]

broken = (isinstance(three_days, np.timedelta64) and
three_days.dtype != 'm8[ns]')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we actually had this in a test ...hahha

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, this is to xfail the broken stuff this PR fixes

@@ -1243,7 +1243,6 @@ def test_binop_other(self, op, value, dtype):
(operator.mul, '<M8[ns]'),
(operator.add, '<M8[ns]'),
(operator.pow, '<m8[ns]'),
(operator.mod, '<m8[ns]'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing basically-wrong behavior.

@jreback jreback added Numeric Operations Arithmetic, Comparison, and Logical operations Timedelta Timedelta data type labels Oct 26, 2018
@jreback
Copy link
Contributor

jreback commented Oct 26, 2018

does any of this need a whatsnew?

@jbrockmendel
Copy link
Member Author

does any of this need a whatsnew?

The fixed bugs are sufficiently small that no one has opened an Issue. I can write a couple lines, but it wouldn't be a grave omission if skipped.

@jreback jreback added this to the 0.24.0 milestone Oct 26, 2018
@jorisvandenbossche
Copy link
Member

@jbrockmendel About the offset behaviour you mention in the first post:

off = pd.offsets.Minute()
ser = pd.Series([1, 2, 3])

>>> off / ser  # <--  raises TypeError in master
0   00:01:00
1   00:00:30
2  00:00:20
dtype: timedelta64[ns]

Why not keep it as offsets? I find this conversion to timedelta's a bit strange (if somebody really wants this, they can use a timedelta to start with)

Also, the same operation with a scalar also raises:

In [107]: off / 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-107-52397db32693> in <module>()
----> 1 off / 2

TypeError: unsupported operand type(s) for /: 'Minute' and 'int'

shouldn't that be consistent?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I don't necessarily request a change, just a discussion, see my comment)

@jbrockmendel
Copy link
Member Author

Why not keep it as offsets? I find this conversion to timedelta's a bit strange (if somebody really wants this, they can use a timedelta to start with)

Tick is basically redundant with Timedelta, so basically all that is accomplished by returning a Series of Ticks is getting object-dtype performance.

@jbrockmendel
Copy link
Member Author

Also, the same operation with a scalar also raises:

Yes, the Tick methods needs some fleshing-out.

@jorisvandenbossche
Copy link
Member

Yes, but if you care about that, you can use Timedelta's ?

Eg, now when working with Periods, you can get an object array/series of offsets:

In [117]: s = pd.Series(pd.core.arrays.period_array(['2012', '2012'], freq='A'))

In [118]: s
Out[118]: 
0    2012
1    2012
dtype: period[A-DEC]

In [119]: s - s[0]
Out[119]: 
0    <0 * YearEnds: month=12>
1    <0 * YearEnds: month=12>
dtype: object

Since you end up with that here, why not when doing arithmetic with offsets?

@jbrockmendel
Copy link
Member Author

Eg, now when working with Periods, you can get an object array/series of offsets:

For "A" freq that is unavoidable, but for "D" and lower I think we should change that to return timedelta64. (separate Issue/PR)

Since you end up with that here, why not when doing arithmetic with offsets?

Immediate reason: because that is the easiest way to fix the extant bug(s).
Big-picture reason: I am not aware of any good reason why a user would want the object array, so see it as a "why not" rather than a "why".

@jorisvandenbossche
Copy link
Member

Another thing, in the OP you posted:

off = pd.offsets.Minute()
ser = pd.Series([1, 2, 3])

# master
>>> ser * off
0        <Minute>
1   <2 * Minutes>
2   <3 * Minutes>
dtype: object

# PR
>>> ser * off
0   00:01:00
1   00:02:00
2   00:03:00
dtype: timedelta64[ns]

But will then also the following change?

In [124]: pd.Series([off, off, off]) * 3
Out[124]: 
0    <3 * Minutes>
1    <3 * Minutes>
2    <3 * Minutes>
dtype: object

and

In [125]: off * 3
Out[125]: <3 * Minutes>

I think those operations should be consistent throughout the cases (whether left/right operands are scalars, ndarray or Series)

@jorisvandenbossche
Copy link
Member

Big-picture reason: I am not aware of any good reason why a user would want the object array, so see it as a "why not" rather than a "why".

A good "why" IMO is consistency with the scalar operations?

@jbrockmendel
Copy link
Member Author

I think those operations should be consistent throughout the cases

This one I fully agree on; that is a downside to the change in this PR. I'd rather address it in an upcoming pass (I want to clear out my datetimelike PRs before jumping into DatetimeArray) but understand the opposite viewpoint.

@jorisvandenbossche
Copy link
Member

Yes, I don't think we should introduce the inconsistency here in this PR, if we would later decide we actually want to keep it consistent.
Is the offset change essential for the other bug fixes?

@jbrockmendel
Copy link
Member Author

Is the offset change essential for the other bug fixes?

No. I'll revert it, both because you make a reasonable point and because I can't keep up with your commenting speed.

@jbrockmendel
Copy link
Member Author

Reverted.

@jbrockmendel
Copy link
Member Author

Rebased. This is the last one I'd like to close out before making a big push on DatetimeArray

@jreback
Copy link
Contributor

jreback commented Oct 30, 2018

lgtm. over to you @jorisvandenbossche

@jorisvandenbossche jorisvandenbossche merged commit 7191af9 into pandas-dev:master Oct 30, 2018
@jorisvandenbossche
Copy link
Member

Thanks @jbrockmendel

Do you want to open an issue to further discuss the Offset arithmetic?

# nanoseconds, or else we get undesired behavior like
# np.timedelta64(3, 'D') / 2 == np.timedelta64(1, 'D')
# The isna check is to avoid casting timedelta64("NaT"), which would
# return NaT and incorrectly be treated as a datetime-NaT.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this np.timedelta('nat') special case here, isn't this pointing to an inherent problem with not having a timedelta NaT ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In the past when I've brought up the idea of having something like pd.NaTD the idea has gone nowhere. I'd be on board if you want to revive the suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was testing a few things on master, this seems a bug:

In [9]: pd.Series([pd.Timedelta('1s')]) + np.timedelta64('nat')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-f56b47937d6f> in <module>()
----> 1 pd.Series([pd.Timedelta('1s')]) + np.timedelta64('nat')

~/scipy/pandas/pandas/core/ops.py in wrapper(left, right)
   1408 
   1409         elif is_timedelta64_dtype(left):
-> 1410             result = dispatch_to_index_op(op, left, right, pd.TimedeltaIndex)
   1411             return construct_result(left, result,
   1412                                     index=left.index, name=res_name,

~/scipy/pandas/pandas/core/ops.py in dispatch_to_index_op(op, left, right, index_class)
   1040         left_idx = left_idx._shallow_copy(freq=None)
   1041     try:
-> 1042         result = op(left_idx, right)
   1043     except NullFrequencyError:
   1044         # DatetimeIndex and TimedeltaIndex with freq == None raise ValueError

~/scipy/pandas/pandas/core/indexes/datetimelike.py in __add__(self, other)
    578         def __add__(self, other):
    579             # dispatch to ExtensionArray implementation
--> 580             result = super(cls, self).__add__(other)
    581             return wrap_arithmetic_op(self, other, result)
    582 

~/scipy/pandas/pandas/core/arrays/datetimelike.py in __add__(self, other)
    626                 result = self._add_nat()
    627             elif isinstance(other, (Tick, timedelta, np.timedelta64)):
--> 628                 result = self._add_delta(other)
    629             elif isinstance(other, DateOffset):
    630                 # specifically _not_ a Tick

~/scipy/pandas/pandas/core/arrays/timedeltas.py in _add_delta(self, delta)
    204         result : TimedeltaArray
    205         """
--> 206         new_values = dtl.DatetimeLikeArrayMixin._add_delta(self, delta)
    207         return type(self)(new_values, freq='infer')
    208 

~/scipy/pandas/pandas/core/arrays/datetimelike.py in _add_delta(self, other)
    371         """
    372         if isinstance(other, (Tick, timedelta, np.timedelta64)):
--> 373             new_values = self._add_timedeltalike_scalar(other)
    374         elif is_timedelta64_dtype(other):
    375             # ndarray[timedelta64] or TimedeltaArray/index

~/scipy/pandas/pandas/core/arrays/datetimelike.py in _add_timedeltalike_scalar(self, other)
    383         return the i8 result view
    384         """
--> 385         inc = delta_to_nanoseconds(other)
    386         new_values = checked_add_with_arr(self.asi8, inc,
    387                                           arr_mask=self._isnan).view('i8')

~/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()

~/scipy/pandas/pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.delta_to_nanoseconds()

TypeError: an integer is required

But if I remove here the not isna check, it works correctly (since s + pd.NaT works correctly and preserves the timedelta dtype). But I suppose there are other code paths don't work with the pd.NaT ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll get a PR to fix+test this going sometime today.

@jbrockmendel jbrockmendel deleted the noxfail2 branch October 30, 2018 16:28
'radd', 'rsub']:
raise TypeError("Operation {opname} between {cls} and {other} "
"is invalid".format(opname=op.__name__,
cls=type(self).__name__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we should use here the name of the dtype instead of the class, as this will bubble up for Series as well

In [3]: ser + tdnat
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-4ad1d16388f8> in <module>()
----> 1 ser + tdnat

~/scipy/pandas/pandas/core/ops.py in wrapper(left, right)
   1419             #  that may incorrectly raise TypeError when we
   1420             #  should get NullFrequencyError
-> 1421             result = op(pd.Index(left), right)
   1422             return construct_result(left, result,
   1423                                     index=left.index, name=res_name,

~/scipy/pandas/pandas/core/indexes/base.py in index_arithmetic_method(self, other)
    137         # handle time-based others
    138         if isinstance(other, (ABCDateOffset, np.timedelta64, timedelta)):
--> 139             return self._evaluate_with_timedelta_like(other, op)
    140         elif isinstance(other, (datetime, np.datetime64)):
    141             return self._evaluate_with_datetime_like(other, op)

~/scipy/pandas/pandas/core/indexes/base.py in _evaluate_with_timedelta_like(self, other, op)
   4708                             "is invalid".format(opname=op.__name__,
   4709                                                 cls=type(self).__name__,
-> 4710                                                 other=type(other).__name__))
   4711 
   4712         other = Timedelta(other)

TypeError: Operation add between Int64Index and timedelta64 is invalid

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will update in follow-up PR that addresses the bug above.

@jbrockmendel jbrockmendel mentioned this pull request Oct 30, 2018
4 tasks
gfyoung pushed a commit that referenced this pull request Nov 6, 2018
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants