Skip to content

TimedeltaIndex +/- NaT inconsistency with Series[timedelta64]+/- NaT #19124

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Jan 7, 2018 · 4 comments · Fixed by #19139
Closed

TimedeltaIndex +/- NaT inconsistency with Series[timedelta64]+/- NaT #19124

jbrockmendel opened this issue Jan 7, 2018 · 4 comments · Fixed by #19139
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type
Milestone

Comments

@jbrockmendel
Copy link
Member

Analogous to (#18808)

Adapted from tests.indexes.timedeltas.test_arithmetic.TestTimedeltaIndexArithmetic.test_timedelta_ops_with_missing_values

tdi = pd.TimedeltaIndex(['00:00:01'])
ser = pd.Series(tdi)

>>> tdi + pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)
>>> tdi - pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

>>> ser + pd.NaT
0   NaT
dtype: timedelta64[ns]
>>> ser - pd.NaT
0   NaT
dtype: timedelta64[ns]

i.e. Series[timedelta64] treats pd.NaT as a timedelta, while TimedeltaIndex treats pd.NaT as a datetime (in a way that makes sense for addition but not for subtraction).

Expected Behavior
Internal consistency in the sense that:

(tdi - pd.NaT).dtype == (ser - pd.NaT).dtype
(tdi + pd.NaT).dtype == (ser + pd.NaT).dtype

For tdi - pd.NaT it only makes sense for the result to be timedelta64, so the TimedeltaIndex behavior is wrong.

For tdi + pd.NaT a reasonable argument can be made for the result being either timedelta64 or datetime64. Returning timedelta64 maintains consistency with TimedeltaIndex.__sub__, while datetime64 maintains consistency with TimedeltaIndex.__radd__. All in all I think keeping addition commutative is the more important of the two.

@jreback
Copy link
Contributor

jreback commented Jan 7, 2018

these should all be timedelta

@jreback jreback added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type labels Jan 7, 2018
@jreback jreback added this to the 0.23.0 milestone Jan 7, 2018
@jbrockmendel
Copy link
Member Author

these should all be timedelta

I'm fine with this, but it merits double-checking because the current behavior is specifically tested:

    tdi = TimedeltaIndex(['1 day', '2 day'], name='x')
    exp = DatetimeIndex([NaT, NaT], name='x')
    for (left, right) in [(NaT, tdi)]:
        tm.assert_index_equal(left + right, exp)
        tm.assert_index_equal(right + left, exp)
        tm.assert_index_equal(left - right, exp)
        tm.assert_index_equal(right - left, exp)

The subtraction is clearly wrong as is, but if the default is to view NaT as a datetime, then pd.NaT + tdi should be a DatetimeIndex, in which case tdi + pd.NaT probably should too.

@jreback
Copy link
Contributor

jreback commented Jan 9, 2018

as I said it is inconsistent, so let's change it to make it correct

@jbrockmendel
Copy link
Member Author

I've implemented the change in #19139, but really want to make sure we're on the same page that there is no way to be entirely internally consistent because pd.NaT is quacks like either a datetime or a timedelta.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants