TimedeltaIndex +/- NaT inconsistency with Series[timedelta64]+/- NaT #19124

jbrockmendel · 2018-01-07T20:17:19Z

Analogous to (#18808)

Adapted from tests.indexes.timedeltas.test_arithmetic.TestTimedeltaIndexArithmetic.test_timedelta_ops_with_missing_values

import pandas as pd

tdi = pd.TimedeltaIndex(['00:00:01'])
ser = pd.Series(tdi)

>>> tdi + pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)
>>> tdi - pd.NaT
DatetimeIndex(['NaT'], dtype='datetime64[ns]', freq=None)

>>> ser + pd.NaT
0   NaT
dtype: timedelta64[ns]
>>> ser - pd.NaT
0   NaT
dtype: timedelta64[ns]

i.e. Series[timedelta64] treats pd.NaT as a timedelta, while TimedeltaIndex treats pd.NaT as a datetime (in a way that makes sense for addition but not for subtraction).

Expected Behavior
Internal consistency in the sense that:

(tdi - pd.NaT).dtype == (ser - pd.NaT).dtype
(tdi + pd.NaT).dtype == (ser + pd.NaT).dtype

For tdi - pd.NaT it only makes sense for the result to be timedelta64, so the TimedeltaIndex behavior is wrong.

For tdi + pd.NaT a reasonable argument can be made for the result being either timedelta64 or datetime64. Returning timedelta64 maintains consistency with TimedeltaIndex.__sub__, while datetime64 maintains consistency with TimedeltaIndex.__radd__. All in all I think keeping addition commutative is the more important of the two.
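The consistency condition above can be checked mechanically. The following is only a sketch: the helper name `result_kinds` is made up for illustration, and it compares numpy dtype kind codes ('m' for timedelta64, 'M' for datetime64) rather than exact dtypes. On a pandas release that includes the change from #19139, the two dictionaries should match; on the version described in this report they would differ.

```python
import pandas as pd

tdi = pd.TimedeltaIndex(['00:00:01'])
ser = pd.Series(tdi)


def result_kinds(left, right):
    """Hypothetical helper: dtype kind of each arithmetic result.

    'm' is numpy's kind code for timedelta64, 'M' for datetime64.
    """
    return {
        "add": (left + right).dtype.kind,   # left.__add__(right)
        "radd": (right + left).dtype.kind,  # falls back to left.__radd__(right)
        "sub": (left - right).dtype.kind,   # left.__sub__(right)
    }


index_kinds = result_kinds(tdi, pd.NaT)
series_kinds = result_kinds(ser, pd.NaT)

# The expected behavior in this issue: Index and Series agree for every op.
consistent = index_kinds == series_kinds
print(index_kinds)
print(series_kinds)
print(consistent)
```

Comparing `add` against `radd` within a single dictionary also shows whether addition with pd.NaT is commutative in the dtype sense discussed above.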

jreback · 2018-01-07T21:04:00Z

these should all be timedelta

jbrockmendel · 2018-01-08T01:49:06Z

> these should all be timedelta

I'm fine with this, but it merits double-checking because the current behavior is specifically tested:

    tdi = TimedeltaIndex(['1 day', '2 day'], name='x')
    exp = DatetimeIndex([NaT, NaT], name='x')
    for (left, right) in [(NaT, tdi)]:
        tm.assert_index_equal(left + right, exp)
        tm.assert_index_equal(right + left, exp)
        tm.assert_index_equal(left - right, exp)
        tm.assert_index_equal(right - left, exp)

The subtraction is clearly wrong as is, but if the default is to view NaT as a datetime, then pd.NaT + tdi should be a DatetimeIndex, in which case tdi + pd.NaT probably should too.

jreback · 2018-01-09T11:46:56Z

as I said it is inconsistent, so let's change it to make it correct

jbrockmendel · 2018-01-09T16:54:41Z

I've implemented the change in #19139, but I really want to make sure we're on the same page: there is no way to be entirely internally consistent, because pd.NaT quacks like either a datetime or a timedelta.
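To illustrate the ambiguity, here is a small sketch (the variable names are illustrative only) showing the same pd.NaT singleton being accepted both as a missing datetime and as a missing timedelta, which is why no single choice of result dtype can be consistent everywhere.

```python
import pandas as pd

# The same singleton converts to either a datetime-like or a timedelta-like index.
as_datetime = pd.to_datetime([pd.NaT])
as_timedelta = pd.to_timedelta([pd.NaT])
print(type(as_datetime).__name__, as_datetime.dtype.kind)    # kind 'M' means datetime64
print(type(as_timedelta).__name__, as_timedelta.dtype.kind)  # kind 'm' means timedelta64

# Scalar arithmetic with either a Timestamp or a Timedelta propagates pd.NaT,
# so the scalar result carries no hint about which kind of "missing" it is.
ts_result = pd.Timestamp("2018-01-07") + pd.NaT
td_result = pd.Timedelta("1s") + pd.NaT
print(ts_result is pd.NaT, td_result is pd.NaT)
```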

jreback added the Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Timedelta (Timedelta data type) labels Jan 7, 2018

jreback added this to the 0.23.0 milestone Jan 7, 2018

jbrockmendel mentioned this issue Jan 8, 2018

Make TimedeltaIndex +/- pd.NaT return TimedeltaIndex #19139

Merged

4 tasks

jbrockmendel mentioned this issue Jan 9, 2018

Separate pd2.NaT for datetime vs timedelta wesm/pandas2#74

Open

jreback closed this as completed in #19139 Jan 16, 2018

shoyer mentioned this issue Jan 28, 2019

Separate NaT values for Timedelta ("NaTD") and Period? #24983

Open
