Make Series[datetime64] - pd.NaT behave like DatetimeIndex - pd.NaT #18960

Closed
wants to merge 9 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.23.0.txt
@@ -208,6 +208,7 @@ Other API Changes
- In :func:`read_excel`, the ``comment`` argument is now exposed as a named parameter (:issue:`18735`)
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- The options ``html.border`` and ``mode.use_inf_as_null`` were deprecated in prior versions, these will now show ``FutureWarning`` rather than a ``DeprecationWarning`` (:issue:`19003`)
- Subtracting ``NaT`` from a :class:`Series` with ``dtype='datetime64[ns]'`` returns a ``Series`` with ``dtype='timedelta64[ns]'`` instead of ``dtype='datetime64[ns]'`` (:issue:`18808`)

.. _whatsnew_0230.deprecations:

18 changes: 14 additions & 4 deletions pandas/core/ops.py
@@ -407,8 +407,12 @@ def _validate_datetime(self, lvalues, rvalues, name):

# if tz's must be equal (same or None)
if getattr(lvalues, 'tz', None) != getattr(rvalues, 'tz', None):
raise ValueError("Incompatible tz's on datetime subtraction "
"ops")
if len(rvalues) == 1 and isna(rvalues).all():
Contributor

Why does it matter whether the length is exactly 1? That seems odd.

Member Author

The issue at hand is that pd.NaT is special because it can play the role of a datetime or a timedelta depending on context. _TimeOp wraps scalar inputs in an array that becomes rvalues. But at a basic level, what we are interested in here is "was the original right argument pd.NaT?" What this check is doing (and the same thing you're asking me to do below) is undoing the wrapping done by _TimeOp. It is a much less clear way of checking "is the argument pd.NaT?"
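To make the dual role of pd.NaT concrete, here is a small sketch of the behaviour this PR is aiming for, assuming a pandas version that already includes the change described in the whatsnew entry (0.23 or later). The variable names are illustrative only and the exact dtype resolution may differ between pandas versions, so the example checks only the dtype kind.

```python
import pandas as pd

# A datetime-like index and the equivalent Series.
dti = pd.DatetimeIndex([pd.NaT, pd.Timestamp("1990-03-15")])
dt_ser = pd.Series(dti)

# A timedelta-like Series (mirrors test_td64_sub_NaT in this PR).
td_ser = pd.Series([pd.NaT, pd.Timedelta("1s")])

# With a datetime on the left, NaT is read as a datetime:
# datetime - datetime gives a timedelta result.
idx_result = dti - pd.NaT
dt_result = dt_ser - pd.NaT

# With a timedelta on the left, NaT is read as a timedelta:
# timedelta - timedelta also gives a timedelta result.
td_result = td_ser - pd.NaT

# dtype.kind is "m" for timedelta64 and "M" for datetime64.
print(idx_result.dtype.kind, dt_result.dtype.kind, td_result.dtype.kind)
```

The same scalar appears on the right in all three subtractions; only the left operand determines how it is interpreted.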

Contributor

You didn't answer the question. Whether I have 1 NaT or all NaT is immaterial: I can't use the rhs dtype because it's ambiguous, and must use the left.
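One way to see why an all-NaT right-hand side is ambiguous, sketched here with plain NumPy rather than the pandas internals under review: the datetime and timedelta flavours of NaT are stored as the same int64 sentinel, so the stored values alone cannot say which one was meant. This is only an illustration of the underlying representation, not of what `_TimeOp` does.

```python
import numpy as np

# One-element arrays holding the two flavours of NaT.
dt_nat = np.array([np.datetime64("NaT", "ns")])
td_nat = np.array([np.timedelta64("NaT", "ns")])

# Reinterpret the underlying 8-byte values as int64.
dt_bits = dt_nat.view("i8")[0]
td_bits = td_nat.view("i8")[0]

# Both are the minimum int64 value, the sentinel pandas calls iNaT.
sentinel = np.iinfo(np.int64).min
same_bits = bool(dt_bits == td_bits == sentinel)

print(dt_nat.dtype, td_nat.dtype, same_bits)
```

Only the dtype label distinguishes the two arrays, and for a scalar pd.NaT that label is whatever the wrapping code chose, which is why the left operand has to decide.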

# NaT gets a pass
pass
else:
raise ValueError("Incompatible tz's on datetime "
"subtraction ops", rvalues)

else:
raise TypeError('cannot operate on a series without a rhs '
@@ -505,12 +509,18 @@ def _convert_to_array(self, values, name=None, other=None):
inferred_type = lib.infer_dtype(values)
if (inferred_type in ('datetime64', 'datetime', 'date', 'time') or
is_datetimetz(inferred_type)):

# if we have an other of timedelta, but use pd.NaT here,
# we are in the wrong path
if (supplied_dtype is None and other is not None and
(other.dtype in ('timedelta64[ns]', 'datetime64[ns]')) and
(other.dtype in ('timedelta64[ns]', 'datetime64[ns]')) and
isna(values).all()):
values = np.empty(values.shape, dtype='timedelta64[ns]')
if len(values) == 1 and other.dtype == 'timedelta64[ns]':
Contributor

this is not simpler at all.

values = np.empty(values.shape, dtype=other.dtype)
values[:] = iNaT

Member Author

I agree, hence the sarcasm quotes and the simpler original implementation.

Member Author

Yah, even then this is wrong because it isn't conditioning on the op being sub. Gonna revert back to how it was before this commit.
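The point about conditioning on the operation can be sketched as a tiny decision rule. The helpers `nat_role` and `result_kind` below are hypothetical and are not part of pandas or of this PR; they only spell out the assumption that a scalar NaT on the right should be read as a datetime solely when the left side is datetime-like and the op is subtraction.

```python
def nat_role(left_kind: str, op: str) -> str:
    """Decide how a scalar NaT on the right-hand side is interpreted.

    Hypothetical helper for illustration only.
    left_kind is "datetime" or "timedelta"; op is "add" or "sub".
    """
    if left_kind == "datetime" and op == "sub":
        # datetime - datetime is the only case where NaT acts as a datetime.
        return "datetime"
    # Everywhere else (datetime + NaT, timedelta +/- NaT) it acts as a timedelta.
    return "timedelta"


def result_kind(left_kind: str, op: str) -> str:
    """Kind of the result under the rule above (also hypothetical)."""
    if left_kind == "datetime" and nat_role(left_kind, op) == "timedelta":
        return "datetime"   # datetime + timedelta stays a datetime
    return "timedelta"      # datetime - datetime, or timedelta with timedelta


for left in ("datetime", "timedelta"):
    for op in ("add", "sub"):
        print(left, op, nat_role(left, op), result_kind(left, op))
```

Keying only on the dtype of `other`, as the branch above does, would give the "datetime" reading for addition as well, which is the case the comment calls out as wrong.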

values = np.empty(values.shape, dtype='timedelta64[ns]')
elif len(values) == 1 and other.dtype == 'datetime64[ns]':
values = np.empty(values.shape, dtype='datetime64[ns]')
else:
values = np.empty(values.shape, dtype='timedelta64[ns]')
values[:] = iNaT

# a datelike
24 changes: 21 additions & 3 deletions pandas/tests/series/test_operators.py
@@ -960,6 +960,13 @@ def test_timedelta64_ops_nat(self):
assert_series_equal(timedelta_series / nan,
nat_series_dtype_timedelta)

def test_td64_sub_NaT(self):
# GH#18808
ser = Series([NaT, Timedelta('1s')])
res = ser - NaT
expected = Series([NaT, NaT], dtype='timedelta64[ns]')
tm.assert_series_equal(res, expected)

@pytest.mark.parametrize('scalar_td', [timedelta(minutes=5, seconds=4),
Timedelta(minutes=5, seconds=4),
Timedelta('5m4s').to_timedelta64()])
@@ -1224,13 +1231,10 @@ def test_datetime64_ops_nat(self):
single_nat_dtype_datetime = Series([NaT], dtype='datetime64[ns]')

# subtraction
assert_series_equal(datetime_series - NaT, nat_series_dtype_timestamp)
assert_series_equal(-NaT + datetime_series, nat_series_dtype_timestamp)
with pytest.raises(TypeError):
-single_nat_dtype_datetime + datetime_series

assert_series_equal(nat_series_dtype_timestamp - NaT,
nat_series_dtype_timestamp)
assert_series_equal(-NaT + nat_series_dtype_timestamp,
nat_series_dtype_timestamp)
with pytest.raises(TypeError):
@@ -1263,6 +1267,20 @@ def test_datetime64_ops_nat(self):
with pytest.raises(TypeError):
nat_series_dtype_timestamp / 1

def test_dt64_sub_NaT(self):
# GH#18808
dti = pd.DatetimeIndex([pd.NaT, pd.Timestamp('19900315')])
ser = pd.Series(dti)
res = ser - pd.NaT
expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]')
tm.assert_series_equal(res, expected)

dti_tz = dti.tz_localize('Asia/Tokyo')
ser_tz = pd.Series(dti_tz)
res = ser_tz - pd.NaT
expected = pd.Series([pd.NaT, pd.NaT], dtype='timedelta64[ns]')
tm.assert_series_equal(res, expected)


class TestSeriesOperators(TestData):
def test_op_method(self):