Skip to content

BUG: fix tz-aware DatetimeIndex + TimedeltaIndex (#17558) #17583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -506,6 +506,7 @@ Indexing
- Bug in ``CategoricalIndex`` reindexing in which specified indices containing duplicates were not being respected (:issue:`17323`)
- Bug in intersection of ``RangeIndex`` with negative step (:issue:`17296`)
- Bug in ``IntervalIndex`` where performing a scalar lookup fails for included right endpoints of non-overlapping monotonic decreasing indexes (:issue:`16417`, :issue:`17271`)
- Bug in adding ``DatetimeIndex`` with a ``TimedeltaIndex`` or a numpy array with ``dtype="timedelta64"`` (:issue:`17558`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct; the bug is only with a numpy array of dtype=timedelta64[ns] (timedelta64 is not a valid dtype for pandas)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "or" was a bit confusing, but there are two separate bugs:

  • adding tz-aware DatetimeIndex with TimedeltaIndex
  • adding tz-aware DatetimeIndex with numpy array which is timedelta64-dtype-like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how so? where does DTI + TDI tz-aware not work? your issues is w.r.t. np.array of timedeltas

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned it briefly in the issue ticket, although I neglected to post the full stack trace:

In [1]: import numpy as np, pandas as pd

In [2]: dti = pd.DatetimeIndex([pd.Timestamp("2017/01/01")]).tz_localize('US/Eastern')

In [3]: td_np = np.array([np.timedelta64(1, 'ns')]) 

In [4]: (dti + pd.TimedeltaIndex(td_np))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-20f28143c2bb> in <module>()
----> 1 (dti + pd.TimedeltaIndex(td_np))

/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/base.pyc in __add__(self, other)
    617             from pandas.tseries.offsets import DateOffset
    618             if isinstance(other, TimedeltaIndex):
--> 619                 return self._add_delta(other)
    620             elif isinstance(self, TimedeltaIndex) and isinstance(other, Index):
    621                 if hasattr(other, '_add_delta'):

/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/index.pyc in _add_delta(self, delta)
    787             new_values = self._add_delta_td(delta)
    788         elif isinstance(delta, TimedeltaIndex):
--> 789             new_values = self._add_delta_tdi(delta)
    790             # update name when delta is Index
    791             name = com._maybe_match_name(self, delta)

/spare/local/azhu/venv-el6-20170811-stonelib/lib/python2.7/site-packages/pandas/tseries/base.pyc in _add_delta_tdi(self, other)
    704             mask = (self._isnan) | (other._isnan)
    705             new_values[mask] = tslib.iNaT
--> 706         return new_values.view(self.dtype)
    707 
    708     def isin(self, values):

TypeError: data type not understood


I/O
^^^
Expand Down
12 changes: 11 additions & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@
is_integer, is_float,
is_bool_dtype, _ensure_int64,
is_scalar, is_dtype_equal,
is_timedelta64_dtype, is_integer_dtype,
is_list_like)
from pandas.core.dtypes.generic import (
ABCIndex, ABCSeries,
Expand Down Expand Up @@ -651,6 +652,15 @@ def __add__(self, other):
raise TypeError("cannot add {typ1} and {typ2}"
.format(typ1=type(self).__name__,
typ2=type(other).__name__))
elif isinstance(other, np.ndarray):
if is_timedelta64_dtype(other):
return self._add_delta(TimedeltaIndex(other))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can only accept timedelta64 here, integer is not valid either fro datetimelikes in add

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in another comment above, the integer case is being used by internals for PeriodIndex. Example: pandas/tseries/offsets.py:_beg_apply_index#L468 adds base_period (PeriodIndex) + roll (ndarray[dtype=int64]).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we don't allow this user facting; this is ONLY an internal operation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I don't quite understand what you're getting at -- are you saying there's still something to be done here? I just added the integer branch to prevent aforementioned internally used operation from breaking. (since it returns NotImplemented as it currently does, it will fall through to ufunc implementation.) I can add a code comment to explain the intent?

elif is_integer_dtype(other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of this, we want to specifically define add in PeriodIndex which handles this case and calls super otherwise.

return NotImplemented
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test that hits this branch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but indirectly, without this branch it was breaking a couple of other tests that added a PeriodIndex and a np.array(dtype=int64). I didn't check what the behavior is there, but could add a more direct test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct test is certainly appreciated. Go for it!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into the cases getting triggered with PeriodIndex and unfortunately they're buried pretty deeply in internals (e.g. not obvious this would get called from any public API, at least any more so than the tests that were already reaching this). Also @jreback commented that this should only be internal facing, so I figure its better not to "advertise" this API in a test.

else:
raise TypeError("cannot add {typ1} and np.ndarray[{typ2}]"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as with the NotImplemented. Let's make sure we have a test to hit this branch.

.format(typ1=type(self).__name__,
typ2=other.dtype))
elif isinstance(other, (DateOffset, timedelta, np.timedelta64,
Timedelta)):
return self._add_delta(other)
Expand Down Expand Up @@ -731,7 +741,7 @@ def _add_delta_tdi(self, other):
if self.hasnans or other.hasnans:
mask = (self._isnan) | (other._isnan)
new_values[mask] = iNaT
return new_values.view(self.dtype)
return new_values.view('i8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes adding tz-aware DatetimeIndex and TimedeltaIndex, since self.dtype here is a Datetime64TZDtype which causes new_values.view(..) to raise an exception. Would you prefer another solution, say checking if is_datetime64tz_dtype(self.dtype) first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes use is_datetime64tz_dtype(self.dtype) is the correct idiom. but you'll have to demonstrate this bug specifically

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posted stack trace that hits this line above. So I can change this to

if is_datetime64tz_dtype(self.dtype):
  return new_values.view('i8')
return new_values.view(self.dtype)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the not the right fix.

tz-aware need to be turned into local i8, added, then turned back into the original timezone via localization. normally we do add i8 directly, BUT you cannot do this with tz-aware, e.g. the days are shifted otherwise (imagine adding days), or don't work when you cross DST.

pandas.tseries.offsets already does all of this. I haven't stepped thru this code lately, but need to hit a similar path to that.


def isin(self, values):
"""
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/indexes/datetimes/test_ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -460,6 +460,20 @@ def test_add_dti_dti(self):
with pytest.raises(TypeError):
dti + dti_tz

def test_add_dti_ndarray(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test which raises for numpy arrays of incorrect type (e.g. not m8[ns]);

# GH 17558
# Check that tz-aware DatetimeIndex + np.array(dtype="timedelta64")
# and DatetimeIndex + TimedeltaIndex work as expected
dti = pd.DatetimeIndex([pd.Timestamp("2017/01/01")])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can move this entire test to datetimelike.py; it will hit datetime, period, timedelta that way

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should prob move more of these add tests there as well, but can do that in another PR

dti = dti.tz_localize('US/Eastern')
expected = pd.DatetimeIndex([pd.Timestamp("2017/01/01 01:00")])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit : add newline above this one.

expected = expected.tz_localize('US/Eastern')

td_np = np.array([np.timedelta64(1, 'h')], dtype="timedelta64[ns]")
tm.assert_index_equal(dti + td_np, expected)
tm.assert_index_equal(dti + td_np[0], expected)
tm.assert_index_equal(dti + TimedeltaIndex(td_np), expected)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just for-loop these checks instead? Also, let's be a little more explicit and write it this way:

actual = dti + ...
tm.assert_index_equal(actual, expected)


def test_difference(self):
for tz in self.tz:
# diff
Expand Down