Skip to content

BUG: Slicing non monotonic DatetimeIndex did not raise for non-existing keys #37796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -477,6 +477,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` returned requested key plus missing values when ``loc`` was applied to single level from :class:`MultiIndex` (:issue:`27104`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`CategoricalIndex` using a listlike indexer containing NA values (:issue:`37722`)
- Bug in :meth:`DataFrame.xs` ignored ``droplevel=False`` for columns (:issue:`19056`)
- Bug in :meth:`DataFrame.loc` and :meth:`Series.loc` did not raise ``KeyError`` when non-existing label was sliced in unordered :class:`DatetimeIndex` (:issue:`18531`)

Missing
^^^^^^^
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -810,7 +810,8 @@ def slice_indexer(self, start=None, end=None, step=None, kind=None):
if end is not None:
end_casted = self._maybe_cast_slice_bound(end, "right", kind)
mask = (self <= end_casted) & mask

if not any(mask):
raise
indexer = mask.nonzero()[0][::step]
if len(indexer) == len(self):
return slice(None)
Expand Down
19 changes: 11 additions & 8 deletions pandas/tests/indexes/datetimes/test_partial_slicing.py
Original file line number Diff line number Diff line change
Expand Up @@ -311,19 +311,22 @@ def test_partial_slicing_with_multiindex(self):
tm.assert_frame_equal(result, expected)

def test_partial_slice_doesnt_require_monotonicity(self):
# For historical reasons.
s = Series(np.arange(10), date_range("2014-01-01", periods=10))
ser = Series(np.arange(10), date_range("2014-01-01", periods=10))

nonmonotonic = s[[3, 5, 4]]
expected = nonmonotonic.iloc[:0]
nonmonotonic = ser[[3, 5, 4]]
timestamp = Timestamp("2014-01-10")
msg = r"Timestamp\('2014-01-10 00:00:00'\)"

tm.assert_series_equal(nonmonotonic["2014-01-10":], expected)
with pytest.raises(KeyError, match=r"Timestamp\('2014-01-10 00:00:00'\)"):
with pytest.raises(KeyError, match=msg):
nonmonotonic["2014-01-10":]

with pytest.raises(KeyError, match=msg):
nonmonotonic[timestamp:]

tm.assert_series_equal(nonmonotonic.loc["2014-01-10":], expected)
with pytest.raises(KeyError, match=r"Timestamp\('2014-01-10 00:00:00'\)"):
with pytest.raises(KeyError, match=msg):
nonmonotonic.loc["2014-01-10":]

with pytest.raises(KeyError, match=msg):
nonmonotonic.loc[timestamp:]

def test_loc_datetime_length_one(self):
Expand Down
13 changes: 13 additions & 0 deletions pandas/tests/indexing/test_loc.py
Original file line number Diff line number Diff line change
Expand Up @@ -1564,6 +1564,19 @@ def test_loc_getitem_slice_label_td64obj(self, start, stop, expected_slice):
expected = ser.iloc[expected_slice]
tm.assert_series_equal(result, expected)

def test_loc_getitem_slice_unordered_dt_index(self, frame_or_series):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the str index example from the OP as well. ping on green.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly enough this does not raise any more. I will open an issue for this. We have to define if slice_locs should raise too or if we should doo this after calling the function. Is not related to the DateTime fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# GH#18531
obj = frame_or_series(
[1, 2, 3],
index=[Timestamp("2017"), Timestamp("2019"), Timestamp("2018")],
)
with pytest.raises(KeyError, match=r"Timestamp\('2020-01-01 00:00:00'\)"):
obj.loc["2020":"2022"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add an example where the label falls within the range of the Index?
Eg with the index being [2016, 2019, 2017] and then the slice "2018":"2020" or so

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, because of

# For historical reasons DatetimeIndex by default supports

Should we remove or deprecate? Harder fix than the previous one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i guess we could try to deprecate, @jorisvandenbossche you feel strongly here?


obj.index = ["a", "c", "b"]
with pytest.raises(KeyError, match=r"d"):
obj.loc["d":"e"]


class TestLocBooleanMask:
def test_loc_setitem_bool_mask_timedeltaindex(self):
Expand Down