REGR: fix pd.cut with datetime IntervalIndex as bins #47771

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.4.rst
@@ -16,6 +16,7 @@ Fixed regressions
~~~~~~~~~~~~~~~~~
- Fixed regression in taking NULL :class:`objects` from a :class:`DataFrame` causing a segmentation violation. These NULL values are created by :meth:`numpy.empty_like` (:issue:`46848`)
- Fixed regression in :func:`concat` materializing :class:`Index` during sorting even if :class:`Index` was already sorted (:issue:`47501`)
- Fixed regression in :func:`cut` using a ``datetime64`` IntervalIndex as bins (:issue:`46218`)
- Fixed regression in :meth:`DataFrame.loc` not updating the cache correctly after values were set (:issue:`47867`)
- Fixed regression in :meth:`DataFrame.loc` not aligning index in some cases when setting a :class:`DataFrame` (:issue:`47578`)
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
8 changes: 7 additions & 1 deletion pandas/core/indexes/base.py
@@ -3987,8 +3987,14 @@ def _should_partial_index(self, target: Index) -> bool:
Should we attempt partial-matching indexing?
"""
if is_interval_dtype(self.dtype):
if is_interval_dtype(target.dtype):
return False
# See https://github.com/pandas-dev/pandas/issues/47772. The commented-out
# code can be restored (instead of hardcoding `return True`)
# once that issue is fixed.
# "Index" has no attribute "left"
return self.left._should_compare(target) # type: ignore[attr-defined]
# return self.left._should_compare(target) # type: ignore[attr-defined]
Member

Should this commented out line work in the future or is it leftover from investigating this fix?

Member Author

Yes, I think ideally it gets restored in the future. I added a comment referencing the issue I opened about that.

return True
return False

@final
15 changes: 15 additions & 0 deletions pandas/tests/indexes/interval/test_indexing.py
@@ -8,6 +8,7 @@
from pandas import (
NA,
CategoricalIndex,
DatetimeIndex,
Index,
Interval,
IntervalIndex,
@@ -316,6 +317,20 @@ def test_get_indexer_categorical_with_nans(self):
expected = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=np.intp)
tm.assert_numpy_array_equal(result, expected)

def test_get_indexer_datetime(self):
ii = IntervalIndex.from_breaks(date_range("2018-01-01", periods=4))
result = ii.get_indexer(DatetimeIndex(["2018-01-02"]))
expected = np.array([0], dtype=np.intp)
tm.assert_numpy_array_equal(result, expected)

result = ii.get_indexer(DatetimeIndex(["2018-01-02"]).astype(str))
tm.assert_numpy_array_equal(result, expected)

# TODO this should probably be deprecated?
# https://github.com/pandas-dev/pandas/issues/47772
result = ii.get_indexer(DatetimeIndex(["2018-01-02"]).asi8)
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize(
"tuples, inclusive",
[
10 changes: 10 additions & 0 deletions pandas/tests/reshape/test_cut.py
@@ -14,6 +14,7 @@
Timestamp,
cut,
date_range,
interval_range,
isna,
qcut,
timedelta_range,
@@ -739,3 +740,12 @@ def test_cut_with_timestamp_tuple_labels():

expected = Categorical.from_codes([0, 1, 2], labels, ordered=True)
tm.assert_categorical_equal(result, expected)


def test_cut_bins_datetime_intervalindex():
# https://github.com/pandas-dev/pandas/issues/46218
bins = interval_range(Timestamp("2022-02-25"), Timestamp("2022-02-27"), freq="1D")
# passing Series instead of list is important to trigger bug
result = cut(Series([Timestamp("2022-02-26")]), bins=bins)
expected = Categorical.from_codes([0], bins, ordered=True)
tm.assert_categorical_equal(result.array, expected)
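
For reference, the behaviour covered by the two new tests can be reproduced outside the test suite with a short standalone script. This is only a sketch that mirrors the test inputs above through the public `pd` namespace; the variable names `values`, `codes`, and `indexer` are illustrative and not part of the PR.

```python
import pandas as pd

# Daily bins as a datetime64 IntervalIndex (right-closed by default).
bins = pd.interval_range(
    pd.Timestamp("2022-02-25"), pd.Timestamp("2022-02-27"), freq="1D"
)

# Per the test comment, passing a Series (not a list) is what triggered the bug.
values = pd.Series([pd.Timestamp("2022-02-26")])
result = pd.cut(values, bins=bins)

# Integer position of the bin each value fell into.
# 2022-02-26 is the right edge of the first bin, so it lands in bin 0.
codes = result.cat.codes.tolist()
print(codes)

# Same kind of lookup as test_get_indexer_datetime:
# datetime targets matched against datetime intervals.
ii = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))
indexer = ii.get_indexer(pd.DatetimeIndex(["2018-01-02"])).tolist()
print(indexer)
```

On a pandas version that includes this fix, both lookups should resolve to the first interval, matching the `expected` values in the tests.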