API: IntervalIndex.get_indexer not strict about passed target values dtype #47772

Closed
jorisvandenbossche opened this issue Jul 17, 2022 · 2 comments · Fixed by #54919 or #54964
Labels
cut cut, qcut Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type

Comments

@jorisvandenbossche
Member

Consider the following example of IntervalIndex with datetime64 subdtype:

In [40]: iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))

In [41]: iidx
Out[41]: IntervalIndex([(2018-01-01, 2018-01-02], (2018-01-02, 2018-01-03], (2018-01-03, 2018-01-04]], dtype='interval[datetime64[ns], right]')

In [42]: iidx.get_indexer([pd.Timestamp("2018-01-02")])
Out[42]: array([0])

In [43]: iidx.get_indexer(["2018-01-02"])
Out[43]: array([0])

In [44]: iidx.get_indexer([pd.Timestamp("2018-01-02").value])
Out[44]: array([0])

(The above is with pandas 1.3.5. On 1.4 / main, the first two still work, but the last one no longer does.)

Being able to index with strings (in addition to Timestamp / datetime64 values) is probably expected? (since that also seems to work like that for DatetimeIndex)
But we shouldn't accept integer values, I think? (this could also be deprecated first, since it also impacts behaviour of .loc indexing)

This last case was changed (unintentionally I think, given there were no tests) in #47771, and I am changing this back in #47771 to fix a cut regression (and implicitly also restoring the get_indexer behaviour).

If we want to remove this again (or deprecate it first), we have to change the logic inside cut a bit to ensure that we pass correctly dtyped values to IntervalIndex.get_indexer (see the explanation in the top post of #47771 for context).
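
To illustrate what "correctly dtyped values" could mean here, below is a minimal sketch that converts raw integer (i8 nanosecond) values to datetime64 before calling get_indexer, so the lookup does not rely on integers being accepted. The helper name get_indexer_from_i8 is hypothetical and not part of pandas, and this is not necessarily how cut would handle it internally.

```python
import pandas as pd

iidx = pd.IntervalIndex.from_breaks(pd.date_range("2018-01-01", periods=4))


def get_indexer_from_i8(index, i8_values):
    """Hypothetical helper (not pandas API): look up raw i8 nanosecond values."""
    # Interpret the integers as nanoseconds since the epoch ...
    target = pd.to_datetime(i8_values, unit="ns")
    # ... and match the index's own datetime64 subdtype before the lookup.
    target = target.astype(index.dtype.subtype)
    return index.get_indexer(target)


inside = get_indexer_from_i8(iidx, [pd.Timestamp("2018-01-02").value])
outside = get_indexer_from_i8(iidx, [pd.Timestamp("2018-01-01").value])
print(inside, outside)
```

The first value falls into the first interval (position 0). The second sits exactly on the open left edge of the first interval, so it is not found and get_indexer reports -1.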

@jorisvandenbossche jorisvandenbossche added Indexing Related to indexing on series/frames, not to indexes themselves Interval Interval data type cut cut, qcut labels Jul 17, 2022
@jorisvandenbossche jorisvandenbossche added the Deprecate Functionality to remove in pandas label Aug 12, 2022
@jbrockmendel
Member

Agreed that [44] should raise (after deprecation cycle).

The string case [43] is trickier since it potentially involves 2 layers of "partial indexing". Probably not worth sweating though.

@jbrockmendel
Member

#54919 was preliminary to a PR that closes this
