
BUG/inconsistency: IntervalIndex.get_loc gives a location for non-exact inputs #19349

Open
topper-123 opened this issue Jan 22, 2018 · 3 comments
Labels
API Design, Bug, Indexing (Related to indexing on series/frames, not to indexes themselves), Interval (Interval data type)

Comments

@topper-123
Contributor

topper-123 commented Jan 22, 2018

This one is a bit complex to explain, but I'll do my best.

Currently, IntervalIndex.get_indexer fails if the other index contains anything other than Interval objects (there's also another bug, but let's keep it simple here).

The underlying issue is that IntervalIndex.get_indexer depends on IntervalIndex.get_loc, which is ambiguous in how it treats numeric inputs:

>> ii = pd.IntervalIndex.from_breaks([0,1,2,3])
>> ii.get_loc(pd.Interval(1, 2))
1  # ok
>> ii.get_loc(1)  # do we mean exactly 1, or an interval that contains the number 1?
1  # ambiguous

The issue is that get_loc returns a location for both exact matches and inexact matches (i.e. when the numeric input falls inside an interval). For the purposes of get_indexer, however, this behaviour fails, as get_indexer needs get_loc to find exact matches only.

See #19021 (comment) for further discussion.

Solution

A solution could be adding a 'strict' option to the method parameter of IntervalIndex.get_loc.

This wasn't so difficult after all, and I've already made a PR on this, see #19353
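
To make the 'strict' idea concrete, here is a minimal sketch of the intended semantics. It is not the code from the PR: `get_loc_sketch` and its `strict` flag are hypothetical names used only for illustration, and the sketch assumes a pandas version in which an Interval key passed to get_loc is matched exactly while a numeric key is matched by containment.

```python
import pandas as pd


def get_loc_sketch(index, key, strict=False):
    """Hypothetical wrapper (not pandas API) around IntervalIndex.get_loc.

    With strict=True, only Interval keys are accepted, so a number that
    merely falls inside an interval raises KeyError instead of matching.
    """
    if strict and not isinstance(key, pd.Interval):
        raise KeyError(key)
    return index.get_loc(key)


ii = pd.IntervalIndex.from_breaks([0, 1, 2, 3])  # (0, 1], (1, 2], (2, 3]

# An Interval key is accepted in strict mode and matched exactly.
loc_interval = get_loc_sketch(ii, pd.Interval(1, 2), strict=True)

# Without strict, a number is matched by containment: 1.5 lies in (1, 2].
loc_scalar = get_loc_sketch(ii, 1.5)

# With strict, the same number is rejected.
try:
    get_loc_sketch(ii, 1.5, strict=True)
    strict_scalar_found = True
except KeyError:
    strict_scalar_found = False
```

A caller such as get_indexer could then ask for strict lookups and treat the KeyError as "no exact match", while interactive use keeps the permissive default.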

@topper-123 topper-123 changed the title BUG/inconsistency: IntervalIndex.get_loc gives a location for non-interval inputs BUG/inconsistency: IntervalIndex.get_loc gives a location for non-exact inputs Jan 22, 2018
@gfyoung gfyoung added the Interval Interval data type label Jan 24, 2018
@jreback jreback added this to the 0.23.0 milestone Feb 10, 2018
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Feb 10, 2018
@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jorisvandenbossche
Member

cc @jschendel is this covered by the PR?

@jorisvandenbossche
Member

Ah, no, it is about scalars passed to get_loc, I don't think anything changed for that?

@jbrockmendel jbrockmendel added Indexing Related to indexing on series/frames, not to indexes themselves and removed Index Related to the Index class or subclasses labels Feb 22, 2020
@mroeschke mroeschke added the Bug label Apr 26, 2020
@jbrockmendel
Member

I see this as analogous to how we treat strings in DatetimeIndex.get_loc. Is this still a problem?
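
For readers unfamiliar with the DatetimeIndex behaviour referred to here, a small illustration follows. The index and variable names are made up for the example, and the exact return type of the partial-string lookup may differ between pandas versions, so it is only used to select rows.

```python
import pandas as pd

dti = pd.date_range("2018-01-01", periods=3, freq="D")

# A Timestamp key is looked up as an exact element of the index.
loc_timestamp = dti.get_loc(pd.Timestamp("2018-01-02"))

# A string at the index's own resolution is parsed and resolves to the same position.
loc_string = dti.get_loc("2018-01-02")

# A coarser string ("2018-01") is not an element of the index, yet get_loc
# still returns a location covering every entry in that month.
loc_partial = dti.get_loc("2018-01")
matched = dti[loc_partial]
```

The parallel is that a non-exact key (a month string here, a number for IntervalIndex) still yields a location because the index interprets it as "everything this key covers".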

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
7 participants