-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: join MultiIndex with overlapping IntervalIndex level (#44096) #44588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
jreback
merged 1 commit into
pandas-dev:master
from
boschresearch:bugfix/join_multiindex-overlapping-intervals-44096
Nov 25, 2021
Merged
BUG: join MultiIndex with overlapping IntervalIndex level (#44096) #44588
jreback
merged 1 commit into
pandas-dev:master
from
boschresearch:bugfix/join_multiindex-overlapping-intervals-44096
Nov 25, 2021
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
phofl
reviewed
Nov 23, 2021
5e3a141
to
8f1027f
Compare
phofl
reviewed
Nov 23, 2021
phofl
reviewed
Nov 23, 2021
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment, otherwise lgtm
8f1027f
to
7939a68
Compare
phofl
approved these changes
Nov 24, 2021
@johannes-mueller can you merge master and ping on green |
…#44096) `BaseMultiIndexCodesEngine._extract_level_codes` relies on the assumption that `get_indexer()` is always usable as any index level `is_unique`. However, that is not the case for overlapping `IntervalIndex` levels. `get_indexer()` can not be used on an `IntervalIndex` that `is_overlapping` even if it `is_unique`. This patch uses `get_indexer_for()` instead.
7939a68
to
83d9ab7
Compare
@jreback rebased |
thanks @johannes-mueller |
johannes-mueller
added a commit
to boschresearch/pandas
that referenced
this pull request
Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as `IntervalIndex`es can be unique and overlapping. Similar to pandas-dev#44588
johannes-mueller
added a commit
to boschresearch/pandas
that referenced
this pull request
Jan 27, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as `IntervalIndex`es can be unique and overlapping. Similar to pandas-dev#44588
4 tasks
johannes-mueller
added a commit
to boschresearch/pandas
that referenced
this pull request
Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as `IntervalIndex`es can be unique and overlapping. Similar to pandas-dev#44588
johannes-mueller
added a commit
to boschresearch/pandas
that referenced
this pull request
Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as `IntervalIndex`es can be unique and overlapping. Similar to pandas-dev#44588
johannes-mueller
added a commit
to boschresearch/pandas
that referenced
this pull request
Jan 28, 2022
Replacing calls to `get_indexer()` with `get_indexer_for()` as `IntervalIndex`es can be unique and overlapping. Similar to pandas-dev#44588
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BaseMultiIndexCodesEngine._extract_level_codes
relies on the assumption thatget_indexer()
is always usable as any index levelis_unique
. However, thatis not the case for overlapping
IntervalIndex
levels.get_indexer()
cannot be used on an
IntervalIndex
thatis_overlapping
even if itis_unique
.This patch uses
get_indexer_for()
instead.This patch checks if the index level is an overlappingIntervalIndex
and thencalls
get_indexer_non_unique()
instead.We should not switch toget_indexer_non_unique()
for other index levels thanoverlapping
IntervalIndex
, even though that actually would work as well.That is because
get_indexer_non_unique()
comes with a performance penalty ofroughly 30% compared to
get_indexer()
, measured by simple%timeit
measurements.