Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
[PERF] Get rid of MultiIndex conversion in IntervalIndex.intersection #26225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PERF] Get rid of MultiIndex conversion in IntervalIndex.intersection #26225
Changes from 31 commits
a5a1272
3cd095a
0486a4e
09c89f1
841a0b7
8b22623
32d4005
d502fcb
8ec6366
6000904
7cb7d2c
bcf36bb
745c0bb
ff8bb97
17d775f
03a989a
b36cbc8
1cdb170
18c2d37
d229677
35594b0
0834206
3cf5be8
9cf9b7e
ab67edd
402b09c
b4f130d
3ff4c64
3db3130
1f25adb
4a9cd29
1467e94
ea2550a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be an issue with this approach when dupes are present in
self
andother
. For other index types, such a scenario can result in more dupes being present in the intersection than inself
. This behavior looks a bit buggy and inconsistent though, so I'm not sure if we actually wantIntervalIndex
to be consistent with it.Some examples of the buggy and inconsistent behavior with
Index
:It seems strange that
[3]
has more dupes present than in either original index but[4]
does not. Similarly, it seems like[5]
and[6]
should be identical, as the presence of a non-intersecting element shouldn't impact the number of dupes returned.@jreback : Do you know what the expected behavior for
intersection
with dupes should be? Or if there are any dependencies on the behavior ofintersection
that would dictate this?If we treat indexes like multisets, then the intersection should contain the minimum multiplicity of dupes, e.g.
idx2.intersection(idx3)
andidx3.intersection(idx2)
should both have length 2, so you maintain the property of the intersection being a subset of the original indexes.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is weird as these are set ops
what happens (meaning how much breakage) if
prob need to do this for all set ops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't had time to extensively test this out but I made the two changes you suggested in
indexes/base.py
forintersection
and both resulted in some breakage. Aside some from breakage in the index set ops tests, there was also some breakage intests/reshape/test_merge.py
.