-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: MultiIndex.union #48752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: MultiIndex.union #48752
Conversation
pandas/core/indexes/multi.py
Outdated
right_missing = lib.fast_unique_multiple(self._values, rvals) | ||
if right_missing: | ||
result = self.append(other.take(right_missing)) | ||
right_missing = other.difference(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this behave with duplicates and especially with Indexes that contain tuples?
Apart from that, I would prefer not doing that now. I have a different pr that I would prefer merging first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicates shouldn’t reach this given the if else block this is within (other is always unique). Not sure about tuples but I can take a look. Or I can just close it if you have something in the works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really in the works yet, but I would prefer optimising the duplicates in other case before we optimise here. This is more needed (or you could look into that if you like, I suspect that we have to push that into the cython layer to get an acceptable performance improvement)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll take a look at the duplicates. I'll leave this open a bit longer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phofl, are you open to this one now that the duplicates case has been optimized a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rebase now?
thx @lukemanley |
doc/source/whatsnew/v1.6.0.rst
file if fixing a bug or adding a new feature.cc @phofl - following your improvements to MultiIndex.difference, it looks like we can replace fast_unique_multiple.