-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: MultiIndex.get_indexer #43370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: MultiIndex.get_indexer #43370
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm typing requests for followon
@@ -603,35 +603,34 @@ cdef class BaseMultiIndexCodesEngine: | |||
def _codes_to_ints(self, ndarray[uint64_t] codes) -> np.ndarray: | |||
raise NotImplementedError("Implemented by subclass") | |||
|
|||
def _extract_level_codes(self, ndarray[object] target) -> np.ndarray: | |||
def _extract_level_codes(self, target) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type in followon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah mentioned in OP plans for follow-on to restore common signature/typing
return self._codes_to_ints(np.array(level_codes, dtype='uint64').T) | ||
|
||
def get_indexer(self, ndarray[object] target) -> np.ndarray: | ||
def get_indexer(self, target) -> np.ndarray: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type in followon
def get_indexer_non_unique(self, ndarray[object] target): | ||
|
||
def get_indexer_non_unique(self, target): | ||
# target: MultiIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type if possile
@@ -3619,7 +3619,12 @@ def _get_indexer( | |||
elif method == "nearest": | |||
indexer = self._get_nearest_indexer(target, limit, tolerance) | |||
else: | |||
indexer = self._engine.get_indexer(target._get_engine_target()) | |||
tgt_values = target._get_engine_target() | |||
if target._is_multi and self._is_multi: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth a comment here and L5453 that this is for perf as not super obvious
lgtm but if you can add some comments, ping on green |
ok fair enough all of this is fine for a followon. |
also since this is a huge speedup this migth improve some concat / join asvs for multi-multi (if so that is worth a whatsnew note0 |
Avoids expensive construction of MultiIndex.values, followed by expensive casting of sequence-of-tuples back to tuple-of-Indexes.
Follow-up will move _extract_level_codes out of cython so that Engine._get_indexer and Engine._get_indexer_non_unique are restored to having consistent signatures.