-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF/PERF: MultiIndex.get_locs to use boolean arrays internally #46330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REF/PERF: MultiIndex.get_locs to use boolean arrays internally #46330
Conversation
# if we have a provided indexer, then this need not consider | ||
# the entire labels set | ||
if step is not None and step < 0: | ||
# Switch elements for negative step size | ||
start, stop = stop - 1, start - 1 | ||
r = np.arange(start, stop, step) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an explanation (similar to the below) say around L3160, e.g. for a future reader to understand what this algorithm is doing.
wow. cc @mroeschke @phofl @jbrockmendel if comments. |
yep needs a rebase :-> |
rebased this one |
doc/source/whatsnew/v1.5.0.rst
Outdated
@@ -310,7 +310,7 @@ Performance improvements | |||
- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`) | |||
- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`) | |||
- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`) | |||
- Performance improvement in :meth:`MultiIndex.get_locs` (:issue:`45681`, :issue:`46040`) | |||
- Performance improvement in :meth:`MultiIndex.get_locs` (:issue:`45681`, :issue:`46040`, :issue:`46330`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most users dont use get_locs directly; is there a more user-facing description?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this:
Performance improvement in :meth:DataFrame.loc
and :meth:Series.loc
for tuple-based indexing of a :class:MultiIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks
LGTM |
can you merge master once again |
@jreback - merged main and greenish. I don't think the error is related as I see it showing up in other PRs as well |
) | ||
indexer &= lvl_indexer | ||
if not np.any(indexer) and np.any(lvl_indexer): | ||
raise KeyError(seq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this hit by tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, covered by test_loc.py > test_missing_key_combination
thanks @lukemanley |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Use boolean arrays internally within
MultiIndex.get_locs
rather than int64 indexes. Logical operations show performance improvements over intersecting int64 indexes. The output remains an integer positional indexer.