PERF: avoid unnecessary reordering in MultiIndex._reorder_indexer #48384

Merged: mroeschke merged 10 commits into pandas-dev:main from the multiindex-need-sort branch on Sep 14, 2022

lukemanley
Member

@lukemanley lukemanley commented Sep 4, 2022

MultiIndex._reorder_indexer contains logic to determine if indexer reordering is required. This PR expands the logic a bit to identify additional cases where reordering is not necessary.

ASVs:

       before           after         ratio
     [4b645ae5]       [07679a60]
     <main>           <multiindex-need-sort>
-        5.07±1ms          726±9μs     0.14  indexing.MultiIndexing.time_loc_all_bool_indexers(True)
-        8.13±2ms         709±20μs     0.09  indexing.MultiIndexing.time_loc_all_bool_indexers(False)
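
For readers skimming the thread, the kind of check described above can be sketched roughly as follows. This is a simplified illustration, not the pandas implementation: `may_need_reorder` and `is_null_slice` are hypothetical helper names, and the sketch is deliberately conservative (it answers "maybe" for any key it does not recognise, including sorted lists and non-null slices).

```python
import numpy as np


def is_null_slice(k):
    # Hypothetical helper: True for slice(None), i.e. a bare ":" key.
    return (
        isinstance(k, slice)
        and k.start is None
        and k.stop is None
        and k.step is None
    )


def may_need_reorder(seq):
    """Return False only when every per-level key is known to keep index order.

    Assumption for this sketch: null slices, scalars, boolean masks and
    lists with at most one element never change the order of the result.
    """
    for k in seq:
        if is_null_slice(k) or np.isscalar(k):
            continue  # selects rows without imposing an order
        if isinstance(k, slice):
            return True  # other slices: stay conservative here
        arr = np.asarray(k)
        if arr.dtype == bool or arr.size <= 1:
            continue  # boolean mask or list of length 0 or 1
        return True  # multi-element list: order may matter
    return False


print(may_need_reorder((slice(None), [True, False, True])))  # False
print(may_need_reorder((slice(None), [5, 4])))  # True
```

When the answer is `False`, the indexer can be returned as is, which is where the time saving in the benchmarks above would come from.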

@lukemanley lukemanley added the Performance (memory or execution speed performance) and MultiIndex labels Sep 4, 2022
@lukemanley lukemanley changed the title from "PREF: avoid unnecessary reordering in MultiIndex._reorder_indexer" to "PERF: avoid unnecessary reordering in MultiIndex._reorder_indexer" Sep 4, 2022
Member

@datapythonista datapythonista left a comment

Thanks @lukemanley, looks good to me.

Can you merge master into your branch? I think CI problems should be fixed now.

# check if sorting is necessary
need_sort = False
for i, k in enumerate(seq):
    if com.is_null_slice(k) or com.is_bool_indexer(k) or is_scalar(k):
Member

Do we have unit tests for all these new branches that short circuit?

Member Author

I'm pretty sure there are tests that hit these in one way or another. Regardless, I just added test_get_locs_reordering, which explicitly tests the ordering effects of null slices, bool indexers, scalars, and lists of length 1.
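
As a rough illustration of the ordering effects mentioned here, the sketch below calls the public `MultiIndex.get_locs` on a small toy index. The index, keys, and variable names are assumptions made for this example and are not copied from `test_get_locs_reordering`; the behaviour shown is what recent pandas versions are expected to produce.

```python
import numpy as np
import pandas as pd

# Hypothetical toy index (lexsorted), not the one used in the PR's test.
# Positions: 0:(1,4) 1:(1,5) 2:(1,6) 3:(2,4) 4:(2,5) 5:(2,6)
mi = pd.MultiIndex.from_product([[1, 2], [4, 5, 6]])

# Boolean indexer with one entry per row of the index.
mask = mi.get_level_values(0).to_numpy() == 2

# First-level keys that should not force a reorder, each paired with an
# already sorted list on the second level.
order_preserving = {
    "null slice": [slice(None), [4, 5]],
    "bool indexer": [mask, [4, 5]],
    "scalar": [2, [4, 5]],
    "length-1 list": [[2], [4, 5]],
}
results = {name: mi.get_locs(key).tolist() for name, key in order_preserving.items()}
for name, locs in results.items():
    print(f"{name}: {locs}")  # positions come back in index order

# By contrast, an unsorted multi-element list asks for its own ordering:
# 5 is requested before 4, so the locations follow that order.
reordered = mi.get_locs([[2], [5, 4]]).tolist()
print("unsorted list:", reordered)
```

The first four keys are the cases where skipping the reorder step is safe, because the result already follows the order of the index.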

Member

Great, thanks!

@mroeschke mroeschke added this to the 1.6 milestone Sep 13, 2022
@mroeschke mroeschke merged commit 4fb83b0 into pandas-dev:main Sep 14, 2022
@mroeschke
Member

Thanks @lukemanley

@lukemanley lukemanley deleted the multiindex-need-sort branch September 17, 2022 19:03
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
…ndas-dev#48384)

* Avoid unnecessary reordering in MultiIndex._reorder_indexer

* whatsnew

* mypy

* cleanup

* add multiindex.get_locs reordering tests

* fix test

Co-authored-by: Matthew Roeschke <[email protected]>