BUG/PERF: MultiIndex setops with sort=None #49010

Merged: 7 commits into pandas-dev:main, Oct 12, 2022

lukemanley
Member

@lukemanley lukemanley commented Oct 8, 2022

Simplify algos.safe_sort and improve its performance, which in turn speeds up MultiIndex setops when sort=None.

           before           after         ratio
     [55dc3243]       [f00149ce]
     <main>           <safe-sort-multiindex>
-        73.3±2ms       20.2±0.4ms     0.28  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'intersection', None)
-        71.0±2ms       19.0±0.6ms     0.27  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'intersection', None)
-       104±0.7ms         24.0±3ms     0.23  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'intersection', None)
-        98.7±2ms       21.4±0.3ms     0.22  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'intersection', None)
-       114±0.5ms       24.7±0.4ms     0.22  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'intersection', None)
-       109±0.7ms         21.3±2ms     0.20  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'intersection', None)
-        99.3±1ms         18.5±2ms     0.19  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection', None)
-         157±6ms         21.4±2ms     0.14  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection', None)
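
For readers without an asv setup, the kind of call timed above can be approximated with `timeit`. This is only a rough sketch, not the asv benchmark itself: the index size, the level contents, and the repeat count are assumptions chosen for illustration, so absolute timings will not match the table.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed sizes for illustration; the asv benchmark uses its own data.
left = pd.MultiIndex.from_product([np.arange(100), np.arange(100)])
left = left[::-1]   # reverse the index so it is not monotonic increasing
right = left[:-1]   # nearly identical index, so almost every entry intersects

# Time the set operation with sort=None (the default for setops).
elapsed = timeit.timeit(lambda: left.intersection(right, sort=None), number=5)
result = left.intersection(right, sort=None)

print(f"5 runs took {elapsed:.4f}s, result has {len(result)} entries")
```

Running this on a build before and after the change should give a feel for the speedup, though only the asv numbers above are the measured results.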

I updated the expected value in test_union_nan_got_duplicated, which was added in #38977.

MultiIndex.union sorts by default (sort=None), but the expected value in that test was not sorted.
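
As a small illustration of the default behaviour described here, the sketch below unions two made-up MultiIndexes (the tuples are assumptions for illustration, not the data from the test) and compares the default `sort=None` with `sort=False`.

```python
import pandas as pd

# Hypothetical example data; neither index is sorted or equal to the other.
left = pd.MultiIndex.from_tuples([(2, "b"), (1, "a")])
right = pd.MultiIndex.from_tuples([(3, "c"), (1, "b")])

result = left.union(right)                # sort=None is the default
unsorted = left.union(right, sort=False)  # explicitly skip sorting

print(list(result))
print(list(unsorted))
```

With `sort=None` the result is expected to come back in sorted order, which is why an unsorted expected value in a test of the default call would be inconsistent with the documented behaviour.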

For reference, here are the docs for the sort parameter:

sort : bool or None, default None
    Whether to sort the resulting Index.

    * None : Sort the result, except when

      1. `self` and `other` are equal.
      2. `self` or `other` has length 0.
      3. Some values in `self` or `other` cannot be compared.
         A RuntimeWarning is issued in this case.
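
The first two exceptions in the docs above can be seen in a minimal sketch like the following. The index values are assumptions for illustration, and the third exception (incomparable values) is deliberately not exercised here.

```python
import pandas as pd

# Hypothetical, deliberately unsorted index.
idx = pd.MultiIndex.from_tuples([(2, "b"), (1, "a")])

# Exception 1: `self` and `other` are equal, so the result is not sorted.
same = idx.union(idx)

# Exception 2: `other` has length 0, so the result is not sorted.
with_empty = idx.union(idx[:0])

print(list(same))
print(list(with_empty))
```

In both calls the original order of `idx` is expected to be preserved, even though `sort=None` would otherwise sort the result.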

@mroeschke mroeschke added the Performance, MultiIndex, and setops labels Oct 10, 2022
@lukemanley lukemanley added the Bug label Oct 10, 2022
@lukemanley lukemanley changed the title PERF: MultiIndex setops with sort=None BUG/PERF: MultiIndex setops with sort=None Oct 10, 2022
@mroeschke mroeschke added this to the 1.6 milestone Oct 11, 2022
@mroeschke mroeschke merged commit c6cf37a into pandas-dev:main Oct 12, 2022
@mroeschke
Member

Thanks @lukemanley

@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
@lukemanley lukemanley deleted the safe-sort-multiindex branch October 26, 2022 10:18
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
* perf: algos.safe_sort with multiindex

* add sort to multiindex setop asv

* fix asv

* whatsnew

* update test_union_nan_got_duplicated

* add test for sort bug

* parameterize dtype in test