PERF: skip libjoin fastpath for MultiIndex #54765

Merged: 3 commits, Aug 28, 2023


lukemanley (Member) commented Aug 26, 2023

Addresses #54646 for MultiIndex but leaves RangeIndex as is. The RangeIndex case seems to have a few side effects; I'll add comments on RangeIndex separately in #54646.

Also simplifies the logic in Index.join, which improves performance for a few MultiIndex and non-MultiIndex cases.
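A minimal sketch of the kind of call this change touches, assuming pandas 2.x: an inner join between two sorted MultiIndexes with matching level names goes through Index.join. Which internal path runs (the libjoin fastpath or the generic one) is an implementation detail that is not visible through the public API; the sizes, level names, and values below are illustrative assumptions, not taken from the PR.

```python
import numpy as np
import pandas as pd

# Two sorted (monotonic increasing) MultiIndexes with identical level names.
# Sizes, names, and values are illustrative, not from the PR's benchmarks.
n = 1_000
left = pd.MultiIndex.from_arrays(
    [np.arange(n), np.arange(n) * 10], names=["a", "b"]
)
right = pd.MultiIndex.from_arrays(
    [np.arange(n // 2, n + n // 2), np.arange(n // 2, n + n // 2) * 10],
    names=["a", "b"],
)

# Matching level names: a plain join of the two indexes,
# not a join on a subset of levels.
joined = left.join(right, how="inner")

# The overlap is the tuples (i, 10 * i) for i in 500..999.
print(joined.nlevels, len(joined))
```

Timing this call before and after the change is one way to observe the effect locally; results will depend on machine and pandas version.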

multiindex_object benchmarks:

       before           after         ratio
-      18.7±0.2ms       16.6±0.4ms     0.89  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'symmetric_difference', False)
-      19.0±0.7ms       16.5±0.2ms     0.87  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'intersection', False)
-       60.6±10ms         49.8±1ms     0.82  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'intersection', None)
-        40.1±1ms       32.2±0.4ms     0.80  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'intersection', None)
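The SetOperations benchmarks above are parametrized by index structure ('monotonic'), level dtype ('int', 'ea_int', 'string'), set method, and the sort argument. A rough sketch of the 'monotonic' / 'ea_int' case follows; reading 'ea_int' as the nullable Int64 extension dtype, and the sizes used, are assumptions rather than the benchmark's actual setup.

```python
import numpy as np
import pandas as pd

# Assumption: "ea_int" means levels backed by the nullable Int64 extension dtype.
n = 1_000
a = pd.array(np.arange(n), dtype="Int64")
b = pd.array(np.arange(n // 2, n + n // 2), dtype="Int64")

mi1 = pd.MultiIndex.from_arrays([a, a])
mi2 = pd.MultiIndex.from_arrays([b, b])

# Both inputs are sorted, mirroring the "monotonic" benchmark parameter.
assert mi1.is_monotonic_increasing and mi2.is_monotonic_increasing

# Overlap is 500..999; the symmetric difference is 0..499 plus 1000..1499.
inter = mi1.intersection(mi2, sort=None)
symdiff = mi1.symmetric_difference(mi2, sort=False)
print(len(inter), len(symdiff))
```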

join_merge benchmarks:

       before           after         ratio
-      9.98±0.2ms       8.99±0.1ms     0.90  join_merge.ConcatIndexDtype.time_concat_series('Int64', 'has_na', 1, True)
-      12.2±0.3ms      10.5±0.05ms     0.86  join_merge.ConcatIndexDtype.time_concat_series('int64[pyarrow]', 'has_na', 1, True)
-        13.9±2ms       11.8±0.2ms     0.85  join_merge.ConcatIndexDtype.time_concat_series('datetime64[ns]', 'has_na', 1, False)
-        26.3±3ms      22.3±0.05ms     0.85  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 'monotonic', 1, False)
-        6.50±1ms      5.19±0.05ms     0.80  join_merge.ConcatIndexDtype.time_concat_series('datetime64[ns]', 'monotonic', 1, False)
-         133±2ms         51.1±2ms     0.38  join_merge.Align.time_series_align_int64_index
-       105±0.9ms       23.7±0.5ms     0.22  join_merge.Align.time_series_align_left_monotonic
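The two largest improvements are in the Series alignment benchmarks. A rough timing sketch of that kind of workload follows, assuming two Series with sorted, partially overlapping int64 indexes; the sizes and index construction are hypothetical and are not the asv benchmark's setup, and absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
size = 100_000

# Hypothetical setup: two sorted, unique int64 indexes drawn from a shared pool,
# so they overlap partially.
pool = np.arange(0, 10 * size, 3, dtype="int64")
idx1 = np.sort(rng.choice(pool, size, replace=False))
idx2 = np.sort(rng.choice(pool, size, replace=False))
ts1 = pd.Series(rng.standard_normal(size), index=idx1)
ts2 = pd.Series(rng.standard_normal(size), index=idx2)

# Left alignment: both results are reindexed to ts1's index.
left1, left2 = ts1.align(ts2, join="left")

# Arithmetic aligns on the union (outer join) of the two indexes.
total = ts1 + ts2

runs = 10
elapsed = timeit.timeit(lambda: ts1.align(ts2, join="left"), number=runs)
print(f"align(join='left'): {elapsed / runs * 1e3:.2f} ms per call")
```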

lukemanley added the Performance and MultiIndex labels Aug 26, 2023
mroeschke added this to the 2.2 milestone Aug 28, 2023
mroeschke merged commit 01c3e8b into pandas-dev:main Aug 28, 2023
mroeschke (Member) commented:

Thanks @lukemanley

lukemanley deleted the mi-can-use-libjoin branch September 6, 2023 00:53
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Sep 11, 2023
* PERF: skip libjoin fastpath for MultiIndex

* fix levels sort