
PERF: MultiIndex.values for MI's with DatetimeIndex, TimedeltaIndex, or ExtensionDtype levels #46288

Merged: 7 commits, Mar 15, 2022

lukemanley (Member) commented on Mar 9, 2022

This change boxes these types (converts them to Python objects such as Timestamp) on the distinct level values first and then calls .take, rather than calling .take first and then boxing a potentially much larger array.

The impact is most pronounced when the index has far more rows than unique values (e.g. dates) in a given level.
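The reordering described above can be sketched with public pandas and NumPy calls. This is only an illustration of the idea, not the code changed in this PR: `level` and `codes` are hypothetical stand-ins for one level of a MultiIndex and its integer codes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one MultiIndex level and its integer codes.
level = pd.date_range("2000-01-01", periods=5)  # 5 distinct dates
codes = np.tile(np.arange(5), 1000)             # 5,000 rows pointing into the level

# Take first, then box: 5,000 datetime64 values are converted to Timestamp objects.
take_then_box = level.take(codes).astype(object).to_numpy()

# Box first, then take: only 5 values are converted, then the
# resulting object array is reordered by the codes.
box_then_take = level.astype(object).to_numpy().take(codes)

# Both orders produce the same object array of Timestamps.
print(np.array_equal(take_then_box, box_then_take))
```

Both paths yield the same array; the second one does the expensive conversion once per distinct value instead of once per row.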

import pandas as pd
import numpy as np

# 10,000 nullable integers x 1,000 dates -> 10,000,000 rows
mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(10000), dtype="Int64"),
        pd.date_range("2000-01-01", periods=1000),
    ]
)

%timeit mi.copy().values

6.63 s ± 212 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
775 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR
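Outside IPython, where the `%timeit` magic is unavailable, a similar measurement could be taken with the standard-library `timeit` module. The sketch below assumes a much smaller index (100 x 100 rows) so that it finishes quickly; the sizes, the repeat count, and the variable names are illustrative choices, and the numbers it prints are not comparable to the timings reported above.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller, illustrative version of the index above: 100 x 100 = 10,000 rows.
small_mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(100), dtype="Int64"),
        pd.date_range("2000-01-01", periods=100),
    ]
)

# Time the same expression as the %timeit call, averaged over a few runs.
n_runs = 5
elapsed = timeit.timeit(lambda: small_mi.copy().values, number=n_runs)
print(f"{elapsed / n_runs * 1000:.2f} ms per call")
```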

Existing asvs:

$ asv continuous -f 1.1 upstream/main multiindex-values -b multiindex_object

       before           after         ratio
     <main>           <multiindex-values>
-        32.0±2ms       28.6±0.3ms     0.89  multiindex_object.Duplicated.time_duplicated
-         215±3ms          151±5ms     0.70  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection')
-       347±0.8ms          226±3ms     0.65  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection')
-         323±3ms          208±2ms     0.64  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union')
-         326±5ms          207±5ms     0.64  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union')
-         139±1ms         32.4±2ms     0.23  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference')
-         140±2ms       32.3±0.8ms     0.23  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference')
-        65.6±3ms       8.79±0.2ms     0.13  multiindex_object.Values.time_datetime_level_values_copy

lukemanley added the MultiIndex and Performance (Memory or execution speed performance) labels on Mar 9, 2022
jreback added this to the 1.5 milestone on Mar 9, 2022
jreback merged commit fb3e3e6 into pandas-dev:main on Mar 15, 2022

jreback (Contributor) commented on Mar 15, 2022

thanks @lukemanley keep em coming!

lukemanley deleted the multiindex-values branch on March 20, 2022 23:18
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022