PERF: Avoid materializing entire IntervalIndex when using cut #27668

jschendel · 2019-07-31T03:46:08Z

When cut is called with an IntervalIndex as bins, the result is first materialized as an IntervalIndex and then converted to a Categorical:

pandas/pandas/core/reshape/tile.py

Lines 373 to 378 in 143bc34

    
    if isinstance(bins, IntervalIndex):
        # we have a fast-path here
        ids = bins.get_indexer(x)
        result = algos.take_nd(bins, ids)
        result = Categorical(result, categories=bins, ordered=True)
        return result, bins

It seems like it would be both faster and lighter on memory to skip the intermediate IntervalIndex built by take_nd and instead construct the Categorical directly from the indexer result via Categorical.from_codes.
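To make the idea concrete, here is a minimal sketch using only public pandas API. It is not the actual patch to tile.py; the variable names are illustrative, and it assumes that the -1 returned by get_indexer for values outside every bin can be passed straight through, since Categorical.from_codes treats -1 as missing.

```python
import numpy as np
import pandas as pd

bins = pd.interval_range(0, 20)   # (0, 1], (1, 2], ..., (19, 20]
x = np.linspace(0, 20, 100)

# Position of the bin containing each value; -1 where no bin matches.
codes = bins.get_indexer(x)

# Build the Categorical directly from the integer codes, without first
# materializing an IntervalIndex that is as long as x.
direct = pd.Categorical.from_codes(codes, categories=bins, ordered=True)

# Sanity check: the codes agree with what pd.cut returns for the same input.
expected = pd.cut(x, bins)
print(np.array_equal(direct.codes, expected.codes))
```

The codes array holds one small integer per input value, whereas the intermediate IntervalIndex holds a left and a right endpoint per value, which is presumably where most of the memory saving comes from.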

Some ad hoc measurements on master:

In [3]: ii = pd.interval_range(0, 20)

In [4]: values = np.linspace(0, 20, 100).repeat(10**4)

In [5]: %timeit pd.cut(values, ii)
7.69 s ± 43.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %memit pd.cut(values, ii)
peak memory: 278.39 MiB, increment: 130.76 MiB

And the same measurements with the Categorical.from_codes fix:

In [3]: ii = pd.interval_range(0, 20)

In [4]: values = np.linspace(0, 20, 100).repeat(10**4)

In [5]: %timeit pd.cut(values, ii)
1.02 s ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [6]: %memit pd.cut(values, ii)
peak memory: 145.81 MiB, increment: 15.98 MiB
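
For anyone who wants to repeat the comparison outside IPython, a rough sketch with the standard library's timeit and tracemalloc follows. It is an assumption-laden stand-in for the measurements above, not the way they were taken: tracemalloc reports traced Python allocations rather than process peak memory, so its figures are not directly comparable to the %memit output, and the input is shrunk so the script finishes quickly.

```python
import timeit
import tracemalloc

import numpy as np
import pandas as pd

ii = pd.interval_range(0, 20)
# Smaller than the 10**4 repeat used above, to keep the run short.
values = np.linspace(0, 20, 100).repeat(10**3)

# Best wall-clock time out of three single calls.
best = min(timeit.repeat(lambda: pd.cut(values, ii), number=1, repeat=3))

# Peak traced allocation while one call runs.
tracemalloc.start()
result = pd.cut(values, ii)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"best of 3: {best:.3f} s, peak traced: {peak / 2**20:.2f} MiB")
```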

jschendel added the Performance (memory or execution speed performance) and Interval (Interval data type) labels Jul 31, 2019

jschendel added this to the 1.0 milestone Jul 31, 2019

jschendel mentioned this issue Jul 31, 2019

PERF: Improve performance of cut with IntervalIndex bins #27669

Merged

jreback closed this as completed in #27669 Jul 31, 2019
