PERF: regression in getattr for IntervalIndex #30742

jorisvandenbossche · 2020-01-06T15:30:03Z

Master:

In [14]: idx = pd.interval_range(0, 1000, 1000) 

In [15]: %timeit getattr(idx, '_ndarray_values', idx)
1.29 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [16]: %timeit idx.closed
321 ns ± 2.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

while on 0.25.3:

In [13]: idx = pd.interval_range(0, 1000, 1000) 

In [14]: %timeit getattr(idx, '_ndarray_values', idx)  
90.5 ns ± 2.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [15]: %timeit idx.closed 
105 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

(just checked a few attributes, didn't check if it is related to those specific ones or getattr in general)

I think this is a cause / one of the causes of several regressions that can currently be seen at https://pandas.pydata.org/speed/pandas/ (eg https://pandas.pydata.org/speed/pandas/#reshape.Cut.time_cut_timedelta?p-bins=1000&commits=6efc2379-b9de33e3)

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2020-01-06T15:31:16Z

cc @jbrockmendel this might be related: #30605

jbrockmendel · 2020-01-06T16:06:30Z

#30605 changed _ndarray_values from a cache_readonly to a property. Easy to update.

jorisvandenbossche · 2020-01-06T16:11:04Z

That might be easy to update, but I am not sure that this explains the slowdown in the other attributes / slowdown in dependent functionality? Eg pd.cut (the actual benchmark I was investigating):

bins = 1000 
N = 10 ** 5
timedelta_series = pd.Series(np.random.randint(N, size=N), dtype="timedelta64[ns]")
pd.cut(timedelta_series, bins)

This pd.cut call also shows a slowdown, so would need to check if the caching improves that again (note, I also didn't fully investigate if the slowdown here is caused by the IntervalIndex attributes slowdown, there might also be other reasons).

TomAugspurger · 2020-01-06T20:47:09Z

On master right now, CategoricalIndex._ndarray_values is using Index._ndarray_values, which is something like 6x slower than the old implementation (just an attribute lookup).

https://github.com/pandas-dev/pandas/pull/30717/files is adding an ExtensionIndex._ndarray_values which would I think fix at least one source of the slowdown. Or we can have a targeted fix restoring CategoricalIndex._ndarray_values as _data.values.

jorisvandenbossche · 2020-01-08T08:30:23Z

Reopening this because the other attributes are still slower (the _ndarray_values was solved in #30797), eg the original example in the top post for idx.closes is still 3x slower.

jorisvandenbossche · 2020-01-08T08:34:34Z

The original cut benchmark (see example in #30742 (comment)) seems to be fixed with #30797!

jreback · 2020-01-23T04:29:41Z

The original cut benchmark (see example in #30742 (comment)) seems to be fixed with #30797!

is this fixed @jorisvandenbossche

jorisvandenbossche · 2020-01-23T12:37:15Z

Not in general, only the specific case of the _ndarray_values attribute

TomAugspurger · 2020-01-27T15:08:47Z

Any suggestions for the .closed regression?

IIUC, the additional overhead comes from going through accessor.PandasDelegate. Some options

Just manually write the definition of IntervalIndex.closed
Make IntervalIndex.closed cache readonly
Some simpler mixin, such that the definition of IntervalIndex.closed becomes just ._data.closed, rather than the dynamic _delegate_property_get?

jbrockmendel · 2020-01-27T16:12:31Z

indexes.extension.inherit_names has a cache kwarg to turn a property into a cache_readonly. I can update #30860 to make closed (and anything else?) cached.

TomAugspurger · 2020-01-27T16:13:52Z

I think just closed IntervalIndex.

jorisvandenbossche · 2020-01-27T20:25:22Z

I don't think it's just closed, it's all attributes that use the same mechanism.

I think we could also have a version of PandasDelagate that is specific for the indexes that have _data as the underlying data as extension array. I think that way, it can be written more specific / more efficient.

TomAugspurger · 2020-01-28T15:27:43Z

@jbrockmendel does #30860 close this, or is the caching being done in a separate PR?

Or do we want a PandasDelegate subclass that's specific to cases where Index.attr is just Index._data.attr?

jorisvandenbossche · 2020-01-28T15:32:19Z

BTW, I don't think this should necessarily be a blocker for 1.0.
But we should keep this in mind for all index->EA delegation refactor that is happening, and look into improving this.

jbrockmendel · 2020-01-28T19:34:41Z

#30860 did the caching for closed but there may be other properties that can be made cached

simonjayhawkins · 2020-08-17T19:48:32Z

moved off 1.1.1 milestone (scheduled for this week) as no PRs to fix in the pipeline

simonjayhawkins · 2020-09-07T09:23:29Z

moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline

simonjayhawkins · 2020-10-05T12:32:55Z

moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline

simonjayhawkins · 2020-10-29T13:54:22Z

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

jbrockmendel · 2023-02-15T02:57:14Z

Closing as fixed

jorisvandenbossche added Performance Memory or execution speed performance Interval Interval data type labels Jan 6, 2020

jorisvandenbossche added this to the 1.0 milestone Jan 6, 2020

jbrockmendel mentioned this issue Jan 7, 2020

PERF: cache IntervalIndex._ndarray_values #30797

Merged

jreback closed this as completed in #30797 Jan 7, 2020

jorisvandenbossche reopened this Jan 8, 2020

jorisvandenbossche mentioned this issue Jan 10, 2020

PERF: performance regression in 1.0 compared to 0.25 #30790

Closed

4 tasks

TomAugspurger mentioned this issue Jan 22, 2020

RLS: 1.0.0 #27492

Closed

jorisvandenbossche modified the milestones: 1.0.0, 1.1 Jan 28, 2020

jreback modified the milestones: 1.1, 1.1.1 Jul 10, 2020

simonjayhawkins modified the milestones: 1.1.1, 1.1.2 Aug 17, 2020

simonjayhawkins modified the milestones: 1.1.2, 1.1.3 Sep 7, 2020

simonjayhawkins modified the milestones: 1.1.3, 1.1.4 Oct 5, 2020

simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020

jreback modified the milestones: 1.1.5, Contributions Welcome Nov 25, 2020

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

jbrockmendel closed this as completed Feb 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: regression in getattr for IntervalIndex #30742

PERF: regression in getattr for IntervalIndex #30742

jorisvandenbossche commented Jan 6, 2020

jorisvandenbossche commented Jan 6, 2020

jbrockmendel commented Jan 6, 2020

jorisvandenbossche commented Jan 6, 2020

TomAugspurger commented Jan 6, 2020 •

edited

Loading

jorisvandenbossche commented Jan 8, 2020

jorisvandenbossche commented Jan 8, 2020

jreback commented Jan 23, 2020

jorisvandenbossche commented Jan 23, 2020

TomAugspurger commented Jan 27, 2020

jbrockmendel commented Jan 27, 2020

TomAugspurger commented Jan 27, 2020

jorisvandenbossche commented Jan 27, 2020

TomAugspurger commented Jan 28, 2020

jorisvandenbossche commented Jan 28, 2020

jbrockmendel commented Jan 28, 2020

simonjayhawkins commented Aug 17, 2020

simonjayhawkins commented Sep 7, 2020

simonjayhawkins commented Oct 5, 2020

simonjayhawkins commented Oct 29, 2020

jbrockmendel commented Feb 15, 2023

PERF: regression in getattr for IntervalIndex #30742

PERF: regression in getattr for IntervalIndex #30742

Comments

jorisvandenbossche commented Jan 6, 2020

jorisvandenbossche commented Jan 6, 2020

jbrockmendel commented Jan 6, 2020

jorisvandenbossche commented Jan 6, 2020

TomAugspurger commented Jan 6, 2020 • edited Loading

jorisvandenbossche commented Jan 8, 2020

jorisvandenbossche commented Jan 8, 2020

jreback commented Jan 23, 2020

jorisvandenbossche commented Jan 23, 2020

TomAugspurger commented Jan 27, 2020

jbrockmendel commented Jan 27, 2020

TomAugspurger commented Jan 27, 2020

jorisvandenbossche commented Jan 27, 2020

TomAugspurger commented Jan 28, 2020

jorisvandenbossche commented Jan 28, 2020

jbrockmendel commented Jan 28, 2020

simonjayhawkins commented Aug 17, 2020

simonjayhawkins commented Sep 7, 2020

simonjayhawkins commented Oct 5, 2020

simonjayhawkins commented Oct 29, 2020

jbrockmendel commented Feb 15, 2023

TomAugspurger commented Jan 6, 2020 •

edited

Loading