PERF: regression in IntervalIndex intersection #42240

jorisvandenbossche · 2021-06-25T20:46:34Z

There were several regressions related to the set op methods of IntervalIndex. One example is the regression reported in #41929 (comment), which was fixed (and backported) in #42197.

But as far as I can see, that didn't fix the regression in https://pandas.pydata.org/speed/pandas/#index_object.IntervalIndexMethod.time_intersection?python=3.8&Cython=0.29.21&p-param1=100000&commits=de07087a-6f953a8c (a 50x slowdown of index_object.IntervalIndexMethod.time_intersection(100000)).
This one might be caused by one of the other IntervalIndex-related commits in the range (#41832, #41824, #41883).

cc @jbrockmendel


jbrockmendel · 2021-06-25T20:58:25Z

#42197 fixed (in local measurements at least) about half of the perf hit mentioned in #41929. Still looking at ways to get the rest back.

jorisvandenbossche · 2021-06-25T21:03:20Z

(as I mentioned, this is a different issue from the one in #41929)

jbrockmendel · 2021-06-25T21:36:19Z

Thank you for clarifying.

jorisvandenbossche · 2021-06-26T07:39:34Z

The equivalent code from the benchmark mentioned above:

import numpy as np
import pandas as pd

N = 10 ** 5
left = pd.IntervalIndex.from_breaks(np.arange(N))
right = pd.IntervalIndex.from_breaks(np.arange(N - 3, 2 * N - 3))

%timeit left.intersection(right)
9.33 ms ± 591 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <-- pandas 1.2.5
227 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)    # <-- master
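For anyone wanting to reproduce this outside IPython, the same measurement can be sketched as a plain script using the standard-library `timeit` module. This is only an approximation of the snippet above: `N` is reduced to `10 ** 3` so it runs quickly (an assumption for illustration, not the benchmark's parameter), and the repeat counts are arbitrary.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10 ** 5 used in the benchmark so the sketch finishes fast;
# raise it to 10 ** 5 to compare against the numbers quoted above.
N = 10 ** 3

left = pd.IntervalIndex.from_breaks(np.arange(N))
right = pd.IntervalIndex.from_breaks(np.arange(N - 3, 2 * N - 3))

# The two indexes only share the last two intervals of `left`.
result = left.intersection(right)

# Best of 5 repeats, 10 calls per repeat, reported per call.
best = min(timeit.repeat(lambda: left.intersection(right), number=10, repeat=5)) / 10

print(f"{len(left)} x {len(right)} intervals: {best * 1e3:.3f} ms per call")
print(f"{len(result)} intervals in common")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; it is not exactly what `%timeit` reports (mean ± std. dev.), so the numbers will not match digit for digit.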

jbrockmendel · 2021-06-26T17:36:27Z

Profiling shows that most of the time is taken up by a call to _get_join_target, which boils down to casting to object dtype. If I go into Index._intersection and change

-        if self.is_monotonic and other.is_monotonic:
+        if self.is_monotonic and other.is_monotonic and not is_interval_dtype(self.dtype):

then I see a runtime of 6.61ms, vs 163ms on master, so ever-so-slightly faster than 1.2.5. (This change passes the tests/indexes tests; I haven't run the full test suite.)

Will look into a less-kludgy solution.
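To get a feel for why a cast to object dtype would dominate, here is a rough sketch that times `IntervalIndex.astype(object)` against simply reading the endpoint arrays. It does not call or reproduce `_get_join_target` (a private method whose internals are not shown in this thread); it only illustrates the general cost of materializing one `pd.Interval` Python object per element. The size `N` and repeat counts are arbitrary choices.

```python
import timeit

import numpy as np
import pandas as pd

N = 10 ** 4  # arbitrary size for illustration
idx = pd.IntervalIndex.from_breaks(np.arange(N))

# Casting to object dtype builds one pd.Interval object per element.
as_object = idx.astype(object)

# The endpoints are also exposed directly as Index objects backed by NumPy arrays.
bounds = (idx.left, idx.right)

t_cast = min(timeit.repeat(lambda: idx.astype(object), number=5, repeat=3)) / 5
t_bounds = min(timeit.repeat(lambda: (idx.left, idx.right), number=5, repeat=3)) / 5

print(f"astype(object): {t_cast * 1e3:.3f} ms per call")
print(f"left/right:     {t_bounds * 1e3:.3f} ms per call")
```

If the object cast is noticeably slower on your machine, that is consistent with the profile described above, though the actual code path inside `Index._intersection` may differ.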

simonjayhawkins · 2021-06-28T14:59:57Z

reopening, as #42268 has not recovered all of the performance regression

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv compare -f 1.1 v1.3.0.dev0 master --only-changed --sort ratio
       before           after         ratio
     [0f587028]       [2d69cf68]
     <v1.3.0.dev0^0>       <master>  
+     4.80±0.08ms          181±2ms    37.68  index_object.IntervalIndexMethod.time_intersection(100000)
+      4.99±0.1ms          102±1ms    20.46  index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)
+         187±4μs      1.78±0.03ms     9.53  index_object.IntervalIndexMethod.time_intersection_one_duplicate(1000)
+         632±6μs      2.17±0.04ms     3.44  index_object.IntervalIndexMethod.time_intersection_both_duplicate(1000)
+         204±3μs         475±10μs     2.32  index_object.IntervalIndexMethod.time_intersection(1000)
+      59.8±0.5ms          108±1ms     1.81  index_object.IntervalIndexMethod.time_intersection_both_duplicate(100000)
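For readers unfamiliar with this output: the ratio column is the "after" time divided by the "before" time, and `-f 1.1` sets the factor beyond which a change is flagged. The sketch below recomputes a few ratios from the central values shown above. The `classify` helper is hypothetical and deliberately simplified; asv's own marking logic also takes the measurement spread into account, which this ignores.

```python
# Central values in seconds, copied from the table above (the ± spread is dropped).
RESULTS = [
    ("time_intersection(100000)", 4.80e-3, 181e-3),
    ("time_intersection_one_duplicate(100000)", 4.99e-3, 102e-3),
    ("time_intersection(1000)", 204e-6, 475e-6),
]

FACTOR = 1.1  # same threshold as the -f 1.1 flag in the command above


def classify(before, after, factor=FACTOR):
    """Return (marker, ratio) using a simplified threshold rule (hypothetical helper)."""
    ratio = after / before
    if ratio > factor:
        return "+", ratio  # slower than the threshold allows
    if ratio < 1 / factor:
        return "-", ratio  # faster by more than the threshold
    return " ", ratio      # within the threshold, not flagged


for name, before, after in RESULTS:
    marker, ratio = classify(before, after)
    print(f"{marker} {ratio:6.2f}  {name}")
```

The recomputed ratios differ slightly from the table in the last digits because the table prints rounded timings while asv computes the ratio from the unrounded measurements.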

jbrockmendel · 2021-06-28T16:15:34Z

Huh, I get the same asv results, but when I run the worst one with %timeit I get much better perf

simonjayhawkins · 2021-06-28T17:02:54Z

> There were several regressions related to the set op methods of IntervalIndex. For example #41929 (comment) which was fixed (and backported) in #42197.
>
> But as far as I can see, that didn't fix the regression in https://pandas.pydata.org/speed/pandas/#index_object.IntervalIndexMethod.time_intersection?python=3.8&Cython=0.29.21&p-param1=100000&commits=de07087a-6f953a8c (50x slow down of index_object.IntervalIndexMethod.time_intersection(100000))
> This one might be caused by one of the other interval index related commits in the range (#41832, #41824, #41883)

94756fe seemed to be responsible for the majority of the slowdown

simonjayhawkins · 2021-06-28T17:09:11Z

but 1ff3b9a seems to be responsible for the majority of the slowdown in index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)

jbrockmendel · 2021-06-28T17:51:31Z

Probably best to revert the offending PRs for 1.3, and I'll see about more robust solutions in 1.4

simonjayhawkins · 2021-06-29T08:58:52Z

> but 1ff3b9a seems to be responsible for the majority of the slowdown in index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)

I didn't check the results properly here. The majority of the slowdown is also from 94756fe

jreback · 2021-06-29T12:32:20Z

#42293 should fix this (need to verify)

jreback · 2021-06-29T12:32:59Z

reopen to verify the results

simonjayhawkins · 2021-06-29T13:59:42Z

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv compare v1.3.0.dev0 master --sort ratio|grep index_object.IntervalIndexMethod.time_intersection
+         632±6μs      2.23±0.09ms     3.52  index_object.IntervalIndexMethod.time_intersection_both_duplicate(1000)
+      59.8±0.5ms        110±0.6ms     1.85  index_object.IntervalIndexMethod.time_intersection_both_duplicate(100000)
+         204±3μs          281±7μs     1.38  index_object.IntervalIndexMethod.time_intersection(1000)
+         187±4μs          227±2μs     1.22  index_object.IntervalIndexMethod.time_intersection_one_duplicate(1000)
      4.80±0.08ms       5.16±0.2ms     1.07  index_object.IntervalIndexMethod.time_intersection(100000)
       4.99±0.1ms       4.93±0.2ms     0.99  index_object.IntervalIndexMethod.time_intersection_one_duplicate(100000)

simonjayhawkins · 2021-06-29T14:00:33Z

can probably close once we have official timings on the benchmark machine

simonjayhawkins · 2021-06-30T06:47:31Z

closing. can reopen if time_intersection_both_duplicate needs attention. cc @jbrockmendel

simonjayhawkins · 2021-07-01T12:01:19Z

> closing. can reopen if time_intersection_both_duplicate needs attention. cc @jbrockmendel

https://simonjayhawkins.github.io/fantastic-dollop/#regressions?sort=3&dir=desc&branch=1.3.x

jorisvandenbossche added the Performance (Memory or execution speed performance), Regression (Functionality that used to work in a prior pandas version), and Interval (Interval data type) labels Jun 25, 2021

jorisvandenbossche added this to the 1.3 milestone Jun 25, 2021

jbrockmendel mentioned this issue Jun 27, 2021

PERF: IntervalIndex.intersection #42268 (merged)

jreback closed this as completed in #42268 Jun 28, 2021

simonjayhawkins reopened this Jun 28, 2021

jbrockmendel mentioned this issue Jun 29, 2021

PERF/REGR: IntervalIndex.intersection, PeriodIndex.get_loc #42293 (merged)

jreback closed this as completed in #42293 Jun 29, 2021

jreback reopened this Jun 29, 2021

simonjayhawkins added the Closing Candidate (May be closeable, needs more eyeballs) label Jun 29, 2021

simonjayhawkins closed this as completed Jun 30, 2021

simonjayhawkins mentioned this issue Jul 1, 2021

PERF/REGR: restore IntervalIndex._intersection_non_unique #42334 (merged)
