PERF: speed up IntervalIndex._intersection_non_unique by ~50x #27489

Merged (1 commit) on Jul 20, 2019

Conversation

@qwhelan (Contributor) commented Jul 20, 2019

I've been backfilling asv data and noticed the following regression in IntervalIndexMethod.time_intersection_both_duplicate (see here):
[Image: Screenshot from 2019-07-20 02-30-17]

The regression went unnoticed because the benchmark was only added in #26711, after the regression itself had been introduced in #26225.

This PR both simplifies the IntervalIndex._intersection_non_unique logic (now equivalent to MultiIndex._intersection_non_unique) and provides a ~50x speedup:

       before           after         ratio
     [9bab81e0]       [2848036e]
     <interval_non_unique_intersection~1>       <interval_non_unique_intersection>
-      12.6±0.1ms         725±30μs     0.06  index_object.IntervalIndexMethod.time_intersection_both_duplicate(1000)
-         4.96±0s         96.7±6ms     0.02  index_object.IntervalIndexMethod.time_intersection_both_duplicate(100000)
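As a quick cross-check of the `ratio` column, the arithmetic can be redone from the mean timings reported above. This is only a sketch: the `timings` dictionary and the `summarize` helper are hypothetical names introduced here, and the error bars are ignored.

```python
# Mean timings copied from the asv output above, converted to seconds.
# Keys are the benchmark parameter (index size); values are (before, after).
timings = {
    1_000: (12.6e-3, 725e-6),
    100_000: (4.96, 96.7e-3),
}


def summarize(before, after):
    """Return (ratio, speedup): ratio is after/before, as asv reports it."""
    return after / before, before / after


for n, (before, after) in timings.items():
    ratio, speedup = summarize(before, after)
    print(f"N={n}: ratio={ratio:.2f}, speedup={speedup:.1f}x")
# N=1000: ratio=0.06, speedup=17.4x
# N=100000: ratio=0.02, speedup=51.3x
```

On these numbers the ~50x figure corresponds to the 100000-element case; the 1000-element case improves by roughly 17x.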

The new timings are also about 10x faster than the old baseline, i.e. the timings from before the regression.
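To illustrate the kind of logic involved, here is a minimal sketch of a non-unique intersection done as a hashed membership test. It assumes intervals can be modelled as plain `(left, right)` tuples and that duplicates on the left-hand side are kept; the function name `intersection_non_unique` is hypothetical, the sketch ignores missing values, and it is not the pandas source.

```python
def intersection_non_unique(self_intervals, other_intervals):
    """Keep each interval of self_intervals that also occurs in other_intervals.

    Intervals are modelled as (left, right) tuples (an assumption made for
    this sketch). Duplicates in self_intervals are preserved, in order.
    """
    # Build the lookup set once so each membership test is O(1) on average.
    other_set = set(other_intervals)
    return [interval for interval in self_intervals if interval in other_set]


a = [(0, 1), (0, 1), (1, 2), (2, 3)]
b = [(0, 1), (2, 3), (2, 3)]
print(intersection_non_unique(a, b))
# [(0, 1), (0, 1), (2, 3)]
```

Building the set once keeps the whole pass roughly linear in the combined length of the two inputs, which is the property that matters for the larger benchmark size.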

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@WillAyd added the Performance label (memory or execution speed performance) Jul 20, 2019
@WillAyd (Member) commented Jul 20, 2019

@jschendel

@jreback added this to the 0.26.0 milestone Jul 20, 2019
@jreback merged commit 556245e into pandas-dev:master Jul 20, 2019
@jreback (Contributor) commented Jul 20, 2019

thanks @qwhelan

@jreback modified the milestones: 0.26.0, 1.0 Jul 20, 2019
quintusdias pushed a commit to quintusdias/pandas_dev that referenced this pull request Aug 16, 2019

3 participants