PERF: define is_all_dates to shortcut inadvertent copy when slicing an IntervalIndex #23591

qwhelan · 2018-11-09T07:30:01Z

Overriding the base-class definition of is_all_dates in IntervalIndex, as all other Index derivatives already do, speeds up IntervalIndex slicing by several orders of magnitude. The root cause of the performance degradation is as follows:

When slicing a Series, a new Series is created for the result.
The last step in Series.__init__() is Series._set_axis(), which in turn calls .is_all_dates on the new Index
The base definition of Index.is_all_dates is:

@cache_readonly
def is_all_dates(self):
    if self._data is None:
        return False
    return is_datetime_array(ensure_object(self.values))

which seems harmless at first glance. However, this eventually invokes IntervalArray.__array__, which is a pure Python for-loop creating Interval objects and leading to the performance regression here.

As the value of IntervalIndex.is_all_dates appears to always be False, even in the case of datetime-like left/right values, we simply override to return that value and shortcut the inadvertent copy described above.
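
The pattern described above can be illustrated without pandas. What follows is a minimal sketch under stated assumptions, not the actual pandas code: `BaseIndex`, `IntervalLikeIndex`, and the `materialize_calls` counter are hypothetical names, and `functools.cached_property` stands in for pandas' `cache_readonly`.

```python
import datetime
from functools import cached_property


class BaseIndex:
    """Toy stand-in for a base index class (hypothetical, not pandas.Index)."""

    def __init__(self, left, right):
        self.left = list(left)
        self.right = list(right)
        self.materialize_calls = 0  # counts how often the costly conversion runs

    @property
    def values(self):
        # Simulates the expensive step: a pure-Python loop that builds
        # one object per element.
        self.materialize_calls += 1
        return [(lo, hi) for lo, hi in zip(self.left, self.right)]

    @cached_property
    def is_all_dates(self):
        # Generic check: it has to materialize the elements to inspect them.
        return all(isinstance(v, datetime.datetime) for v in self.values)


class IntervalLikeIndex(BaseIndex):
    @property
    def is_all_dates(self):
        # The answer is known to be False for this type, so the
        # materialization in BaseIndex.values is never triggered.
        return False


base = BaseIndex(range(3), range(1, 4))
fast = IntervalLikeIndex(range(3), range(1, 4))
print(base.is_all_dates, base.materialize_calls)
print(fast.is_all_dates, fast.materialize_calls)
```

In this toy version the base class pays for one full materialization just to answer a question whose answer is already known, while the subclass answers without touching `values` at all.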

Benchmarks

       before           after         ratio
     [ce62a5c1]       [a8f5e90b]
     <interval_index_fix~1>       <interval_index_fix>
        3.42±0.6s          446±0μs    ~0.00  indexing.IntervalIndexing.time_getitem_list
         101±50μs          103±0μs     1.02  indexing.IntervalIndexing.time_getitem_scalar
-      3.80±0.08s          355±0μs     0.00  indexing.IntervalIndexing.time_loc_list
          140±0μs          123±0μs    ~0.88  indexing.IntervalIndexing.time_loc_scalar

Speed up of ~10704x for time_loc_list
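
The quoted speedup follows directly from the table. As a quick check, using only the rounded before/after timings shown above (so the result is approximate):

```python
# Timings copied from the benchmark table, converted to seconds.
timings = {
    "time_getitem_list": (3.42, 446e-6),
    "time_loc_list": (3.80, 355e-6),
}

# Speedup is simply before / after.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: ~{factor:.0f}x")
```

Both list-based benchmarks improve by a factor in the thousands; the scalar benchmarks are essentially unchanged.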

closes PERF: Slowdown in IntervalIndex.get_loc #23576
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry (I don't think this regression has been in a release)

pep8speaks · 2018-11-09T07:30:13Z

Hello @qwhelan! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/indexes/interval.py !
There are no PEP8 issues in the file pandas/tests/indexes/interval/test_interval.py !

codecov · 2018-11-09T13:02:05Z

Codecov Report

Merging #23591 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23591      +/-   ##
==========================================
+ Coverage   92.25%   92.25%   +<.01%     
==========================================
  Files         161      161              
  Lines       51237    51239       +2     
==========================================
+ Hits        47269    47271       +2     
  Misses       3968     3968

Flag	Coverage Δ
#multiple	`90.63% <100%> (ø)`	⬆️
#single	`42.29% <50%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`94.68% <100%> (+0.01%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce62a5c...a8f5e90. Read the comment docs.

TomAugspurger · 2018-11-09T13:30:26Z

Thanks! I opened #23598 to investigate removing this implicit casting, but this will be nice if we don't do #23598 for 0.24.0

jreback · 2018-11-09T14:38:57Z

pandas/core/indexes/interval.py

@@ -1061,6 +1061,14 @@ def func(self, other):
                                          name=result_name)
        return func

+    @property
+    def is_all_dates(self):


Note that we should actually change the default for Index subclasses, I think.

qwhelan · 2018-11-09T17:14:15Z

Thanks @TomAugspurger !

PERF: define is_all_dates to shortcut inadvertent copy when slicing a…

a8f5e90

…n IntervalIndex

qwhelan mentioned this pull request Nov 9, 2018

PERF: Slowdown in IntervalIndex.get_loc #23576

Closed

jorisvandenbossche added this to the 0.24.0 milestone Nov 9, 2018

jorisvandenbossche added Performance Memory or execution speed performance Interval Interval data type labels Nov 9, 2018

TomAugspurger mentioned this pull request Nov 9, 2018

PERF: Deprecate casting of index of dates to DatetimeIndex #23598

Closed

TomAugspurger approved these changes Nov 9, 2018

TomAugspurger merged commit 700520d into pandas-dev:master Nov 9, 2018

jreback reviewed Nov 9, 2018

qwhelan deleted the interval_index_fix branch November 9, 2018 17:14

JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this pull request Nov 14, 2018

PERF: define is_all_dates to shortcut inadvertent copy when slicing a…

03d632c

…n IntervalIndex (pandas-dev#23591)

tm9k1 pushed a commit to tm9k1/pandas that referenced this pull request Nov 19, 2018

PERF: define is_all_dates to shortcut inadvertent copy when slicing a…

edea5ec

…n IntervalIndex (pandas-dev#23591)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

PERF: define is_all_dates to shortcut inadvertent copy when slicing a…

723165a

…n IntervalIndex (pandas-dev#23591)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

PERF: define is_all_dates to shortcut inadvertent copy when slicing a…

a5c742a

…n IntervalIndex (pandas-dev#23591)
