PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation #26702

qwhelan · 2019-06-07T08:46:53Z

The timeseries.Iteration.time_iter benchmark shows a 6.4x regression here: https://qwhelan.github.io/pandas/#timeseries.Iteration.time_iter?p-time_index=%3Cfunction%20date_range%3E&commits=08395af4-fc24c2cd . An iteration of asv find blames 1fc76b80#diff-26a6d2ca7adfca586aabbb1c9dd8bf36R74 . I believe what is happening is:

pandas.core.indexes.datetimelike.ea_passthrough was modified from calling getattr() to being passed a classmethod directly
The corresponding change in DatetimeIndexOpsMixin.__iter__ then implicitly changed __iter__ from DatetimeArray.__iter__ to DatetimeLikeArrayMixin.__iter__. The latter must assume an ExtensionArray, so it is intrinsically slower.
This regression was then hidden in asv due to inadvertent inclusion of memory addresses when using certain parameters. This is resolved in ENH: remove memory addresses from param repr() airspeed-velocity/asv#771 and explains why my asv results show this regression while http://pandas.pydata.org/speed/pandas/#timeseries.Iteration.time_iter? does not.
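
To make the first two points concrete, here is a minimal sketch of how a passthrough that receives a function directly can silently skip a subclass override, while a getattr()-based passthrough does not. Every name below (ArrayMixin, FastArray, passthrough_by_name, passthrough_by_function, IndexByName, IndexByFunction) is a hypothetical stand-in, not the actual pandas code; the sketch only assumes that the wrapped array subclass overrides __iter__ with a faster version.

```python
class ArrayMixin:
    """Stand-in for a generic array base class (hypothetical)."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # Generic path: tag each element so the caller can see which path ran.
        return iter([("generic", v) for v in self._values])


class FastArray(ArrayMixin):
    """Stand-in for a subclass with a specialized __iter__ (hypothetical)."""

    def __iter__(self):
        return iter([("fast", v) for v in self._values])


def passthrough_by_name(name):
    """Look the method up on the wrapped object at call time."""
    def method(self, *args, **kwargs):
        return getattr(self._data, name)(*args, **kwargs)
    method.__name__ = name
    return method


def passthrough_by_function(array_method):
    """Call one fixed function, whatever the wrapped object's type is."""
    def method(self, *args, **kwargs):
        return array_method(self._data, *args, **kwargs)
    method.__name__ = array_method.__name__
    return method


class IndexByName:
    def __init__(self, data):
        self._data = data

    __iter__ = passthrough_by_name("__iter__")


class IndexByFunction:
    def __init__(self, data):
        self._data = data

    # Passing the base-class function pins iteration to the generic path.
    __iter__ = passthrough_by_function(ArrayMixin.__iter__)


data = FastArray([1, 2])
by_name = list(IndexByName(data))          # dispatches to FastArray.__iter__
by_function = list(IndexByFunction(data))  # always runs ArrayMixin.__iter__
```

In this sketch both wrappers hold the same FastArray, yet only the name-based lookup reaches the override. The same kind of change could explain a slowdown that does not show up in any functional test.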

Reverting back to old behavior yields a 7x speedup:

$ asv compare HEAD^ HEAD --only-changed
       before           after         ratio
     [649ad5c9]       [4d0c13cc]
     <any_all_fix>       <datetime_iter>
-      3.30±0.02s          469±6ms     0.14  timeseries.Iteration.time_iter(<function date_range>)
-        34.7±1ms       11.2±0.2ms     0.32  timeseries.Iteration.time_iter_preexit(<function date_range>)
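
The parameter label in the output above reads `<function date_range>` with no memory address. As a rough illustration of why that matters, the default repr() of a Python function embeds an address that differs from process to process, so a benchmark keyed on it would not line up across runs. The helper stable_param_repr below is hypothetical and is not asv's implementation of asv#771; it only sketches the idea of stripping the address.

```python
import re


def date_range():
    """Placeholder standing in for a function used as a benchmark parameter."""


def stable_param_repr(param):
    # Hypothetical helper: drop the " at 0x..." suffix so the label is the
    # same in every process.
    return re.sub(r" at 0x[0-9a-fA-F]+", "", repr(param))


raw = repr(date_range)                 # contains a per-process address
label = stable_param_repr(date_range)  # address removed
```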

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2019-06-07T09:24:26Z

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50701    50702       +1     
==========================================
- Hits        46588    46584       -4     
- Misses       4113     4118       +5

Flag	Coverage Δ
#multiple	`90.41% <100%> (ø)`	⬆️
#single	`41.91% <100%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/datetimes.py	`96.37% <100%> (ø)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️
pandas/util/testing.py	`90.6% <0%> (-0.11%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 649ad5c...4d0c13c. Read the comment docs.

codecov · 2019-06-07T09:24:32Z

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.88%   -0.01%     
==========================================
  Files         174      174              
  Lines       50702    50703       +1     
==========================================
- Hits        46590    46587       -3     
- Misses       4112     4116       +4

Flag	Coverage Δ
#multiple	`90.42% <100%> (ø)`	⬆️
#single	`41.9% <100%> (-0.11%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/datetimes.py	`96.37% <100%> (ø)`	⬆️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ff4f38...a1e703f. Read the comment docs.

TomAugspurger

LGTM. cc @jbrockmendel if you have a chance to verify.

Release note in 0.25.0.rst would be nice to have @qwhelan. We have a performance improvements section.

jbrockmendel · 2019-06-07T13:04:49Z

LGTM. Only question is if in principle we should be wrapping self._data.__iter__, but I don't think it's worth the hassle.


qwhelan · 2019-06-07T17:05:49Z

@TomAugspurger Done

TomAugspurger · 2019-06-07T18:56:00Z

Thanks. In theory, a similar chunking trick could be done for PeriodArray & TimedeltaArray, right? The only thing to change would be the function doing the conversion in

            converted = tslib.ints_to_pydatetime(data[start_i:end_i],
                                                 tz=self.tz, freq=self.freq,
                                                 box="timestamp")

Happy to leave that as a followup.
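
A minimal sketch of the chunking idea with the conversion function made pluggable, which is the only piece that would differ per array type. This is plain Python using the standard library, not the pandas implementation: iter_chunked, to_datetimes, to_timedeltas and the default chunk size are all assumptions for illustration, and the integers are treated as nanoseconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_datetimes(nanos):
    """Hypothetical bulk converter: integer nanoseconds -> datetime objects."""
    return [EPOCH + timedelta(microseconds=n // 1000) for n in nanos]


def to_timedeltas(nanos):
    """Hypothetical bulk converter: integer nanoseconds -> timedelta objects."""
    return [timedelta(microseconds=n // 1000) for n in nanos]


def iter_chunked(data, convert, chunksize=10000):
    """Convert `data` one slice at a time and yield the boxed elements."""
    for start_i in range(0, len(data), chunksize):
        end_i = min(start_i + chunksize, len(data))
        # One bulk conversion per chunk instead of one call per element.
        yield from convert(data[start_i:end_i])


data = [0, 1_000_000_000, 2_000_000_000]
as_datetimes = list(iter_chunked(data, to_datetimes, chunksize=2))
as_timedeltas = list(iter_chunked(data, to_timedeltas, chunksize=2))
```

Swapping to_datetimes for to_timedeltas leaves the loop unchanged, which is the shape of the follow-up suggested here.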

jreback · 2019-06-07T19:01:10Z

thanks @qwhelan

qwhelan · 2019-06-07T19:21:34Z

@jreback Thanks for merging!

@TomAugspurger Yes, definitely a possibility. There's probably mid-single-digit factor speedups available there. Unfortunately, I'm headed to Europe for vacation next week and starting a new job shortly thereafter so can't promise to make any headway on those.

TomAugspurger · 2019-06-07T19:35:04Z

No worries, I opened #26713 to track this.


TomAugspurger added the Performance, Timedelta, and Datetime labels and removed the Timedelta label Jun 7, 2019

TomAugspurger approved these changes Jun 7, 2019


TomAugspurger added this to the 0.25.0 milestone Jun 7, 2019

PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation

a1e703f

qwhelan force-pushed the datetime_iter branch from 4d0c13c to a1e703f Compare June 7, 2019 17:05

jreback merged commit cdc07fd into pandas-dev:master Jun 7, 2019

qwhelan deleted the datetime_iter branch June 7, 2019 19:16
