PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation #26702
Codecov Report
@@ Coverage Diff @@
## master #26702 +/- ##
==========================================
- Coverage 91.88% 91.87% -0.01%
==========================================
Files 174 174
Lines 50701 50702 +1
==========================================
- Hits 46588 46584 -4
- Misses 4113 4118 +5
Codecov Report
@@ Coverage Diff @@
## master #26702 +/- ##
==========================================
- Coverage 91.88% 91.88% -0.01%
==========================================
Files 174 174
Lines 50702 50703 +1
==========================================
- Hits 46590 46587 -3
- Misses 4112 4116 +4
LGTM. cc @jbrockmendel if you have a chance to verify.
Release note in 0.25.0.rst would be nice to have @qwhelan. We have a performance improvements section.
LGTM. Only question is if in principle we should be wrapping
@TomAugspurger Done
Thanks. In theory, a similar chunking trick could be done for PeriodArray & TimedeltaArray, right? The only thing to change would be the function doing the conversion in

    converted = tslib.ints_to_pydatetime(data[start_i:end_i],
                                         tz=self.tz, freq=self.freq,
                                         box="timestamp")

Happy to leave that as a followup.
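A minimal sketch of the chunking pattern discussed here, using only the standard library. `convert_chunk` is a hypothetical stand-in for a vectorized converter such as `tslib.ints_to_pydatetime` (it takes epoch seconds, not pandas' internal representation), and the default `chunksize` is an arbitrary assumption rather than the value used in the PR. Supporting another array type would, under these assumptions, only mean passing a different `convert` function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_chunk(values):
    # Hypothetical converter: turns a slice of epoch seconds into
    # datetime objects in one call (stand-in for a vectorized routine).
    return [EPOCH + timedelta(seconds=v) for v in values]


def iter_chunked(data, convert=convert_chunk, chunksize=10000):
    # Convert the raw integers one chunk at a time, so the converter is
    # called once per chunk instead of once per element, while never
    # materializing every boxed object at the same time.
    length = len(data)
    for start_i in range(0, length, chunksize):
        end_i = min(start_i + chunksize, length)
        converted = convert(data[start_i:end_i])
        for value in converted:
            yield value
```

For example, `list(iter_chunked([0, 60, 3600], chunksize=2))` calls the converter twice, once with two values and once with one.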
thanks @qwhelan
@jreback Thanks for merging! @TomAugspurger Yes, definitely a possibility. There are probably mid-single-digit-factor speedups available there. Unfortunately, I'm headed to Europe for vacation next week and starting a new job shortly thereafter, so I can't promise to make any headway on those.
No worries, I opened #26713 to track this.
The `timeseries.Iteration.time_iter` benchmark shows a 6.4x regression here: https://qwhelan.github.io/pandas/#timeseries.Iteration.time_iter?p-time_index=%3Cfunction%20date_range%3E&commits=08395af4-fc24c2cd . An iteration of `asv find` blames 1fc76b80#diff-26a6d2ca7adfca586aabbb1c9dd8bf36R74 . I believe what is happening is:
- `pandas.core.indexes.datetimelike.ea_passthrough` was modified from calling `getattr()` to being passed a classmethod directly
- `DatetimeIndexOpsMixin.__init__` then implicitly changed `__iter__` from `DatetimeArray.__iter__` to `DatetimeLikeArrayMixin.__iter__`. The latter must assume an `ExtensionArray`, so is intrinsically slower.

`asv` due to inadvertent inclusion of memory addresses when using certain parameters. This is resolved in ENH: remove memory addresses from param repr() airspeed-velocity/asv#771 and explains why my `asv` results show this regression while http://pandas.pydata.org/speed/pandas/#timeseries.Iteration.time_iter? does not.

Reverting back to old behavior yields a 7x speedup:

git diff upstream/master -u -- "*.py" | flake8 --diff
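The dispatch change described in the list above can be illustrated with a toy example. This is only a sketch of the general mechanism, not the pandas code: `GenericArray`, `FastArray`, `passthrough_by_name`, `passthrough_by_function`, and the two index classes are hypothetical names. Looking a method up by name on the wrapped object picks the subclass override, while passing the base-class function directly pins the generic implementation.

```python
class GenericArray:
    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        # Generic path: tags each element so the chosen path is visible.
        return (("generic", v) for v in self._values)


class FastArray(GenericArray):
    def __iter__(self):
        # Specialized override that a subclass would normally provide.
        return (("fast", v) for v in self._values)


def passthrough_by_name(name):
    # Late binding: resolve the method on the wrapped object at call time,
    # so a subclass override is honored.
    def method(self, *args, **kwargs):
        return getattr(self._data, name)(*args, **kwargs)
    method.__name__ = name
    return method


def passthrough_by_function(func):
    # Early binding: always call the exact function that was passed in,
    # even if the wrapped object's class overrides it.
    def method(self, *args, **kwargs):
        return func(self._data, *args, **kwargs)
    method.__name__ = func.__name__
    return method


class IndexByName:
    def __init__(self, array):
        self._data = array
    __iter__ = passthrough_by_name("__iter__")


class IndexByFunction:
    def __init__(self, array):
        self._data = array
    __iter__ = passthrough_by_function(GenericArray.__iter__)
```

Iterating `IndexByName(FastArray([1]))` goes through the `FastArray` override, whereas `IndexByFunction(FastArray([1]))` silently falls back to the generic path.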