Skip to content

PERF: restore DatetimeIndex.__iter__ performance by using non-EA implementation #26702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 7, 2019

Conversation

qwhelan
Copy link
Contributor

@qwhelan qwhelan commented Jun 7, 2019

The timeseries.TimeIteration.time_iter benchmark shows a 6.4x regression here: https://qwhelan.github.io/pandas/#timeseries.Iteration.time_iter?p-time_index=%3Cfunction%20date_range%3E&commits=08395af4-fc24c2cd . An iteration of asv find blames 1fc76b80#diff-26a6d2ca7adfca586aabbb1c9dd8bf36R74 . I believe what is happening is:

Reverting back to old behavior yields a 7x speedup:

$ asv compare HEAD^ HEAD --only-changed
       before           after         ratio
     [649ad5c9]       [4d0c13cc]
     <any_all_fix>       <datetime_iter>
-      3.30±0.02s          469±6ms     0.14  timeseries.Iteration.time_iter(<function date_range>)
-        34.7±1ms       11.2±0.2ms     0.32  timeseries.Iteration.time_iter_preexit(<function date_range>)
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Jun 7, 2019

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50701    50702       +1     
==========================================
- Hits        46588    46584       -4     
- Misses       4113     4118       +5
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.91% <100%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.37% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️
pandas/util/testing.py 90.6% <0%> (-0.11%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 649ad5c...4d0c13c. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 7, 2019

Codecov Report

Merging #26702 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26702      +/-   ##
==========================================
- Coverage   91.88%   91.88%   -0.01%     
==========================================
  Files         174      174              
  Lines       50702    50703       +1     
==========================================
- Hits        46590    46587       -3     
- Misses       4112     4116       +4
Flag Coverage Δ
#multiple 90.42% <100%> (ø) ⬆️
#single 41.9% <100%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.37% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ff4f38...a1e703f. Read the comment docs.

@TomAugspurger TomAugspurger added Performance Memory or execution speed performance Timedelta Timedelta data type Datetime Datetime data dtype and removed Timedelta Timedelta data type labels Jun 7, 2019
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. cc @jbrockmendel if you have a chance to verify.

Release note in 0.25.0.rst would be nice to have @qwhelan. We have a performance improvements section.

@TomAugspurger TomAugspurger added this to the 0.25.0 milestone Jun 7, 2019
@jbrockmendel
Copy link
Member

LGTM. Only question is if in principle we should be wrapping self._data.__iter__, but I don't think its worth the hassle.

@qwhelan
Copy link
Contributor Author

qwhelan commented Jun 7, 2019

@TomAugspurger Done

@TomAugspurger
Copy link
Contributor

Thanks. In theory, a similar chunking trick could be done for PeriodArray & TimedeltaArray, right? The only thing to change would be the function doing the conversion in

            converted = tslib.ints_to_pydatetime(data[start_i:end_i],
                                                 tz=self.tz, freq=self.freq,
                                                 box="timestamp")

Happy to leave that as a followup.

@jreback jreback merged commit cdc07fd into pandas-dev:master Jun 7, 2019
@jreback
Copy link
Contributor

jreback commented Jun 7, 2019

thanks @qwhelan

@qwhelan qwhelan deleted the datetime_iter branch June 7, 2019 19:16
@qwhelan
Copy link
Contributor Author

qwhelan commented Jun 7, 2019

@jreback Thanks for merging!

@TomAugspurger Yes, definitely a possibility. There's probably mid-single-digit factor speedups available there. Unfortunately, I'm headed to Europe for vacation next week and starting a new job shortly thereafter so can't promise to make any headway on those.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 7, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants