ENH: Series.resample performance with datetime64[ns] #7754 #10057

Merged 1 commit on May 12, 2015

scari
Contributor

@scari scari commented May 4, 2015

This PR fixes the performance drop in Series.resample on series with dtype datetime64[ns].
See %timeit result here.
closes #7754

@jreback Would you review this PR? Thanks!

@jreback
Contributor

jreback commented May 4, 2015

hmm, would need a benchmark, e.g. post a sample timeit for before/after (on a good sized test frame)

@scari
Contributor Author

scari commented May 4, 2015

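import pandas as pd

# One week of points every 555000 microseconds, all holding the constant 5 as int64.
intSeries = pd.Series(5, pd.date_range(start='2000-01-01', end='2000-01-08', freq='555000U'), dtype='int64')
# Same index, with the values reinterpreted as datetime64[ns].
timeSeries = intSeries.astype('datetime64[ns]')

# Wall-clock timing of the same resample call on each dtype.
%timeit intSeries.resample('1S', how='last')
%timeit timeSeries.resample('1S', how='last')
# Profile both calls to compare function-call counts.
%prun intSeries.resample('1S', how='last')
%prun timeSeries.resample('1S', how='last')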

Before

100 loops, best of 3: 18.8 ms per loop
1 loops, best of 3: 17 s per loop
449 function calls (441 primitive calls) in 0.021 seconds
[snip]
22378318 function calls (21773499 primitive calls) in 24.219 seconds
[snip]

After

100 loops, best of 3: 19 ms per loop
100 loops, best of 3: 20 ms per loop
449 function calls (441 primitive calls) in 0.020 seconds
[snip]
418 function calls (410 primitive calls) in 0.020 seconds
[snip]
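
For readers who want to repeat this comparison outside IPython, here is a minimal sketch as a plain script. It is not the code from this PR. It assumes a later pandas release, where the `how='last'` keyword has been replaced by the chained `.resample(...).last()` call. The helper names `build_series` and `time_resample` are hypothetical, the time span is cut to one hour to keep the run short, `'555ms'` is the same spacing as `'555000U'`, and `pd.to_datetime(..., unit="ns")` is swapped in for the original `astype('datetime64[ns]')`.

```python
import timeit

import pandas as pd


def build_series(end="2000-01-01 01:00:00"):
    """Return (int_series, dt_series) sharing one 555 ms DatetimeIndex."""
    # 555 ms is the same spacing as the '555000U' (microseconds) used above.
    index = pd.date_range(start="2000-01-01", end=end, freq="555ms")
    int_series = pd.Series(5, index=index, dtype="int64")
    # Assumption: to_datetime(unit="ns") stands in for astype('datetime64[ns]').
    dt_series = pd.to_datetime(int_series, unit="ns")
    return int_series, dt_series


def time_resample(series, repeat=3, number=5):
    """Best per-call time in seconds for series.resample('1s').last()."""
    timer = timeit.Timer(lambda: series.resample("1s").last())
    return min(timer.repeat(repeat=repeat, number=number)) / number


int_series, dt_series = build_series()
int_time = time_resample(int_series)
dt_time = time_resample(dt_series)
print(f"int64:          {int_time * 1e3:.2f} ms per call")
print(f"datetime64[ns]: {dt_time * 1e3:.2f} ms per call")
```

The absolute numbers depend on the machine and the pandas version, so only the ratio between the two lines is meaningful.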

@jreback
Contributor

jreback commented May 4, 2015

ok, looks ok. Can you add a vbench in vb_suite/timeseries.py? You can make it similar to the above, but smaller, so that the slowest run doesn't take too long (max 1s).

@jreback added the Datetime (Datetime data dtype) and Performance (Memory or execution speed performance) labels May 4, 2015
@jreback added this to the 0.17.0 milestone May 4, 2015
@scari
Contributor Author

scari commented May 5, 2015

Added test. @jreback please review again.

$ ./test_perf.sh -b HEAD~1 -t HEAD -r timeseries
-------------------------------------------------------------------------------
Test name                                    | head[ms] |  base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_resample_datetime64               |   3.7426 | 2009.6250 |   0.0019 |

Target [dd6ffa3] : ENH: Series.resample performance with datetime64[ns] #7754
Base   [3128215] : cleanup test for GH10044
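
As a quick check on the table above, the ratio column is the head time divided by the base time, so its inverse is the speedup. The sketch below only redoes that arithmetic with the two timings reported in the table. The variable names are illustrative.

```python
head_ms = 3.7426     # timing on the target commit (HEAD), from the table
base_ms = 2009.6250  # timing on the base commit (HEAD~1), from the table

ratio = head_ms / base_ms    # the value shown in the "ratio" column
speedup = base_ms / head_ms  # how many times faster the target commit is

print(f"ratio   = {ratio:.4f}")    # ratio   = 0.0019
print(f"speedup = {speedup:.0f}x")  # speedup = 537x
```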

@scari
Contributor Author

scari commented May 12, 2015

Updated whatsnew/v0.17.0.txt.

@jreback
Contributor

jreback commented May 12, 2015

looks good, ping when green

@scari
Contributor Author

scari commented May 12, 2015

on green! :) @jreback

jreback added a commit that referenced this pull request May 12, 2015
ENH: Series.resample performance with datetime64[ns] #7754
@jreback jreback merged commit abba4a1 into pandas-dev:master May 12, 2015
@jreback
Contributor

jreback commented May 12, 2015

ty!

Successfully merging this pull request may close these issues.

Series.resample('1S', how='last') on series with dtype=datetime64[ns] is very slow
3 participants