performance regression in DataFrame.sum? #4365

Closed
cpcloud opened this issue Jul 25, 2013 · 4 comments · Fixed by #4366
Labels: Dtype Conversions, Performance, Regression

Comments

@cpcloud
Member

cpcloud commented Jul 25, 2013

In [29]: df = DataFrame(randint(2, size=(8e6, 10)))

In [30]: time df.sum()
CPU times: user 1.21 s, sys: 0.03 s, total: 1.24 s
Wall time: 1.23 s
Out[30]:
0    3999325
1    3998264
2    4000047
3    3997902
4    4001078
5    4001965
6    4001701
7    4000482
8    3997581
9    4000691
dtype: int64

In [31]: time df.mean()
CPU times: user 0.38 s, sys: 0.00 s, total: 0.38 s
Wall time: 0.39 s
Out[31]:
0    0.5
1    0.5
2    0.5
3    0.5
4    0.5
5    0.5
6    0.5
7    0.5
8    0.5
9    0.5
dtype: float64

Should sum be 3.15x slower than mean?
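For anyone rerunning this today, here is a rough sketch of the same comparison. Some assumptions: current NumPy rejects a float such as `8e6` in `size`, so the row count is an integer; the row count is also reduced so the script finishes quickly; and `time_call` is just a hypothetical helper, not anything from pandas. Timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def time_call(func):
    """Hypothetical helper: call func once, return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


# The report used 8e6 rows; an integer row count is needed on current NumPy,
# and it is reduced here only to keep the run short.
n_rows = 1_000_000
df = pd.DataFrame(np.random.randint(2, size=(n_rows, 10), dtype=np.int64))

sums, sum_seconds = time_call(df.sum)
means, mean_seconds = time_call(df.mean)

print(f"sum:  {sum_seconds:.4f} s")
print(f"mean: {mean_seconds:.4f} s")
print(f"sum/mean time ratio: {sum_seconds / mean_seconds:.2f}")
```

A single call is noisy, so treat the printed ratio as a rough indication only.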

@cpcloud
Member Author

cpcloud commented Jul 25, 2013

INSTALLED VERSIONS
------------------
Python: 2.7.5.final.0
OS: Linux 3.9.9-1-ARCH #1 SMP PREEMPT Wed Jul 3 22:45:16 CEST 2013 x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

Cython: 0.19.1
Numpy: 1.7.1
Scipy: 0.12.0
statsmodels: 0.5.0.dev-db14222
    patsy: 0.1.0
scikits.timeseries: 0.91.3
dateutil: 2.1
pytz: 2013b
bottleneck: 0.6.0
PyTables: 3.0.0
    numexpr: 2.1
matplotlib: 1.2.1
openpyxl: 1.6.2
xlrd: 0.9.2
xlwt: 0.7.5
sqlalchemy: 0.8.1
lxml: 3.2.1
bs4: 4.2.1
html5lib: 1.0b1

@cpcloud
Member Author

cpcloud commented Jul 25, 2013

The issue is slightly worse before #3731, so that's not the cause.

@cpcloud
Member Author

cpcloud commented Jul 25, 2013

So far this only happens for int64 dtypes; performance is fine for float64.
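A minimal sketch of how the dtype comparison could be checked, assuming a best-of-N timing is good enough. `time_sum` and its parameters are made up for illustration; the frame is much smaller than the one in the report, so absolute numbers are not comparable to the timings above.

```python
import timeit

import numpy as np
import pandas as pd


def time_sum(dtype, n_rows=100_000, n_cols=10, repeat=5):
    """Hypothetical helper: best-of-`repeat` wall time (s) of DataFrame.sum()."""
    rng = np.random.default_rng(0)
    # Same 0/1 values for every dtype, so only the dtype differs between runs.
    values = rng.integers(0, 2, size=(n_rows, n_cols)).astype(dtype)
    df = pd.DataFrame(values)
    return min(timeit.repeat(df.sum, number=1, repeat=repeat))


timings = {name: time_sum(name) for name in ("int64", "float64")}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

If the int64 time is far above the float64 time, that would be consistent with the behaviour described here; on a pandas version that includes the fix, the two should be close.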

@cpcloud
Member Author

cpcloud commented Jul 25, 2013

New vbench for this:

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
stat_ops_frame_sum_int_axis_1                |   0.2921 |   5.4233 |   0.0539 |
stat_ops_frame_sum_int_axis_0                |   0.3717 |   5.5280 |   0.0672 |
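
To read the table: the ratio column looks like head time divided by base time, so values well below 1 mean head is faster. The sketch below just redoes that arithmetic from the numbers in the table; that head is the branch with the fix is an assumption, since the table itself does not say.

```python
# (test name, head [ms], base [ms]) copied from the vbench table above
results = [
    ("stat_ops_frame_sum_int_axis_1", 0.2921, 5.4233),
    ("stat_ops_frame_sum_int_axis_0", 0.3717, 5.5280),
]

ratios = {}
for name, head_ms, base_ms in results:
    # ratio < 1 means head is faster than base
    ratios[name] = head_ms / base_ms
    speedup = base_ms / head_ms
    print(f"{name}: ratio={ratios[name]:.4f}, speedup={speedup:.1f}x")
```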
