PERF: Rolling._apply iterates over blocks _and_ columns #42736

Closed
jbrockmendel opened this issue Jul 26, 2021 · 5 comments · Fixed by #43171
Labels: Performance (memory or execution speed performance), Refactor (internal refactoring of code), Window (rolling, ewma, expanding)
Comments

@jbrockmendel
Member

@mroeschke looking at BaseWindow._apply, we're spending a lot of time in apply_along_axis in our asvs (results posted below). IIUC what's happening is that homogeneous_func is passed to BlockManager.apply, thereby iterating over Blocks, and then within that call it iterates over columns.

My intuition is that we should either iterate over blocks or columns but not both. Is there a reason to do both?

Not sure about UDFs, but for mean/sum/... it seems like we'd have to edit the cython functions to iterate over columns there, right?

from asv_bench.benchmarks.rolling import *

self = Engine()
self.setup("DataFrame", "float", sum, "cython", "mean")

%prun -s cumtime for n in range(1000): self.time_rolling_methods('DataFrame', 'float', sum, 'cython', 'mean')

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.325    0.325 {built-in method builtins.exec}
        1    0.001    0.001    0.325    0.325 <string>:1(<module>)
     1000    0.003    0.000    0.324    0.000 rolling.py:68(time_rolling_methods)
     1000    0.002    0.000    0.302    0.000 rolling.py:1802(mean)
     1000    0.002    0.000    0.299    0.000 rolling.py:1291(mean)
     1000    0.002    0.000    0.295    0.000 rolling.py:479(_apply)
     1000    0.007    0.000    0.291    0.000 rolling.py:408(_apply_blockwise)
     1000    0.003    0.000    0.268    0.000 managers.py:276(apply)
     1000    0.003    0.000    0.204    0.000 blocks.py:364(apply)
     1000    0.002    0.000    0.165    0.000 rolling.py:425(hfunc)
     1000    0.004    0.000    0.149    0.000 rolling.py:514(homogeneous_func)
14000/3000    0.013    0.000    0.140    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
     1000    0.001    0.000    0.136    0.000 <__array_function__ internals>:2(apply_along_axis)
     1000    0.015    0.000    0.134    0.000 shape_base.py:267(apply_along_axis)
     1000    0.002    0.000    0.074    0.000 rolling.py:520(calc)
     1000    0.010    0.000    0.058    0.000 managers.py:539(_combine)
     1000    0.007    0.000    0.050    0.000 indexers.py:73(get_window_bounds)

# With ArrayManager so we only iterate over columns:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.198    0.198 {built-in method builtins.exec}
        1    0.001    0.001    0.198    0.198 <string>:1(<module>)
     1000    0.003    0.000    0.197    0.000 rolling.py:68(time_rolling_methods)
     1000    0.002    0.000    0.174    0.000 rolling.py:1802(mean)
     1000    0.002    0.000    0.171    0.000 rolling.py:1291(mean)
     1000    0.002    0.000    0.167    0.000 rolling.py:479(_apply)
     1000    0.007    0.000    0.163    0.000 rolling.py:408(_apply_blockwise)
     1000    0.005    0.000    0.139    0.000 array_manager.py:194(apply)
     1000    0.002    0.000    0.100    0.000 rolling.py:425(hfunc)
     1000    0.004    0.000    0.086    0.000 rolling.py:514(homogeneous_func)
     1000    0.002    0.000    0.071    0.000 rolling.py:520(calc)
     1000    0.006    0.000    0.047    0.000 indexers.py:73(get_window_bounds)
     2000    0.001    0.000    0.039    0.000 <__array_function__ internals>:2(clip)
7000/3000    0.003    0.000    0.038    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
     2000    0.002    0.000    0.036    0.000 fromnumeric.py:2046(clip)
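
The nested iteration described at the top of this comment can be sketched in plain NumPy. This is a minimal illustration, not the pandas implementation: `rolling_mean_1d`, `apply_blockwise`, and `apply_columnwise` are hypothetical names, and the blocks are assumed to be 2D arrays shaped (rows, columns). It shows that an outer loop over blocks plus an inner `np.apply_along_axis` over columns produces the same result as a single loop over columns.

```python
import numpy as np


def rolling_mean_1d(values: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical stand-in for a 1D-only rolling kernel (trailing mean)."""
    out = np.full(values.shape, np.nan)
    csum = np.cumsum(values)
    # Window sum ending at i is csum[i] - csum[i - window] (0 before the start).
    shifted = np.concatenate(([0.0], csum[:-window]))
    out[window - 1:] = (csum[window - 1:] - shifted) / window
    return out


def apply_blockwise(blocks, window):
    # Outer loop: one iteration per block.
    # Inner loop: apply_along_axis walks the columns of that block in Python.
    return [np.apply_along_axis(rolling_mean_1d, 0, block, window) for block in blocks]


def apply_columnwise(columns, window):
    # Single loop: one kernel call per column, no per-block dispatch.
    return [rolling_mean_1d(col, window) for col in columns]


rng = np.random.default_rng(0)
block = rng.standard_normal((10, 3))  # assumed layout: (rows, columns)

blockwise = apply_blockwise([block], 3)[0]
columnwise = np.column_stack(apply_columnwise(list(block.T), 3))
print(np.allclose(blockwise, columnwise, equal_nan=True))
```

Both paths call the 1D kernel once per column; the blockwise path only adds the per-block dispatch and the `apply_along_axis` overhead on top.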
@jbrockmendel added the Bug and Needs Triage labels Jul 26, 2021
@mroeschke
Member

You're correct. I think currently it iterates (or at least should iterate) over 1 float block and then over the columns within that block, which seems like extra work.

All rolling cython aggregations (including UDF) assume a 1D input, so I think internally it should just iterate over the columns (and rows since there's axis=1 support).
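
A rough sketch of that idea, assuming a 1D-only kernel: iterate over columns for axis=0, and transpose first so that rows are walked for axis=1. `roll_sum_1d` and `roll_2d` are hypothetical names used for illustration, not pandas functions.

```python
import numpy as np


def roll_sum_1d(values, window):
    """Hypothetical 1D kernel: trailing-window sum, NaN until the window fills."""
    out = np.full(len(values), np.nan)
    for i in range(window - 1, len(values)):
        out[i] = values[i - window + 1 : i + 1].sum()
    return out


def roll_2d(values, window, axis=0):
    # For axis=1, transpose so that each row is handled as a 1D "column".
    data = values.T if axis == 1 else values
    result = np.empty(data.shape, dtype=float)
    for j in range(data.shape[1]):
        result[:, j] = roll_sum_1d(data[:, j], window)
    return result.T if axis == 1 else result


arr = np.arange(12.0).reshape(4, 3)
down_columns = roll_2d(arr, 2, axis=0)
across_rows = roll_2d(arr, 2, axis=1)
```

The kernel never sees more than one dimension; only the driver knows which axis is being walked.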

@mroeschke added the Performance, Refactor, and Window labels and removed the Bug and Needs Triage labels Jul 26, 2021
@jbrockmendel
Member Author

First attempt to make this only-columnwise saw a nonzero-but-small perf bump. Is making the cython functions support 2D directly a viable option?

@mroeschke
Member

It is viable. method="table" is essentially 2D in numba.

I am experimenting with a 1D numba or cython function, parallelized over the columns using numba, that can be shared among mean | groupby mean | rolling mean in twosigma#51

@jbrockmendel
Member Author

So if I were to go into window_aggregations and 2D-ify roll_foo, would that get the job done? Or would we still have to handle UDFs somewhere?
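
To make "2D-ify" concrete, here is a minimal NumPy sketch of a kernel that accepts the whole 2D array and handles every column in one call. It is an assumption-laden illustration rather than the cython code in window_aggregations: `roll_sum_2d` is a hypothetical name, the window is fixed-size and trailing, and NaN handling is ignored.

```python
import numpy as np


def roll_sum_2d(values, window):
    """Trailing-window sum down axis 0 for all columns in a single call."""
    n_rows, n_cols = values.shape
    out = np.full((n_rows, n_cols), np.nan)
    if window > n_rows:
        return out
    csum = np.cumsum(values, axis=0)
    # Prepend a zero row so padded[i + window] - padded[i] is one window sum.
    padded = np.vstack([np.zeros((1, n_cols)), csum])
    out[window - 1:] = padded[window:] - padded[:-window]
    return out


arr = np.arange(12.0).reshape(4, 3)
result = roll_sum_2d(arr, 2)
```

With a kernel of this shape the caller no longer needs a Python-level loop over columns at all.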

@mroeschke
Member

I think some of the plumbing in rolling.py would need to support the 2d roll_foo, but that should be all that is needed.

Also since there is roll.apply, users would now need to make sure their UDF supports 2D inputs?
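
The UDF concern can be shown with a small sketch. The two functions below are hypothetical user code, and the (window, columns) layout of the 2D window is an assumption: a UDF written with a 1D window in mind silently collapses all columns into one scalar when handed a 2D window, whereas a 2D-aware version keeps one value per column.

```python
import numpy as np


def udf_range_1d(window):
    # Written with a 1D window in mind: reduces everything to one scalar.
    return window.max() - window.min()


def udf_range_2d(window):
    # 2D-aware: reduce along rows only, keeping one value per column.
    return window.max(axis=0) - window.min(axis=0)


# One window of 3 rows over 2 columns (assumed layout: rows x columns).
window = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])

per_column = np.array([udf_range_1d(window[:, j]) for j in range(window.shape[1])])
collapsed = udf_range_1d(window)   # mixes both columns together
aware = udf_range_2d(window)       # matches the per-column answer
```

Here `collapsed` does not raise; it just returns a different number, which is why such a change would be easy for users to miss.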

(IMO your approach in #43171 is reasonable enough)

@mroeschke added this to the 1.4 milestone Aug 25, 2021