PERF: Rolling._apply iterates over blocks _and_ columns #42736

jbrockmendel · 2021-07-26T20:02:28Z

@mroeschke looking at BaseWindow._apply, we're spending a lot of time in apply_along_axis in our asvs (results posted below). IIUC what's happening is that homogeneous_func is passed to BlockManager.apply, thereby iterating over Blocks, and then within that call it iterates over columns.

My intuition is that we should either iterate over blocks or columns but not both. Is there a reason to do both?

Not sure about UDFs, but for mean/sum/... it seems like we'd have to edit the cython functions to iterate over columns there, right?

from asv_bench.benchmarks.rolling import *

self = Engine()
self.setup("DataFrame", "float", sum, "cython", "mean")

%prun -s cumtime for n in range(1000): self.time_rolling_methods('DataFrame', 'float', sum, 'cython', 'mean')

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.325    0.325 {built-in method builtins.exec}
        1    0.001    0.001    0.325    0.325 <string>:1(<module>)
     1000    0.003    0.000    0.324    0.000 rolling.py:68(time_rolling_methods)
     1000    0.002    0.000    0.302    0.000 rolling.py:1802(mean)
     1000    0.002    0.000    0.299    0.000 rolling.py:1291(mean)
     1000    0.002    0.000    0.295    0.000 rolling.py:479(_apply)
     1000    0.007    0.000    0.291    0.000 rolling.py:408(_apply_blockwise)
     1000    0.003    0.000    0.268    0.000 managers.py:276(apply)
     1000    0.003    0.000    0.204    0.000 blocks.py:364(apply)
     1000    0.002    0.000    0.165    0.000 rolling.py:425(hfunc)
     1000    0.004    0.000    0.149    0.000 rolling.py:514(homogeneous_func)
14000/3000    0.013    0.000    0.140    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
     1000    0.001    0.000    0.136    0.000 <__array_function__ internals>:2(apply_along_axis)
     1000    0.015    0.000    0.134    0.000 shape_base.py:267(apply_along_axis)
     1000    0.002    0.000    0.074    0.000 rolling.py:520(calc)
     1000    0.010    0.000    0.058    0.000 managers.py:539(_combine)
     1000    0.007    0.000    0.050    0.000 indexers.py:73(get_window_bounds)

# With ArrayManager so we only iterate over columns:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.198    0.198 {built-in method builtins.exec}
        1    0.001    0.001    0.198    0.198 <string>:1(<module>)
     1000    0.003    0.000    0.197    0.000 rolling.py:68(time_rolling_methods)
     1000    0.002    0.000    0.174    0.000 rolling.py:1802(mean)
     1000    0.002    0.000    0.171    0.000 rolling.py:1291(mean)
     1000    0.002    0.000    0.167    0.000 rolling.py:479(_apply)
     1000    0.007    0.000    0.163    0.000 rolling.py:408(_apply_blockwise)
     1000    0.005    0.000    0.139    0.000 array_manager.py:194(apply)
     1000    0.002    0.000    0.100    0.000 rolling.py:425(hfunc)
     1000    0.004    0.000    0.086    0.000 rolling.py:514(homogeneous_func)
     1000    0.002    0.000    0.071    0.000 rolling.py:520(calc)
     1000    0.006    0.000    0.047    0.000 indexers.py:73(get_window_bounds)
     2000    0.001    0.000    0.039    0.000 <__array_function__ internals>:2(clip)
7000/3000    0.003    0.000    0.038    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
     2000    0.002    0.000    0.036    0.000 fromnumeric.py:2046(clip)


mroeschke · 2021-07-26T20:11:08Z

You're correct. I think it currently iterates (or should iterate) over the one float block and then over the columns in that block, which seems like extra work.

All rolling cython aggregations (including UDF) assume a 1D input, so I think internally it should just iterate over the columns (and rows since there's axis=1 support).
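To make the two iteration patterns concrete, here is a minimal NumPy-only sketch. It is not the pandas implementation: `rolling_mean_1d`, `apply_blockwise`, and `apply_columnwise` are hypothetical names, and the "block" is just a 2D array laid out as rows x columns for simplicity. It only shows that looping over blocks and then calling `np.apply_along_axis` on a 1D kernel gives the same result as a single loop over columns.

```python
import numpy as np


def rolling_mean_1d(values, window):
    """Hypothetical 1D kernel: trailing-window mean, NaN until the window is full."""
    out = np.full(values.shape[0], np.nan)
    for i in range(window - 1, values.shape[0]):
        out[i] = values[i - window + 1 : i + 1].mean()
    return out


def apply_blockwise(blocks, window):
    """Outer loop over 2D blocks, inner loop over columns via apply_along_axis."""
    return [np.apply_along_axis(rolling_mean_1d, 0, block, window) for block in blocks]


def apply_columnwise(columns, window):
    """A single loop over 1D columns, with no 2D dispatch layer in between."""
    return [rolling_mean_1d(col, window) for col in columns]


rng = np.random.default_rng(0)
block = rng.standard_normal((100, 4))  # one float "block" (assumed rows x columns)

by_block = apply_blockwise([block], window=10)[0]
by_column = np.column_stack(apply_columnwise(list(block.T), window=10))
```

Both paths call the same 1D kernel once per column; the blockwise path just adds the `apply_along_axis` layer on top, which is the overhead visible in the first profile above.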

jbrockmendel · 2021-07-29T04:09:57Z

First attempt to make this only-columnwise saw a nonzero-but-small perf bump. Is making the cython functions support 2D directly a viable option?

mroeschke · 2021-07-29T04:19:40Z

It is viable. method="table" is essentially 2D in numba.

I am experimenting with a 1D numba or cython function, parallelized over the columns using numba, that can be shared among mean | groupby mean | rolling mean in twosigma#51

jbrockmendel · 2021-08-22T20:47:48Z

So if I were to go into window_aggregations and 2D-ify roll_foo, would that get the job done? Or would we still have to handle UDFs somewhere?

mroeschke · 2021-08-23T23:45:32Z

I think some of the plumbing in rolling.py would need to support the 2d roll_foo, but that should be all that is needed.

Also since there is roll.apply, users would now need to make sure their UDF supports 2D inputs?

(IMO your approach in #43171 is reasonable enough)
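As a rough illustration of what a "2D-ified" kernel could look like, the sketch below computes a trailing-window sum down every column of a 2D array in one pass using cumulative sums. This is an assumption-laden NumPy example, not the cython code in window_aggregations and not what #43171 does: `rolling_sum_2d` is a hypothetical name, the window is assumed to be a fixed integer >= 1, and NaN handling and variable window bounds are ignored.

```python
import numpy as np


def rolling_sum_2d(values, window):
    """Hypothetical 2D kernel: trailing-window sum down each column (axis 0)."""
    nrows, ncols = values.shape
    out = np.full((nrows, ncols), np.nan)
    if window > nrows:
        return out
    # Prepend a row of zeros so that csum[i] holds the sum of the first i rows.
    csum = np.vstack([np.zeros((1, ncols)), np.cumsum(values, axis=0)])
    # The window ending at row (window - 1 + k) covers rows k .. window - 1 + k.
    out[window - 1 :] = csum[window:] - csum[:-window]
    return out


values = np.arange(12, dtype=float).reshape(4, 3)
result = rolling_sum_2d(values, window=2)
```

A kernel shaped like this takes the whole 2D array at once, so the caller needs no per-column loop; a UDF passed through the same path would likewise have to accept 2D input, which is the concern raised above.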

jbrockmendel added the Bug and Needs Triage labels Jul 26, 2021

mroeschke added the Performance, Refactor, and Window labels and removed the Bug and Needs Triage labels Jul 26, 2021

jbrockmendel mentioned this issue Aug 22, 2021

PERF: Rolling._apply #43171

Merged


mroeschke closed this as completed in #43171 Aug 25, 2021

mroeschke added this to the 1.4 milestone Aug 25, 2021
