Commit 7bfa883

mroeschke authored and AlexKirko committed
ENH: Add numba engine for rolling apply (pandas-dev#30151)
1 parent 8853bbc commit 7bfa883

16 files changed

+552
-96
lines changed

ci/deps/azure-36-minimum_versions.yaml

+1
@@ -17,6 +17,7 @@ dependencies:
   - beautifulsoup4=4.6.0
   - bottleneck=1.2.1
   - jinja2=2.8
+  - numba=0.46.0
   - numexpr=2.6.2
   - numpy=1.13.3
   - openpyxl=2.5.7

ci/deps/azure-windows-36.yaml

+1
@@ -17,6 +17,7 @@ dependencies:
   - bottleneck
   - fastparquet>=0.3.2
   - matplotlib=3.0.2
+  - numba
   - numexpr
   - numpy=1.15.*
   - openpyxl

doc/source/getting_started/install.rst

+1
@@ -256,6 +256,7 @@ gcsfs                     0.2.2              Google Cloud Storage access
 html5lib                                     HTML parser for read_html (see :ref:`note <optional_html>`)
 lxml                      3.8.0              HTML parser for read_html (see :ref:`note <optional_html>`)
 matplotlib                2.2.2              Visualization
+numba                     0.46.0             Alternative execution engine for rolling operations
 openpyxl                  2.5.7              Reading / writing for xlsx files
 pandas-gbq                0.8.0              Google Big Query access
 psycopg2                                     PostgreSQL engine for sqlalchemy

doc/source/user_guide/computation.rst

+47
@@ -321,6 +321,11 @@ We provide a number of common statistical functions:
     :meth:`~Rolling.cov`, Unbiased covariance (binary)
     :meth:`~Rolling.corr`, Correlation (binary)
 
+.. _stats.rolling_apply:
+
+Rolling Apply
+~~~~~~~~~~~~~
+
 The :meth:`~Rolling.apply` function takes an extra ``func`` argument and performs
 generic rolling computations. The ``func`` argument should be a single function
 that produces a single value from an ndarray input. Suppose we wanted to
@@ -334,6 +339,48 @@ compute the mean absolute deviation on a rolling basis:
    @savefig rolling_apply_ex.png
    s.rolling(window=60).apply(mad, raw=True).plot(style='k')
 
+.. versionadded:: 1.0
+
+Additionally, :meth:`~Rolling.apply` can leverage `Numba <https://numba.pydata.org/>`__
+if installed as an optional dependency. The apply aggregation can be executed using Numba by specifying
+``engine='numba'`` and ``engine_kwargs`` arguments (``raw`` must also be set to ``True``).
+Numba will be applied in potentially two routines:
+
+1. If ``func`` is a standard Python function, the engine will `JIT <http://numba.pydata.org/numba-doc/latest/user/overview.html>`__
+   the passed function. ``func`` can also be a JITed function in which case the engine will not JIT the function again.
+2. The engine will JIT the for loop where the apply function is applied to each window.
+
+The ``engine_kwargs`` argument is a dictionary of keyword arguments that will be passed into the
+`numba.jit decorator <https://numba.pydata.org/numba-doc/latest/reference/jit-compilation.html#numba.jit>`__.
+These keyword arguments will be applied to *both* the passed function (if a standard Python function)
+and the apply for loop over each window. Currently only ``nogil``, ``nopython``, and ``parallel`` are supported,
+and their default values are set to ``False``, ``True`` and ``False`` respectively.
+
+.. note::
+
+   In terms of performance, **the first time a function is run using the Numba engine will be slow**
+   as Numba will have some function compilation overhead. However, ``rolling`` objects will cache
+   the function and subsequent calls will be fast. In general, the Numba engine is performant with
+   a larger amount of data points (e.g. 1+ million).
+
+.. code-block:: ipython
+
+   In [1]: data = pd.Series(range(1_000_000))
+
+   In [2]: roll = data.rolling(10)
+
+   In [3]: def f(x):
+      ...:     return np.sum(x) + 5
+   # Run the first time, compilation time will affect performance
+   In [4]: %timeit -r 1 -n 1 roll.apply(f, engine='numba', raw=True)  # noqa: E225
+   1.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
+   # Function is cached and performance will improve
+   In [5]: %timeit roll.apply(f, engine='numba', raw=True)
+   188 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
+
+   In [6]: %timeit roll.apply(f, engine='cython', raw=True)
+   3.92 s ± 59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
+
 .. _stats.rolling_window:
 
 Rolling windows
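
Note on the ``engine_kwargs`` defaults described in the documentation hunk above: the following is a minimal sketch, in plain Python, of how the three supported options could be resolved when the caller passes nothing or only some of them. The helper name ``resolve_engine_kwargs`` and the ``_DEFAULTS`` mapping are hypothetical and exist only for illustration; they are not part of pandas.

```python
from typing import Dict, Optional, Tuple

# Documented defaults: nopython=True, nogil=False, parallel=False.
# This mapping is an illustrative assumption, not a pandas object.
_DEFAULTS = {"nopython": True, "nogil": False, "parallel": False}


def resolve_engine_kwargs(
    engine_kwargs: Optional[Dict[str, bool]],
) -> Tuple[bool, bool, bool]:
    """Return (nopython, nogil, parallel), filling in the documented defaults."""
    engine_kwargs = engine_kwargs or {}
    return tuple(engine_kwargs.get(key, default) for key, default in _DEFAULTS.items())


# No engine_kwargs at all: every option takes its default.
defaults_only = resolve_engine_kwargs(None)
# Overriding a single option leaves the other two at their defaults.
parallel_on = resolve_engine_kwargs({"parallel": True})
```

Under this sketch, omitting ``engine_kwargs`` is equivalent to passing all three defaults explicitly.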

doc/source/whatsnew/v1.0.0.rst

+13
@@ -169,6 +169,17 @@ You can use the alias ``"boolean"`` as well.
    s = pd.Series([True, False, None], dtype="boolean")
    s
 
+.. _whatsnew_1000.numba_rolling_apply:
+
+Using Numba in ``rolling.apply``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We've added an ``engine`` keyword to :meth:`~Rolling.apply` that allows the user to execute the
+routine using `Numba <https://numba.pydata.org/>`__ instead of Cython. Using the Numba engine
+can yield significant performance gains if the apply function can operate on numpy arrays and
+the data set is larger (1 million rows or greater). For more details, see
+:ref:`rolling apply documentation <stats.rolling_apply>` (:issue:`28987`)
+
 .. _whatsnew_1000.custom_window:
 
 Defining custom windows for rolling operations
@@ -432,6 +443,8 @@ Optional libraries below the lowest tested version may still work, but are not c
 +-----------------+-----------------+---------+
 | matplotlib      | 2.2.2           |         |
 +-----------------+-----------------+---------+
+| numba           | 0.46.0          | X       |
++-----------------+-----------------+---------+
 | openpyxl        | 2.5.7           | X       |
 +-----------------+-----------------+---------+
 | pyarrow         | 0.12.0          | X       |

environment.yml

+1
@@ -75,6 +75,7 @@ dependencies:
   - matplotlib>=2.2.2  # pandas.plotting, Series.plot, DataFrame.plot
   - numexpr>=2.6.8
   - scipy>=1.1
+  - numba>=0.46.0
 
   # optional for io
   # ---------------

pandas/compat/_optional.py

+1
@@ -28,6 +28,7 @@
     "xlrd": "1.1.0",
     "xlwt": "1.2.0",
     "xlsxwriter": "0.9.8",
+    "numba": "0.46.0",
 }
 
 

pandas/core/window/common.py

+1
@@ -70,6 +70,7 @@ def _apply(
         floor: int = 1,
         is_weighted: bool = False,
         name: Optional[str] = None,
+        use_numba_cache: bool = False,
         **kwargs,
     ):
         """

pandas/core/window/numba_.py

+127
@@ -0,0 +1,127 @@
+import types
+from typing import Any, Callable, Dict, Optional, Tuple
+
+import numpy as np
+
+from pandas._typing import Scalar
+from pandas.compat._optional import import_optional_dependency
+
+
+def make_rolling_apply(
+    func: Callable[..., Scalar],
+    args: Tuple,
+    nogil: bool,
+    parallel: bool,
+    nopython: bool,
+):
+    """
+    Creates a JITted rolling apply function with a JITted version of
+    the user's function.
+
+    Parameters
+    ----------
+    func : function
+        function to be applied to each window and will be JITed
+    args : tuple
+        *args to be passed into the function
+    nogil : bool
+        nogil parameter from engine_kwargs for numba.jit
+    parallel : bool
+        parallel parameter from engine_kwargs for numba.jit
+    nopython : bool
+        nopython parameter from engine_kwargs for numba.jit
+
+    Returns
+    -------
+    Numba function
+    """
+    numba = import_optional_dependency("numba")
+
+    if parallel:
+        loop_range = numba.prange
+    else:
+        loop_range = range
+
+    if isinstance(func, numba.targets.registry.CPUDispatcher):
+        # Don't jit a user passed jitted function
+        numba_func = func
+    else:
+
+        @numba.generated_jit(nopython=nopython, nogil=nogil, parallel=parallel)
+        def numba_func(window, *_args):
+            if getattr(np, func.__name__, False) is func or isinstance(
+                func, types.BuiltinFunctionType
+            ):
+                jf = func
+            else:
+                jf = numba.jit(func, nopython=nopython, nogil=nogil)
+
+            def impl(window, *_args):
+                return jf(window, *_args)
+
+            return impl
+
+    @numba.jit(nopython=nopython, nogil=nogil, parallel=parallel)
+    def roll_apply(
+        values: np.ndarray, begin: np.ndarray, end: np.ndarray, minimum_periods: int,
+    ) -> np.ndarray:
+        result = np.empty(len(begin))
+        for i in loop_range(len(result)):
+            start = begin[i]
+            stop = end[i]
+            window = values[start:stop]
+            count_nan = np.sum(np.isnan(window))
+            if len(window) - count_nan >= minimum_periods:
+                result[i] = numba_func(window, *args)
+            else:
+                result[i] = np.nan
+        return result
+
+    return roll_apply
+
+
+def generate_numba_apply_func(
+    args: Tuple,
+    kwargs: Dict[str, Any],
+    func: Callable[..., Scalar],
+    engine_kwargs: Optional[Dict[str, bool]],
+):
+    """
+    Generate a numba jitted apply function specified by values from engine_kwargs.
+
+    1. jit the user's function
+    2. Return a rolling apply function with the jitted function inline
+
+    Configurations specified in engine_kwargs apply to both the user's
+    function _AND_ the rolling apply function.
+
+    Parameters
+    ----------
+    args : tuple
+        *args to be passed into the function
+    kwargs : dict
+        **kwargs to be passed into the function
+    func : function
+        function to be applied to each window and will be JITed
+    engine_kwargs : dict
+        dictionary of arguments to be passed into numba.jit
+
+    Returns
+    -------
+    Numba function
+    """
+
+    if engine_kwargs is None:
+        engine_kwargs = {}
+
+    nopython = engine_kwargs.get("nopython", True)
+    nogil = engine_kwargs.get("nogil", False)
+    parallel = engine_kwargs.get("parallel", False)
+
+    if kwargs and nopython:
+        raise ValueError(
+            "numba does not support kwargs with nopython=True: "
+            "https://github.com/numba/numba/issues/2916"
+        )
+
+    return make_rolling_apply(func, args, nogil, parallel, nopython)
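
Note on the windowing loop in ``roll_apply``: the sketch below restates the same per-window logic in plain Python, with no Numba or NumPy, to make the ``begin``/``end`` bounds and the ``minimum_periods`` check easier to follow. The name ``roll_apply_sketch`` and the sample bounds are hypothetical and chosen for illustration; this is not the code path pandas runs.

```python
import math
from typing import Callable, List, Sequence


def roll_apply_sketch(
    values: Sequence[float],
    begin: Sequence[int],
    end: Sequence[int],
    minimum_periods: int,
    func: Callable[[Sequence[float]], float],
) -> List[float]:
    """Apply ``func`` to values[begin[i]:end[i]] for every window i."""
    result = []
    for start, stop in zip(begin, end):
        window = values[start:stop]
        # Only non-NaN observations count toward the minimum.
        valid = sum(1 for v in window if not math.isnan(v))
        if valid >= minimum_periods:
            result.append(func(window))
        else:
            result.append(math.nan)
    return result


# Illustrative bounds for a trailing window of size 3 over five points.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
begin = [0, 0, 0, 1, 2]
end = [1, 2, 3, 4, 5]
out = roll_apply_sketch(values, begin, end, 3, sum)
```

With these inputs the first two windows hold fewer than three observations, so they yield NaN; the remaining windows yield the sums of their three elements.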
