Commit 45aeaac

Backport PR #42439: DOC: Refactor Numba enhancing performance and add parallelism caveat (#42490)
Co-authored-by: Matthew Roeschke <[email protected]>
1 parent 25f402c commit 45aeaac

4 files changed: 87 additions, 140 deletions

doc/source/user_guide/enhancingperf.rst

+59-38
@@ -302,28 +302,63 @@ For more about ``boundscheck`` and ``wraparound``, see the Cython docs on
302302

303303
.. _enhancingperf.numba:
304304

305-
Using Numba
306-
-----------
305+
Numba (JIT compilation)
306+
-----------------------
307307

308-
A recent alternative to statically compiling Cython code, is to use a *dynamic jit-compiler*, Numba.
308+
An alternative to statically compiling Cython code is to use a dynamic just-in-time (JIT) compiler with `Numba <https://numba.pydata.org/>`__.
309309

310-
Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.
310+
Numba allows you to write a pure Python function which can be JIT compiled to native machine instructions, similar in performance to C, C++ and Fortran,
311+
by decorating your function with ``@jit``.
311312

312-
Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.
313+
Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool).
314+
Numba supports compilation of Python to run on either CPU or GPU hardware and is designed to integrate with the Python scientific software stack.
313315

314316
.. note::
315317

316-
You will need to install Numba. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.
318+
The ``@jit`` compilation will add overhead to the runtime of the function, so performance benefits may not be realized, especially when using small data sets.
319+
Consider `caching <https://numba.readthedocs.io/en/stable/developer/caching.html>`__ your function to avoid compilation overhead each time your function is run.
317320

318-
.. note::
321+
Numba can be used in two ways with pandas:
322+
323+
#. Specify the ``engine="numba"`` keyword in select pandas methods
324+
#. Define your own Python function decorated with ``@jit`` and pass the underlying NumPy array of :class:`Series` or :class:`DataFrame` (using ``to_numpy()``) into the function
325+
326+
pandas Numba Engine
327+
~~~~~~~~~~~~~~~~~~~
328+
329+
If Numba is installed, one can specify ``engine="numba"`` in select pandas methods to execute the method using Numba.
330+
Methods that support ``engine="numba"`` will also have an ``engine_kwargs`` keyword that accepts a dictionary that allows one to specify
331+
``"nogil"``, ``"nopython"`` and ``"parallel"`` keys with boolean values to pass into the ``@jit`` decorator.
332+
If ``engine_kwargs`` is not specified, it defaults to ``{"nogil": False, "nopython": True, "parallel": False}`` unless otherwise specified.
333+
334+
In terms of performance, **the first time a function is run using the Numba engine will be slow**
335+
as Numba will have some function compilation overhead. However, the JIT compiled functions are cached,
336+
and subsequent calls will be fast. In general, the Numba engine is performant with
337+
a larger amount of data points (e.g. 1+ million).
319338

320-
As of Numba version 0.20, pandas objects cannot be passed directly to Numba-compiled functions. Instead, one must pass the NumPy array underlying the pandas object to the Numba-compiled function as demonstrated below.
339+
.. code-block:: ipython
340+
341+
In [1]: data = pd.Series(range(1_000_000)) # noqa: E225
342+
343+
In [2]: roll = data.rolling(10)
321344
322-
Jit
323-
~~~
345+
In [3]: def f(x):
346+
...: return np.sum(x) + 5
347+
# Run the first time, compilation time will affect performance
348+
In [4]: %timeit -r 1 -n 1 roll.apply(f, engine='numba', raw=True)
349+
1.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
350+
# Function is cached and performance will improve
351+
In [5]: %timeit roll.apply(f, engine='numba', raw=True)
352+
188 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
324353
325-
We demonstrate how to use Numba to just-in-time compile our code. We simply
326-
take the plain Python code from above and annotate with the ``@jit`` decorator.
354+
In [6]: %timeit roll.apply(f, engine='cython', raw=True)
355+
3.92 s ± 59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
356+
357+
Custom Function Examples
358+
~~~~~~~~~~~~~~~~~~~~~~~~
359+
360+
A custom Python function decorated with ``@jit`` can be used with pandas objects by passing their NumPy array
361+
representations with ``to_numpy()``.
327362

328363
.. code-block:: python
329364
@@ -360,8 +395,6 @@ take the plain Python code from above and annotate with the ``@jit`` decorator.
360395
)
361396
return pd.Series(result, index=df.index, name="result")
362397
363-
Note that we directly pass NumPy arrays to the Numba function. ``compute_numba`` is just a wrapper that provides a
364-
nicer interface by passing/returning pandas objects.
365398
366399
.. code-block:: ipython
367400
@@ -370,19 +403,9 @@ nicer interface by passing/returning pandas objects.
370403
371404
In this example, using Numba was faster than Cython.
372405

373-
Numba as an argument
374-
~~~~~~~~~~~~~~~~~~~~
375-
376-
Additionally, we can leverage the power of `Numba <https://numba.pydata.org/>`__
377-
by calling it as an argument in :meth:`~Rolling.apply`. See :ref:`Computation tools
378-
<window.numba_engine>` for an extensive example.
379-
380-
Vectorize
381-
~~~~~~~~~
382-
383406
Numba can also be used to write vectorized functions that do not require the user to explicitly
384407
loop over the observations of a vector; a vectorized function will be applied to each row automatically.
385-
Consider the following toy example of doubling each observation:
408+
Consider the following example of doubling each observation:
386409

387410
.. code-block:: python
388411
@@ -414,25 +437,23 @@ Consider the following toy example of doubling each observation:
414437
Caveats
415438
~~~~~~~
416439

417-
.. note::
418-
419-
Numba will execute on any function, but can only accelerate certain classes of functions.
420-
421440
Numba is best at accelerating functions that apply numerical functions to NumPy
422-
arrays. When passed a function that only uses operations it knows how to
423-
accelerate, it will execute in ``nopython`` mode.
424-
425-
If Numba is passed a function that includes something it doesn't know how to
426-
work with -- a category that currently includes sets, lists, dictionaries, or
427-
string functions -- it will revert to ``object mode``. In ``object mode``,
428-
Numba will execute but your code will not speed up significantly. If you would
441+
arrays. If you try to ``@jit`` a function that contains unsupported `Python <https://numba.readthedocs.io/en/stable/reference/pysupported.html>`__
442+
or `NumPy <https://numba.readthedocs.io/en/stable/reference/numpysupported.html>`__
443+
code, compilation will revert to `object mode <https://numba.readthedocs.io/en/stable/glossary.html#term-object-mode>`__, which
444+
will most likely not speed up your function. If you would
429445
prefer that Numba throw an error if it cannot compile a function in a way that
430446
speeds up your code, pass Numba the argument
431-
``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on
447+
``nopython=True`` (e.g. ``@jit(nopython=True)``). For more on
432448
troubleshooting Numba modes, see the `Numba troubleshooting page
433449
<https://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.
434450

435-
Read more in the `Numba docs <https://numba.pydata.org/>`__.
451+
Using ``parallel=True`` (e.g. ``@jit(parallel=True)``) may result in a ``SIGABRT`` if the threading layer leads to unsafe
452+
behavior. You can first `specify a safe threading layer <https://numba.readthedocs.io/en/stable/user/threading-layer.html#selecting-a-threading-layer-for-safe-parallel-execution>`__
453+
before running a JIT function with ``parallel=True``.
454+
455+
Generally, if you encounter a segfault (``SIGSEGV``) while using Numba, please report the issue
456+
to the `Numba issue tracker <https://github.com/numba/numba/issues/new/choose>`__.
436457

437458
.. _enhancingperf.eval:
438459

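The second route described in the enhancingperf.rst changes (decorate your own function and pass it the array returned by ``to_numpy()``) can be sketched as follows. ``rolling_mean_plus`` and ``compute`` are hypothetical helpers for illustration only, not pandas or Numba API. The sketch treats Numba as optional: if it cannot be imported, a pass-through decorator is used so the same code runs as plain Python. ``njit`` is Numba's shorthand for ``jit(nopython=True)``.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Numba is optional here: fall back to a decorator that leaves the function unchanged
    def njit(func):
        return func


@njit
def rolling_mean_plus(values, window, offset):
    # Hypothetical example: trailing mean over ``window`` points, plus ``offset``
    n = values.shape[0]
    out = np.full(n, np.nan)
    for i in range(window - 1, n):
        total = 0.0
        for j in range(i - window + 1, i + 1):
            total += values[j]
        out[i] = total / window + offset
    return out


def compute(series, window=3, offset=5.0):
    # Pass the underlying NumPy array to the JIT function, then re-wrap as a Series
    result = rolling_mean_plus(series.to_numpy(dtype=np.float64), window, offset)
    return pd.Series(result, index=series.index, name="result")
```

With Numba installed, the first call to ``compute`` includes compilation time and later calls reuse the compiled code. Passing ``cache=True`` to ``njit`` additionally persists the compiled code across sessions, which generally requires the function to be defined in a source file.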
doc/source/user_guide/groupby.rst

+3-51
@@ -1106,11 +1106,9 @@ Numba Accelerated Routines
11061106
.. versionadded:: 1.1
11071107

11081108
If `Numba <https://numba.pydata.org/>`__ is installed as an optional dependency, the ``transform`` and
1109-
``aggregate`` methods support ``engine='numba'`` and ``engine_kwargs`` arguments. The ``engine_kwargs``
1110-
argument is a dictionary of keyword arguments that will be passed into the
1111-
`numba.jit decorator <https://numba.pydata.org/numba-doc/latest/reference/jit-compilation.html#numba.jit>`__.
1112-
These keyword arguments will be applied to the passed function. Currently only ``nogil``, ``nopython``,
1113-
and ``parallel`` are supported, and their default values are set to ``False``, ``True`` and ``False`` respectively.
1109+
``aggregate`` methods support ``engine='numba'`` and ``engine_kwargs`` arguments.
1110+
See :ref:`enhancing performance with Numba <enhancingperf.numba>` for general usage of the arguments
1111+
and performance considerations.
11141112

11151113
The function signature must start with ``values, index`` **exactly** as the data belonging to each group
11161114
will be passed into ``values``, and the group index will be passed into ``index``.
@@ -1121,52 +1119,6 @@ will be passed into ``values``, and the group index will be passed into ``index`
11211119
data and group index will be passed as NumPy arrays to the JITed user defined function, and no
11221120
alternative execution attempts will be tried.
11231121

1124-
.. note::
1125-
1126-
In terms of performance, **the first time a function is run using the Numba engine will be slow**
1127-
as Numba will have some function compilation overhead. However, the compiled functions are cached,
1128-
and subsequent calls will be fast. In general, the Numba engine is performant with
1129-
a larger amount of data points (e.g. 1+ million).
1130-
1131-
.. code-block:: ipython
1132-
1133-
In [1]: N = 10 ** 3
1134-
1135-
In [2]: data = {0: [str(i) for i in range(100)] * N, 1: list(range(100)) * N}
1136-
1137-
In [3]: df = pd.DataFrame(data, columns=[0, 1])
1138-
1139-
In [4]: def f_numba(values, index):
1140-
...: total = 0
1141-
...: for i, value in enumerate(values):
1142-
...: if i % 2:
1143-
...: total += value + 5
1144-
...: else:
1145-
...: total += value * 2
1146-
...: return total
1147-
...:
1148-
1149-
In [5]: def f_cython(values):
1150-
...: total = 0
1151-
...: for i, value in enumerate(values):
1152-
...: if i % 2:
1153-
...: total += value + 5
1154-
...: else:
1155-
...: total += value * 2
1156-
...: return total
1157-
...:
1158-
1159-
In [6]: groupby = df.groupby(0)
1160-
# Run the first time, compilation time will affect performance
1161-
In [7]: %timeit -r 1 -n 1 groupby.aggregate(f_numba, engine='numba') # noqa: E225
1162-
2.14 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
1163-
# Function is cached and performance will improve
1164-
In [8]: %timeit groupby.aggregate(f_numba, engine='numba')
1165-
4.93 ms ± 32.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1166-
1167-
In [9]: %timeit groupby.aggregate(f_cython, engine='cython')
1168-
18.6 ms ± 84.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1169-
11701122
Other useful features
11711123
---------------------
11721124

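The ``values, index`` calling convention kept in groupby.rst can be sketched with the ``f_numba`` UDF from the removed benchmark. The small ``df`` is assumed data. If Numba is not installed, pandas raises ``ImportError`` for ``engine='numba'``, so this sketch falls back to calling the UDF group by group with NumPy arrays, roughly mimicking what the engine passes; that fallback is an assumption of the sketch, not pandas behavior.

```python
import numpy as np
import pandas as pd


def f_numba(values, index):
    # Each group's data arrives as ``values`` and the group's index as ``index``,
    # both as NumPy arrays; ``index`` is unused in this example
    total = 0
    for i, value in enumerate(values):
        if i % 2:
            total += value + 5
        else:
            total += value * 2
    return total


# Assumed example data: two groups, "a" and "b"
df = pd.DataFrame({"key": ["a", "b", "a", "b", "a"], "val": [1, 2, 3, 4, 5]})

try:
    # Requires Numba; the engine JIT compiles f_numba and applies it per group
    result = df.groupby("key")["val"].aggregate(f_numba, engine="numba")
except ImportError:
    # Plain-Python emulation: call the UDF with each group's values and index arrays
    result = pd.Series(
        {key: f_numba(grp.to_numpy(), grp.index.to_numpy())
         for key, grp in df.groupby("key")["val"]},
        name="val",
    )
```

Group ``"a"`` holds ``[1, 3, 5]`` and evaluates to 20; group ``"b"`` holds ``[2, 4]`` and evaluates to 13. The Numba engine may return these as floats.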
doc/source/user_guide/window.rst

+23-49
@@ -262,26 +262,24 @@ and we want to use an expanding window where ``use_expanding`` is ``True`` other
262262
.. code-block:: ipython
263263
264264
In [2]: from pandas.api.indexers import BaseIndexer
265-
...:
266-
...: class CustomIndexer(BaseIndexer):
267-
...:
268-
...: def get_window_bounds(self, num_values, min_periods, center, closed):
269-
...: start = np.empty(num_values, dtype=np.int64)
270-
...: end = np.empty(num_values, dtype=np.int64)
271-
...: for i in range(num_values):
272-
...: if self.use_expanding[i]:
273-
...: start[i] = 0
274-
...: end[i] = i + 1
275-
...: else:
276-
...: start[i] = i
277-
...: end[i] = i + self.window_size
278-
...: return start, end
279-
...:
280-
281-
In [3]: indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
282-
283-
In [4]: df.rolling(indexer).sum()
284-
Out[4]:
265+
266+
In [3]: class CustomIndexer(BaseIndexer):
267+
...: def get_window_bounds(self, num_values, min_periods, center, closed):
268+
...: start = np.empty(num_values, dtype=np.int64)
269+
...: end = np.empty(num_values, dtype=np.int64)
270+
...: for i in range(num_values):
271+
...: if self.use_expanding[i]:
272+
...: start[i] = 0
273+
...: end[i] = i + 1
274+
...: else:
275+
...: start[i] = i
276+
...: end[i] = i + self.window_size
277+
...: return start, end
278+
279+
In [4]: indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
280+
281+
In [5]: df.rolling(indexer).sum()
282+
Out[5]:
285283
values
286284
0 0.0
287285
1 1.0
@@ -365,45 +363,21 @@ Numba engine
365363
Additionally, :meth:`~Rolling.apply` can leverage `Numba <https://numba.pydata.org/>`__
366364
if installed as an optional dependency. The apply aggregation can be executed using Numba by specifying
367365
``engine='numba'`` and ``engine_kwargs`` arguments (``raw`` must also be set to ``True``).
366+
See :ref:`enhancing performance with Numba <enhancingperf.numba>` for general usage of the arguments and performance considerations.
367+
368368
Numba will be applied in potentially two routines:
369369

370370
#. If ``func`` is a standard Python function, the engine will `JIT <https://numba.pydata.org/numba-doc/latest/user/overview.html>`__ the passed function. ``func`` can also be a JITed function in which case the engine will not JIT the function again.
371371
#. The engine will JIT the for loop where the apply function is applied to each window.
372372

373-
.. versionadded:: 1.3.0
374-
375-
``mean``, ``median``, ``max``, ``min``, and ``sum`` also support the ``engine`` and ``engine_kwargs`` arguments.
376-
377373
The ``engine_kwargs`` argument is a dictionary of keyword arguments that will be passed into the
378374
`numba.jit decorator <https://numba.pydata.org/numba-doc/latest/reference/jit-compilation.html#numba.jit>`__.
379375
These keyword arguments will be applied to *both* the passed function (if a standard Python function)
380-
and the apply for loop over each window. Currently only ``nogil``, ``nopython``, and ``parallel`` are supported,
381-
and their default values are set to ``False``, ``True`` and ``False`` respectively.
382-
383-
.. note::
376+
and the apply for loop over each window.
384377

385-
In terms of performance, **the first time a function is run using the Numba engine will be slow**
386-
as Numba will have some function compilation overhead. However, the compiled functions are cached,
387-
and subsequent calls will be fast. In general, the Numba engine is performant with
388-
a larger amount of data points (e.g. 1+ million).
389-
390-
.. code-block:: ipython
391-
392-
In [1]: data = pd.Series(range(1_000_000))
393-
394-
In [2]: roll = data.rolling(10)
395-
396-
In [3]: def f(x):
397-
...: return np.sum(x) + 5
398-
# Run the first time, compilation time will affect performance
399-
In [4]: %timeit -r 1 -n 1 roll.apply(f, engine='numba', raw=True) # noqa: E225, E999
400-
1.23 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
401-
# Function is cached and performance will improve
402-
In [5]: %timeit roll.apply(f, engine='numba', raw=True)
403-
188 ms ± 1.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
378+
.. versionadded:: 1.3.0
404379

405-
In [6]: %timeit roll.apply(f, engine='cython', raw=True)
406-
3.92 s ± 59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
380+
``mean``, ``median``, ``max``, ``min``, and ``sum`` also support the ``engine`` and ``engine_kwargs`` arguments.
407381

408382
.. _window.cov_corr:
409383

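The ``CustomIndexer`` from the window.rst hunk can be run end to end as below. The diff does not show how ``df`` and ``use_expanding`` are built, so the values here are assumptions chosen to match the visible ``0.0`` and ``1.0`` output rows. The signature also includes ``step``: this assumes pandas 1.5 or later, where ``BaseIndexer.get_window_bounds`` takes ``step`` and pandas checks that a subclass uses the same parameter names.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

# Assumed input: a per-row flag choosing an expanding or a fixed-size window
use_expanding = [True, False, True, False, True]
df = pd.DataFrame({"values": range(5)})


class CustomIndexer(BaseIndexer):
    # Parameter names must match BaseIndexer.get_window_bounds (pandas >= 1.5)
    def get_window_bounds(self, num_values=0, min_periods=None, center=None,
                          closed=None, step=None):
        start = np.empty(num_values, dtype=np.int64)
        end = np.empty(num_values, dtype=np.int64)
        for i in range(num_values):
            if self.use_expanding[i]:
                # Expanding window: from the first row through row i
                start[i] = 0
                end[i] = i + 1
            else:
                # Fixed-size window of ``window_size`` rows starting at row i
                start[i] = i
                end[i] = i + self.window_size
        return start, end


# Extra keyword arguments (use_expanding) are stored as attributes by BaseIndexer
indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)
result = df.rolling(indexer).sum()
```

With these inputs the ``values`` column becomes 0.0, 1.0, 3.0, 3.0, 10.0: rows 0, 2 and 4 sum every row up to themselves, while rows 1 and 3 only see their own value.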
pandas/core/window/doc.py

+2-2
@@ -94,8 +94,8 @@ def create_section_header(header: str) -> str:
9494
).replace("\n", "", 1)
9595

9696
numba_notes = (
97-
"See :ref:`window.numba_engine` for extended documentation "
98-
"and performance considerations for the Numba engine.\n\n"
97+
"See :ref:`window.numba_engine` and :ref:`enhancingperf.numba` for "
98+
"extended documentation and performance considerations for the Numba engine.\n\n"
9999
)
100100

101101
window_agg_numba_parameters = dedent(

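The ``engine`` and ``engine_kwargs`` arguments that ``numba_notes`` points readers to can be exercised with the ``roll.apply`` example from the diff, here on a smaller series. The dictionary spells out the default values listed in the enhancingperf.rst change. Falling back to ``engine='cython'`` when Numba is absent is an assumption of this sketch, not something pandas does automatically.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(1_000, dtype=np.float64))
roll = data.rolling(10)


def f(x):
    # With raw=True, x is a NumPy array holding one window
    return np.sum(x) + 5


# Keys are forwarded to numba.jit; these values are the documented defaults
engine_kwargs = {"nogil": False, "nopython": True, "parallel": False}

try:
    result = roll.apply(f, raw=True, engine="numba", engine_kwargs=engine_kwargs)
except ImportError:
    # Numba not installed: use the Cython engine instead
    result = roll.apply(f, raw=True, engine="cython")
```

Both engines should produce the same values; only the execution strategy (and the one-time compilation cost for Numba) differs.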