diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst
index 4ada4d4bbdfe5..456ac5e79ac4b 100644
--- a/doc/source/enhancingperf.rst
+++ b/doc/source/enhancingperf.rst
@@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast
 You will need to install ``numba``. This is easy with ``conda``, by using:
 ``conda install numba``, see :ref:`installing using miniconda`.
 
+.. note::
+
+    As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, one must pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function as demonstrated below.
+
 We simply take the plain python code from above and annotate with the ``@jit`` decorator.
 
 .. code-block:: python
@@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
         result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
         return pd.Series(result, index=df.index, name='result')
 
-Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
-we are wrapping the results to provide a nice interface by passing/returning pandas objects.
+Note that we directly pass ``numpy`` arrays to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.
 
 .. code-block:: python
 
     In [4]: %timeit compute_numba(df)
     1000 loops, best of 3: 798 us per loop
 
+``numba`` can also be used to write vectorized functions that do not require the user to explicitly
+loop over the observations of a vector; a vectorized function will be applied to each row automatically.
+Consider the following toy example of doubling each observation:
+
+.. code-block:: python
+
+    import numba
+
+    def double_every_value_nonumba(x):
+        return x*2
+
+    @numba.vectorize
+    def double_every_value_withnumba(x):
+        return x*2
+
+
+    # Custom function without numba
+    In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
+    1000 loops, best of 3: 797 us per loop
+
+    # Standard implementation (faster than a custom function)
+    In [6]: %timeit df['col1_doubled'] = df.a*2
+    1000 loops, best of 3: 233 us per loop
+
+    # Custom function with numba
+    In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
+    1000 loops, best of 3: 145 us per loop
+
+.. note::
+
+    ``numba`` will execute on any function, but can only accelerate certain classes of functions.
+
+``numba`` is best at accelerating functions that apply numerical functions to numpy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.
+
+If ``numba`` is passed a function that includes something it doesn't know how to work with -- a category that currently includes sets, lists, dictionaries, or string functions -- it will revert to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` throw an error if it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page `__.
+
 Read more in the `numba docs `__.
 
 .. _enhancingperf.eval:
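
The ``nopython=True`` behaviour described in the added text could be illustrated with a purely numeric function of the kind ``numba`` is expected to compile well. The following is a minimal sketch, not part of the patch: the name ``integrate_f`` and its body are assumptions modelled loosely on the integration example the document refers to, and the ``try``/``except`` fallback is a hypothetical convenience so the snippet still runs (unaccelerated) when ``numba`` is not installed.

```python
# Sketch only: `integrate_f` is a hypothetical name, not taken from the patch.
try:
    from numba import jit
except ImportError:
    # Assumed fallback: a no-op stand-in that supports only the
    # called form of the decorator, e.g. @jit(nopython=True).
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@jit(nopython=True)  # with numba installed, fail loudly instead of using object mode
def integrate_f(a, b, n):
    """Left Riemann sum of f(x) = x * (x - 1) over [a, b] using n steps."""
    s = 0.0
    dx = (b - a) / n
    for i in range(n):
        x = a + i * dx
        s += x * (x - 1)
    return s * dx


# Only scalars are passed in, so no pandas object reaches the compiled function.
result = integrate_f(0.0, 1.0, 1000)
# The exact integral of x * (x - 1) over [0, 1] is -1/6; the sum should be close to it.
```

Because the function body uses only scalar arithmetic and a ``range`` loop, it should be eligible for ``nopython`` mode; adding an unsupported operation would then be expected to raise at compile time rather than silently run without a speed-up.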