Commit 640c5cb

author
Nick Eubank
committed
Extended docs on numba
1 parent 6c48d12 commit 640c5cb

1 file changed

+41
-2
lines changed

doc/source/enhancingperf.rst

@@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast
307307

308308
You will need to install ``numba``. This is easy with ``conda``: run ``conda install numba`` (see :ref:`installing using miniconda<install.miniconda>`).
309309

310+
.. note::
311+
312+
As of ``numba`` version 0.20, ``pandas`` objects cannot be passed directly to numba-compiled functions. Instead, pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function, as demonstrated below.
313+
310314
We simply take the plain Python code from above and annotate it with the ``@jit`` decorator.
311315

312316
.. code-block:: python
@@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
338342
result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
339343
return pd.Series(result, index=df.index, name='result')
340344
341-
Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
342-
we are wrapping the results to provide a nice interface by passing/returning pandas objects.
345+
Note that we pass ``numpy`` arrays directly to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by accepting and returning pandas objects.
343346

344347
.. code-block:: python
345348
346349
In [4]: %timeit compute_numba(df)
347350
1000 loops, best of 3: 798 us per loop
348351
352+
``numba`` can also be used to write vectorized functions that do not require the user to explicitly
353+
loop over the observations of a vector; a vectorized function will be applied to each element automatically.
354+
Consider the following toy example of doubling each observation:
355+
356+
.. code-block:: python
357+
358+
import numba
359+
360+
def double_every_value_nonumba(x):
361+
    return x * 2
362+
363+
@numba.vectorize
364+
def double_every_value_withnumba(x):
365+
    return x * 2
366+
367+
368+
# Custom function without numba
369+
In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
370+
1000 loops, best of 3: 797 us per loop
371+
372+
# Standard implementation (faster than a custom function)
373+
In [6]: %timeit df['col1_doubled'] = df.a*2
374+
1000 loops, best of 3: 233 us per loop
375+
376+
# Custom function with numba
377+
In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
378+
1000 loops, best of 3: 145 us per loop
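
The vectorized pattern above can be sketched end to end as a standalone script. The toy frame, the function name ``double_every_value``, and the fallback to ``numpy.vectorize`` when ``numba`` is not installed are assumptions made for illustration; they are not part of the original example.

```python
import numpy as np
import pandas as pd

try:
    from numba import vectorize
except ImportError:
    # Assumption: fall back to NumPy's (slow) vectorize so the sketch still runs.
    vectorize = np.vectorize


@vectorize
def double_every_value(x):
    # Written for a single scalar; the decorator applies it element-wise.
    return x * 2


# Hypothetical toy data standing in for the ``df`` used above.
df = pd.DataFrame({"a": np.random.randn(1000)})

# Pass the underlying NumPy array, not the Series itself.
doubled = double_every_value(df["a"].values)

# Re-wrap the raw array so the result lines up with the original index.
df["a_doubled"] = pd.Series(doubled, index=df.index)
```

Because the function receives a bare array, the index is lost on the way in; re-wrapping the result with ``index=df.index`` is one way to restore it, mirroring what ``compute_numba`` does.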
379+
380+
.. note::
381+
382+
``numba`` will execute on any function, but can only accelerate certain classes of functions.
383+
384+
``numba`` is best at accelerating functions that apply numerical functions to numpy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.
385+
386+
If ``numba`` is passed a function that includes something it doesn't know how to work with (as of version 0.20 this includes sets, lists, dictionaries, and string functions), it falls back to ``object mode``. In ``object mode``, ``numba`` will still execute your code, but it will not speed it up significantly. If you would prefer that ``numba`` raise an error when it cannot compile a function in a way that speeds up your code, pass the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page <http://numba.pydata.org/numba-doc/0.20.0/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.
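
The ``nopython=True`` option described in the note might be used as in the following minimal sketch. The function ``sum_of_squares`` and the sample data are hypothetical, and the do-nothing fallback decorator is there only so the snippet runs where ``numba`` is not installed.

```python
import numpy as np
import pandas as pd

try:
    from numba import jit
except ImportError:
    # Assumption: without numba, use a no-op stand-in with the same call shape.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def sum_of_squares(arr):
    # Plain loops over a NumPy array are the kind of code numba compiles well.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i] * arr[i]
    return total


s = pd.Series(np.arange(5, dtype=np.float64))

# Pass the underlying array. With nopython=True, numba raises an error
# rather than quietly falling back to object mode if it cannot compile.
result = sum_of_squares(s.values)
```

Requesting ``nopython=True`` up front turns a silent slowdown into a visible failure, which is usually easier to diagnose than an unexpectedly slow function.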
387+
349388
Read more in the `numba docs <http://numba.pydata.org/>`__.
350389

351390
.. _enhancingperf.eval:
