Skip to content

Extended docs on numba #10614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 22, 2015
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 41 additions & 2 deletions doc/source/enhancingperf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -307,6 +307,10 @@ Numba works by generating optimized machine code using the LLVM compiler infrast

You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.

.. note::

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still duplicative
don't say suffix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

As of ``numba`` version 0.20, pandas objects cannot be passed directly to numba-compiled functions. Instead, one must pass the ``numpy`` array underlying the ``pandas`` object to the numba-compiled function as demonstrated below.

We simply take the plain python code from above and annotate with the ``@jit`` decorator.

.. code-block:: python
Expand Down Expand Up @@ -338,14 +342,49 @@ We simply take the plain python code from above and annotate with the ``@jit`` d
result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
return pd.Series(result, index=df.index, name='result')

Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
we are wrapping the results to provide a nice interface by passing/returning pandas objects.
Note that we directly pass ``numpy`` arrays to the numba function. ``compute_numba`` is just a wrapper that provides a nicer interface by passing/returning pandas objects.

.. code-block:: python

In [4]: %timeit compute_numba(df)
1000 loops, best of 3: 798 us per loop

``numba`` can also be used to write vectorized functions that do not require the user to explicitly
loop over the observations of a vector; a vectorized function will be applied to each row automatically.
Consider the following toy example of doubling each observation:

.. code-block:: python

import numba

def double_every_value_nonumba(x):
return x*2

@numba.vectorize
def double_every_value_withnumba(x):
return x*2


# Custom function without numba
In [5]: %timeit df['col1_doubled'] = df.a.apply(double_every_value_nonumba)
1000 loops, best of 3: 797 us per loop

# Standard implementation (faster than a custom function)
In [6]: %timeit df['col1_doubled'] = df.a*2
1000 loops, best of 3: 233 us per loop

# Custom function with numba
In [7]: %timeit df['col1_doubled'] = double_every_value_withnumba(df.a.values)
1000 loops, best of 3: 145 us per loop

.. note::

``numba`` will execute on any function, but can only accelerate certain classes of functions.

``numba`` is best at accelerating functions that apply numerical functions to numpy arrays. When passed a function that only uses operations it knows how to accelerate, it will execute in ``nopython`` mode.

If ``numba`` is passed a function that includes something it doesn't know how to work with -- a category that currently includes sets, lists, dictionaries, or string functions -- it will revert to ``object mode``. In ``object mode``, numba will execute but your code will not speed up significantly. If you would prefer that ``numba`` throw an error if it cannot compile a function in a way that speeds up your code, pass numba the argument ``nopython=True`` (e.g. ``@numba.jit(nopython=True)``). For more on troubleshooting ``numba`` modes, see the `numba troubleshooting page <http://numba.pydata.org/numba-doc/0.20.0/user/troubleshoot.html#the-compiled-code-is-too-slow>`__.

Read more in the `numba docs <http://numba.pydata.org/>`__.

.. _enhancingperf.eval:
Expand Down