Skip to content

add numba example to enhancingperf.rst #10257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 3, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 70 additions & 16 deletions doc/source/enhancingperf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

import os
import csv
from pandas import DataFrame
from pandas import DataFrame, Series
import pandas as pd
pd.options.display.max_rows=15

Expand Down Expand Up @@ -68,9 +68,10 @@ Here's the function in pure python:

We achieve our result by using ``apply`` (row-wise):

.. ipython:: python
.. code-block:: python

%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
In [7]: %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
10 loops, best of 3: 174 ms per loop

But clearly this isn't fast enough for us. Let's take a look and see where the
time is spent during this operation (limited to the most time consuming
Expand All @@ -97,7 +98,7 @@ First we're going to need to import the cython magic function to ipython:

.. ipython:: python

%load_ext cythonmagic
%load_ext Cython


Now, let's simply copy our functions over to cython as is (the suffix
Expand All @@ -122,9 +123,10 @@ is here to distinguish between function versions):
to be using bleeding edge ipython for paste to play well with cell magics.


.. ipython:: python
.. code-block:: python

%timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)
In [4]: %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)
10 loops, best of 3: 85.5 ms per loop

Already this has shaved a third off, not too bad for a simple copy and paste.

Expand All @@ -150,9 +152,10 @@ We get another huge improvement simply by providing type information:
...: return s * dx
...:

.. ipython:: python
.. code-block:: python

%timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
In [4]: %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
10 loops, best of 3: 20.3 ms per loop

Now, we're talking! It's now over ten times faster than the original python
implementation, and we haven't *really* modified the code. Let's have another
Expand Down Expand Up @@ -229,9 +232,10 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
Loops like this would be *extremely* slow in python, but in Cython looping
over numpy arrays is *fast*.

.. ipython:: python
.. code-block:: python

%timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
In [4]: %timeit apply_integrate_f(df['a'].values, df['b'].values, df['N'].values)
1000 loops, best of 3: 1.25 ms per loop

We've gotten another big improvement. Let's check again where the time is spent:

Expand Down Expand Up @@ -278,20 +282,70 @@ advanced cython techniques:
...: return res
...:

.. ipython:: python
.. code-block:: python

%timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
In [4]: %timeit apply_integrate_f_wrap(df['a'].values, df['b'].values, df['N'].values)
1000 loops, best of 3: 987 us per loop

Even faster, with the caveat that a bug in our cython code (an off-by-one error,
for example) might cause a segfault because memory access isn't checked.


Further topics
~~~~~~~~~~~~~~
.. _enhancingperf.numba:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave this link at the end of the cython section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

links already at the top of the section


Using numba
-----------

A recent alternative to statically compiling cython code, is to use a *dynamic jit-compiler*, ``numba``.

Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.

Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.

.. note::

You will need to install ``numba``. This is easy with ``conda``, by using: ``conda install numba``, see :ref:`installing using miniconda<install.miniconda>`.

We simply take the plain python code from above and annotate with the ``@jit`` decorator.

.. code-block:: python

import numba

@numba.jit
def f_plain(x):
return x * (x - 1)

@numba.jit
def integrate_f_numba(a, b, N):
s = 0
dx = (b - a) / N
for i in range(N):
s += f_plain(a + i * dx)
return s * dx

@numba.jit
def apply_integrate_f_numba(col_a, col_b, col_N):
n = len(col_N)
result = np.empty(n, dtype='float64')
assert len(col_a) == len(col_b) == n
for i in range(n):
result[i] = integrate_f_numba(col_a[i], col_b[i], col_N[i])
return result

def compute_numba(df):
result = apply_integrate_f_numba(df['a'].values, df['b'].values, df['N'].values)
return Series(result, index=df.index, name='result')

Similar to above, we directly pass ``numpy`` arrays directly to the numba function. Further
we are wrapping the results to provide a nice interface by passing/returning pandas objects.

.. code-block:: python

- Loading C modules into cython.
In [4]: %timeit compute_numba(df)
1000 loops, best of 3: 798 us per loop

Read more in the `cython docs <http://docs.cython.org/>`__.
Read more in the `numba docs <http://numba.pydata.org/>`__.

.. _enhancingperf.eval:

Expand Down
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.16.2.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ We recommend that all users upgrade to this version.

Highlights include:

- Documentation on how to use ``numba`` with *pandas*, see :ref:`here <enhancingperf.numba>`

Check the :ref:`API Changes <whatsnew_0162.api>` before updating.

.. contents:: What's new in v0.16.2
Expand Down