ENH add cython tutorial #3965
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,219 @@ | ||
.. _cython: | ||
|
||
.. currentmodule:: pandas | ||
|
||
.. ipython:: python | ||
:suppress: | ||
|
||
import os | ||
import csv | ||
from pandas import DataFrame | ||
import pandas as pd | ||
|
||
import numpy as np | ||
np.random.seed(123456) | ||
randn = np.random.randn | ||
randint = np.random.randint | ||
np.set_printoptions(precision=4, suppress=True) | ||
|
||
|
||
**************************************** | ||
Cython (Writing C extensions for pandas) | ||
**************************************** | ||
|
||
For many use cases writing pandas in pure python and numpy is sufficient. In some computationally heavy applications, however, sizeable speed-ups can be achieved by offloading work to `cython <http://cython.org/>`_. | ||
|
||
- Say something about this being tutorial for "advanced" users? | ||
|
||
.. note:: | ||
|
||
The first thing to do here is to see if we can refactor in python, removing for loops (TODO add some waffle, and maybe trivial example, maybe even just using a for loop rather than apply in this example) in a way which could make use of numpy... | ||
|
||
|
||
This tutorial walks through a "typical" process of cythonizing a slow computation. We use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_ in the context of pandas: | ||
|
||
We have a function, ``integrate_f``, which we want to apply row-wise across a DataFrame, ``df``: | ||
|
||
.. ipython:: python | ||
|
||
df = DataFrame({'x': 'x', 'a': randn(1000), 'b': randn(1000),'N': randint(100, 1000, (1000))}) | ||
df | ||
|
||
.. ipython:: python | ||
|
||
   def f(x): | ||
       return x * (x - 1) | ||
|
||
   def integrate_f(a, b, N): | ||
       s = 0 | ||
       dx = (b - a) / N | ||
       for i in range(N): | ||
           s += f(a + i * dx) | ||
       return s * dx | ||
|
||
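As a sanity check on what ``integrate_f`` computes: it is a left Riemann sum of ``f(x) = x * (x - 1)`` over ``[a, b]`` with ``N`` equal strips, so for large ``N`` it should approach the exact integral. The sketch below compares the two; the helper ``exact_integral`` and the chosen inputs are illustrative assumptions, not part of the tutorial.

```python
def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    # Left Riemann sum of f over [a, b] using N equal-width strips.
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def exact_integral(a, b):
    # Hypothetical helper: the antiderivative of x * (x - 1)
    # is x**3 / 3 - x**2 / 2.
    def antiderivative(x):
        return x ** 3 / 3 - x ** 2 / 2
    return antiderivative(b) - antiderivative(a)


approx = integrate_f(0.0, 1.0, 1000)
exact = exact_integral(0.0, 1.0)
print(approx, exact)
```

The two values should agree to several decimal places, with the gap shrinking as ``N`` grows.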
In pure pandas we might achieve this using a row-wise ``apply``: | ||
|
||
.. ipython:: python | ||
|
||
%timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1) | ||
|
||
Clearly this isn't fast enough for us, so let's take a look and see where the time is spent performing this operation (limited to the four most time-consuming calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_: | ||
|
||
.. ipython:: python | ||
|
||
%prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1) | ||
|
||
By far the majority of the time is spent inside either ``integrate_f`` or ``f``, hence we concentrate our efforts on cythonizing these two functions. | ||
|
||
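Outside IPython, where ``%prun`` is not available, a similar report can be produced with the standard-library ``cProfile`` and ``pstats`` modules. This is a minimal sketch: ``run_all`` and the ``rows`` list are hypothetical stand-ins for the row-wise ``apply`` over ``df``.

```python
import cProfile
import io
import pstats


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def run_all(rows):
    # Hypothetical stand-in for the row-wise apply.
    return [integrate_f(a, b, N) for a, b, N in rows]


# Assumed sample data: 50 "rows" of (a, b, N).
rows = [(0.0, 1.0 + k, 200) for k in range(50)]

profiler = cProfile.Profile()
profiler.enable()
run_all(rows)
profiler.disable()

# Sort by time spent inside each function and keep the top four entries,
# roughly mirroring ``%prun -l 4``.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(4)
report = stream.getvalue()
print(report)
```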
.. note:: | ||
|
||
In python 2, replacing ``range`` with its lazy counterpart (``xrange``) would mean the ``range`` line would vanish from the profile. In python 3, ``range`` is already lazy. | ||
|
||
First, let's simply copy our functions over to cython as is (here the ``_plain`` suffix stands for "plain cython", allowing us to distinguish between our cython functions): | ||
|
||
.. ipython:: python | ||
|
||
%load_ext cythonmagic | ||
|
||
.. ipython:: | ||
|
||
   In [2]: %%cython | ||
      ...: def f_plain(x): | ||
      ...:     return x * (x - 1) | ||
      ...: def integrate_f_plain(a, b, N): | ||
      ...:     s = 0 | ||
      ...:     dx = (b - a) / N | ||
      ...:     for i in range(N): | ||
      ...:         s += f_plain(a + i * dx) | ||
      ...:     return s * dx | ||
      ...: | ||
|
||
.. ipython:: python | ||
|
||
%timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1) | ||
|
||
|
||
We've already shaved a third off, not too bad for a simple copy and paste. We'll get another huge improvement simply by providing type information: | ||
|
||
.. ipython:: | ||
|
||
   In [3]: %%cython | ||
      ...: cdef double f_typed(double x) except? -2: | ||
      ...:     return x * (x - 1) | ||
      ...: cpdef double integrate_f_typed(double a, double b, int N): | ||
      ...:     cdef int i | ||
      ...:     cdef double s, dx | ||
      ...:     s = 0 | ||
      ...:     dx = (b - a) / N | ||
      ...:     for i in range(N): | ||
      ...:         s += f_typed(a + i * dx) | ||
      ...:     return s * dx | ||
      ...: | ||
|
||
.. ipython:: python | ||
|
||
%timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1) | ||
|
||
Now, we're talking! Already we're over ten times faster than the original python version, and we haven't *really* modified the code. Let's go back and have another look at what's eating up time now: | ||
|
||
.. ipython:: python | ||
|
||
%prun -l 4 df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1) | ||
|
||
It's calling Series and DataFrame methods... a lot; in fact, they're getting called for every row in the DataFrame. Function calls are expensive in python, so maybe we should cythonize the apply part and see if we can minimise these. | ||
|
||
We are now passing ndarrays into the cython function; fortunately, cython plays very nicely with numpy. TODO mention the ``Py_ssize_t``. | ||
|
||
.. ipython:: | ||
|
||
   In [4]: %%cython | ||
      ...: cimport numpy as np | ||
      ...: import numpy as np | ||
      ...: cdef double f_typed(double x) except? -2: | ||
      ...:     return x**2-x | ||
      ...: cpdef double integrate_f_typed(double a, double b, int N): | ||
      ...:     cdef int i | ||
      ...:     cdef double s, dx | ||
      ...:     s = 0 | ||
      ...:     dx = (b-a)/N | ||
      ...:     for i in range(N): | ||
      ...:         s += f_typed(a+i*dx) | ||
      ...:     return s * dx | ||
      ...: cpdef np.ndarray[double] apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N): | ||
      ...:     assert (col_a.dtype == np.float64 and col_b.dtype == np.float64 and col_N.dtype == np.int_) | ||
      ...:     cdef Py_ssize_t i, n = len(col_N) | ||
      ...:     assert (len(col_a) == len(col_b) == n) | ||
      ...:     cdef np.ndarray[double] res = np.empty(n) | ||
      ...:     for i in range(len(col_a)): | ||
      ...:         res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i]) | ||
      ...:     return res | ||
      ...: | ||
|
||
|
||
We create an uninitialised array with ``np.empty`` and loop over the rows, applying our ``integrate_f_typed`` function to fill it up. It's worth mentioning here that although a loop like this would be extremely slow in python (TODO: "as we saw" considerably slower than the apply?), looping over a numpy array in cython is *fast*. | ||
|
||
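When experimenting with variants of the cython loop, a slow pure-python reference can be handy for checking that the results still agree. The sketch below mirrors the control flow of ``apply_integrate_f`` using plain lists; the name ``apply_integrate_f_reference`` and the sample inputs are assumptions made for illustration.

```python
def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def apply_integrate_f_reference(col_a, col_b, col_N):
    # Same length check as the cython version, expressed in plain python.
    n = len(col_N)
    if not (len(col_a) == len(col_b) == n):
        raise ValueError("all columns must have the same length")
    # Preallocate the result, then fill it row by row.
    res = [0.0] * n
    for i in range(n):
        res[i] = integrate_f(col_a[i], col_b[i], col_N[i])
    return res


result = apply_integrate_f_reference([0.0, 1.0], [1.0, 3.0], [100, 200])
print(result)
```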
.. ipython:: python | ||
|
||
%timeit apply_integrate_f(df['a'], df['b'], df['N']) | ||
|
||
We've gone another three times faster! Let's check again where the time is spent: | ||
|
||
.. ipython:: python | ||
|
||
%prun -l 4 apply_integrate_f(df['a'], df['b'], df['N']) | ||
|
||
As one might expect, the majority of the time is now spent in ``apply_integrate_f``, so if we want to find any more efficiencies we must continue to concentrate our efforts here... | ||
|
||
TODO explain decorators, and why they make it so fast! | ||
|
||
.. ipython:: | ||
|
||
   In [5]: %%cython | ||
      ...: cimport cython | ||
      ...: cimport numpy as np | ||
      ...: import numpy as np | ||
      ...: cdef double f_typed(double x) except? -2: | ||
      ...:     return x**2-x | ||
      ...: cpdef double integrate_f_typed(double a, double b, int N): | ||
      ...:     cdef int i | ||
      ...:     cdef double s, dx | ||
      ...:     s = 0 | ||
      ...:     dx = (b-a)/N | ||
      ...:     for i in range(N): | ||
      ...:         s += f_typed(a+i*dx) | ||
      ...:     return s * dx | ||
      ...: @cython.boundscheck(False) | ||
      ...: @cython.wraparound(False) | ||
      ...: cpdef np.ndarray[double] apply_integrate_f_wrap(np.ndarray[double] col_a, np.ndarray[double] col_b, np.ndarray[Py_ssize_t] col_N): | ||
      ...:     cdef Py_ssize_t i, n = len(col_N) | ||
      ...:     assert len(col_a) == len(col_b) == n | ||
      ...:     cdef np.ndarray[double] res = np.empty(n) | ||
      ...:     for i in range(n): | ||
      ...:         res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i]) | ||
      ...:     return res | ||
      ...: | ||
|
||
.. ipython:: python | ||
|
||
%timeit apply_integrate_f_wrap(df['a'], df['b'], df['N']) | ||
|
||
Again we've shaved another third off, so let's have a look at where the time is spent: | ||
|
||
.. ipython:: python | ||
|
||
%prun -l 4 apply_integrate_f_wrap(df['a'], df['b'], df['N']) | ||
|
||
We can see that now all the time appears to be spent in ``apply_integrate_f_wrap`` and not much anywhere else. It would make sense to continue looking here for efficiencies... | ||
|
||
TODO more? Have a 2D ndarray example? | ||
|
||
Using cython has made our calculation around 100 times faster than the original python-only version, and yet we're left with something which doesn't look too dissimilar. | ||
|
||
TODO some warning that you don't need to cythonize every function (!) | ||
|
||
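Before reaching for cython, it may be worth checking whether numpy vectorisation is enough. The sketch below replaces the inner python loop of ``integrate_f`` with array operations for a single row; ``integrate_f_vectorized`` and the inputs are hypothetical. It assumes numpy is installed, and because ``N`` differs per row in the example above, a loop over rows would still be needed.

```python
import timeit

import numpy as np


def f(x):
    return x * (x - 1)


def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


def integrate_f_vectorized(a, b, N):
    # Evaluate f at all N left-hand strip edges in one array operation.
    dx = (b - a) / N
    x = a + np.arange(N) * dx
    return f(x).sum() * dx


loop_result = integrate_f(0.0, 2.0, 500)
vec_result = integrate_f_vectorized(0.0, 2.0, 500)
print(loop_result, vec_result)

# Rough timing comparison; the numbers depend on the machine and on N.
loop_time = timeit.timeit(lambda: integrate_f(0.0, 2.0, 500), number=200)
vec_time = timeit.timeit(lambda: integrate_f_vectorized(0.0, 2.0, 500), number=200)
print("loop: %.4f s, vectorised: %.4f s" % (loop_time, vec_time))
```

Whether this beats the typed cython version is not something the sketch establishes; it is only meant as a cheap first thing to try.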
Further topics: | ||
|
||
- One can also load in functions from other C modules you've already written. | ||
- More?? | ||
|
||
Read more in the `cython docs <http://docs.cython.org/>`_. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -131,5 +131,6 @@ See the package overview for more detail about what's in the library. | |
r_interface | ||
related | ||
comparison_with_r | ||
cython | ||
i think you should put this after sparse (up a couple)
Sure (as "Enhancing Performance").
||
api | ||
|
this might be too long for the main docs, maybe "Performance" or "Enhancing Performance"....??
Good idea, that way this section can grow in scope later. I prefer "Enhancing Performance" (I think "Performance" would be too ambiguous, as it could also describe a possible section broadly comparing speeds/memory usage :) )