You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: doc/source/enhancingperf.rst
+99-45
Original file line number
Diff line number
Diff line change
@@ -1,4 +1,4 @@
1
-
.. _cython:
1
+
.. _enhancingperf:
2
2
3
3
.. currentmodule:: pandas
4
4
@@ -17,28 +17,42 @@
17
17
np.set_printoptions(precision=4, suppress=True)
18
18
19
19
20
-
****************************************
21
-
Cython (Writing C extensions for pandas)
22
-
****************************************
20
+
*********************
21
+
Enhancing Performance
22
+
*********************
23
23
24
-
For many use cases writing pandas in pure python and numpy is sufficient. In some computationally heavy applications however, it can be possible to achieve sizeable speed-ups by offloading work to `cython<http://cython.org/>`_.
24
+
.. _enhancingperf.cython:
25
25
26
-
- Say something about this being tutorial for "advanced" users?
26
+
Cython (Writing C extensions for pandas)
27
+
----------------------------------------
27
28
28
-
.. note::
29
+
For many use cases writing pandas in pure python and numpy is sufficient. In some
30
+
computationally heavy applications however, it can be possible to achieve sizeable
31
+
speed-ups by offloading work to `cython <http://cython.org/>`_.
29
32
30
-
The first thing to do here is to see if we can refactor in python, removing for loops (TODO add some waffle, and maybe trivial example, maybe even just using a for loop rather than apply in this example) a way which could make use of numpy...
33
+
This tutorial assumes you have refactored as much as possible in python, for example
34
+
trying to remove for loops and making use of numpy vectorization, it's always worth
35
+
optimising in python first.
31
36
37
+
This tutorial walks through a "typical" process of cythonizing a slow computation.
38
+
We use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_
39
+
but in the context of pandas. Our final cythonized solution is around 100 times
40
+
faster than the pure python.
32
41
33
-
This tutorial walksthrough a "typical" process of cythonizing a slow computation, we use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_ in the context of pandas:
42
+
.. _enhancingperf.pure:
34
43
35
-
We have a function, ``integrate_f``, which we want to apply row-wise across a DataFrame, ``df``:
44
+
Pure python
45
+
~~~~~~~~~~~
46
+
47
+
We have a DataFrame to which we want to apply a function row-wise.
Clearly this isn't fast enough for us, so let's take a look and see where the time is spent performing this operation (limited to the most time consuming four calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_:
74
+
But clearly this isn't fast enough for us. Let's take a look and see where the
75
+
time is spent during this operation (limited to the most time consuming
76
+
four calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_:
By far the majority of time is spend inside either ``integrate_f`` or ``f``, hence we concentrate our efforts cythonizing these two functions.
82
+
By far the majority of time is spend inside either ``integrate_f`` or ``f``,
83
+
hence we'll concentrate our efforts cythonizing these two functions.
67
84
68
85
.. note::
69
86
70
-
In python 2 replacing the ``range`` with its generator counterpart (``xrange``) would mean the ``range`` line would vanish. In python 3 range is already a generator.
87
+
In python 2 replacing the ``range`` with its generator counterpart (``xrange``)
88
+
would mean the ``range`` line would vanish. In python 3 range is already a generator.
71
89
72
-
First, let's simply just copy our function over to cython as is (here the ``_plain`` suffix stands for "plain cython", allowing us to distinguish between our cython functions):
90
+
.. _enhancingperf.plain:
91
+
92
+
Plain cython
93
+
~~~~~~~~~~~~
94
+
95
+
First we're going to need to import the cython magic function to ipython:
73
96
74
97
.. ipython:: python
75
98
76
99
%load_ext cythonmagic
77
100
101
+
102
+
Now, let's simply copy our functions over to cython as is (the suffix
103
+
is here to distinguish between function versions):
104
+
78
105
.. ipython::
79
106
80
107
In [2]: %%cython
@@ -88,12 +115,24 @@ First, let's simply just copy our function over to cython as is (here the ``_pla
88
115
...: return s * dx
89
116
...:
90
117
118
+
.. note::
119
+
120
+
If you're having trouble pasting the above into your ipython, you may need
121
+
to be using bleeding edge ipython for paste to play well with cell magics.
Now, we're talking! Already we're over ten times faster than the original python version, and we haven't *really* modified the code. Let's go back and have another look at what's eating up time now:
156
+
Now, we're talking! It's now over ten times faster than the original python
157
+
implementation, and we haven't *really* modified the code. Let's have another
It's calling series and frames... a lot, in fact they're getting called for every row in the DataFrame. Function calls are expensive in python, so maybe we should cythonize the apply part and see if we can minimise these.
164
+
.. _enhancingperf.ndarray:
165
+
166
+
Using ndarray
167
+
~~~~~~~~~~~~~
168
+
169
+
It's calling series... a lot! It's creating a Series from each row, and get-ting from both
170
+
the index and the series (three times for each row). Function calls are expensive
171
+
in python, so maybe we could minimise these by cythonizing the apply part.
172
+
173
+
.. note::
124
174
125
-
We are now passing ndarrays into the cython function, fortunately cython plays very nicely with numpy. TODO mention the ``Py_ssize_t``.
175
+
We are now passing ndarrays into the cython function, fortunately cython plays
176
+
very nicely with numpy.
126
177
127
178
.. ipython::
128
179
129
180
In [4]: %%cython
130
181
...: cimport numpy as np
131
182
...: import numpy as np
132
183
...: cdef double f_typed(double x) except? -2:
133
-
...: return x**2-x
184
+
...: return x * (x - 1)
134
185
...: cpdef double integrate_f_typed(double a, double b, int N):
...: assert (col_a.dtype == np.float and col_b.dtype == np.float and col_N.dtype == np.int)
@@ -150,7 +201,14 @@ We are now passing ndarrays into the cython function, fortunately cython plays v
150
201
...:
151
202
152
203
153
-
We create an array of zeros and loop over the rows, applying our ``integrate_f_typed`` function to fill it up. It's worth mentioning here that although a loop like this would be extremely slow in python (TODO: "as we saw" considerably slower than the apply?) while looping over a numpy array in cython is *fast*.
204
+
The implementation is simple, it creates an array of zeros and loops over
205
+
the rows, applying our ``integrate_f_typed``, and putting this in the zeros array.
206
+
207
+
208
+
.. note::
209
+
210
+
Loop like this would be *extremely* slow in python, but in cython looping over
211
+
numpy arrays is *fast*.
154
212
155
213
.. ipython:: python
156
214
@@ -162,9 +220,17 @@ We've gone another three times faster! Let's check again where the time is spent
As on might expect, the majority of the time is now spent in ``apply_integrate_f``, so if we wanted to make anymore efficiencies we must continue to concentrate our efforts here...
223
+
As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
224
+
so if we wanted to make anymore efficiencies we must continue to concentrate our
225
+
efforts here.
226
+
227
+
.. _enhancingperf.boundswrap:
166
228
167
-
TODO explain decorators, and why they make it so fast!
229
+
More advanced techniques
230
+
~~~~~~~~~~~~~~~~~~~~~~~~
231
+
232
+
There is still scope for improvement, here's an example of using some more
233
+
advanced cython techniques:
168
234
169
235
.. ipython::
170
236
@@ -173,14 +239,14 @@ TODO explain decorators, and why they make it so fast!
173
239
...: cimport numpy as np
174
240
...: import numpy as np
175
241
...: cdef double f_typed(double x) except? -2:
176
-
...: return x**2-x
242
+
...: return x * (x - 1)
177
243
...: cpdef double integrate_f_typed(double a, double b, int N):
178
244
...: cdef int i
179
245
...: cdef double s, dx
180
246
...: s = 0
181
-
...: dx = (b-a)/N
247
+
...: dx = (b - a) / N
182
248
...: for i in range(N):
183
-
...: s += f_typed(a+i*dx)
249
+
...: s += f_typed(a + i * dx)
184
250
...: return s * dx
185
251
...: @cython.boundscheck(False)
186
252
...: @cython.wraparound(False)
@@ -197,23 +263,11 @@ TODO explain decorators, and why they make it so fast!
We can see that now all the time appears to be spent in ``apply_integrate_f_wrap`` and not much anywhere else. It would make sense to continue looking here for efficiencies...
207
-
208
-
TODO more? Have a 2D ndarray example?
209
-
210
-
Using cython has made our calculation around 100 times faster than the original python only version, and yet we're left with something which doesn't look too dissimilar.
211
-
212
-
TODO some warning that you don't need to cythonize every function (!)
266
+
This shaves another third off!
213
267
214
-
Further topics:
268
+
Further topics
269
+
~~~~~~~~~~~~~~
215
270
216
-
- One can also load in functions from other C modules you've already written.
217
-
- More??
271
+
- Loading C modules into cython.
218
272
219
273
Read more in the `cython docs <http://docs.cython.org/>`_.
0 commit comments