Commit 700d8fb (parent 69330fa)

FIX remove todos
change name to Enhancing performance, add in some sections

2 files changed: +100 −46 lines changed

doc/source/cython.rst renamed to doc/source/enhancingperf.rst (+99 −45)
@@ -1,4 +1,4 @@
-.. _cython:
+.. _enhancingperf:
 
 .. currentmodule:: pandas
 
@@ -17,28 +17,42 @@
    np.set_printoptions(precision=4, suppress=True)
 
 
-****************************************
-Cython (Writing C extensions for pandas)
-****************************************
+*********************
+Enhancing Performance
+*********************
 
-For many use cases writing pandas in pure python and numpy is sufficient. In some computationally heavy applications however, it can be possible to achieve sizeable speed-ups by offloading work to `cython <http://cython.org/>`_.
+.. _enhancingperf.cython:
 
-- Say something about this being tutorial for "advanced" users?
+Cython (Writing C extensions for pandas)
+----------------------------------------
 
-.. note::
+For many use cases writing pandas in pure python and numpy is sufficient. In some
+computationally heavy applications, however, it is possible to achieve sizeable
+speed-ups by offloading work to `cython <http://cython.org/>`_.
 
-   The first thing to do here is to see if we can refactor in python, removing for loops (TODO add some waffle, and maybe trivial example, maybe even just using a for loop rather than apply in this example) a way which could make use of numpy...
+This tutorial assumes you have refactored as much as possible in python, for
+example by trying to remove for loops and making use of numpy vectorization;
+it's always worth optimising in python first.
 
+This tutorial walks through a "typical" process of cythonizing a slow computation.
+We use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_
+but in the context of pandas. Our final cythonized solution is around 100 times
+faster than the pure python version.
 
-This tutorial walksthrough a "typical" process of cythonizing a slow computation, we use an `example from the cython documentation <http://docs.cython.org/src/quickstart/cythonize.html>`_ in the context of pandas:
+.. _enhancingperf.pure:
 
-We have a function, ``integrate_f``, which we want to apply row-wise across a DataFrame, ``df``:
+Pure python
+~~~~~~~~~~~
+
+We have a DataFrame to which we want to apply a function row-wise.
 
 .. ipython:: python
 
-   df = DataFrame({'x': 'x', 'a': randn(1000), 'b': randn(1000),'N': randint(100, 1000, (1000))})
+   df = DataFrame({'a': randn(1000), 'b': randn(1000), 'N': randint(100, 1000, (1000)), 'x': 'x'})
    df
 
+Here's the function in pure python:
+
 .. ipython:: python
 
    def f(x):
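The hunk above cuts off just as ``f`` is being defined. For reference, here is a self-contained sketch of the pure-python pair; the ``x * (x - 1)`` body and the loop shape are reconstructed from the cython versions shown later in this diff:

```python
def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    # left-rectangle approximation of the integral of f over [a, b]
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

The exact integral of ``x * (x - 1)`` over [0, 1] is -1/6, so ``integrate_f(0.0, 1.0, 1000)`` should land very close to -0.1667.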
@@ -51,30 +65,43 @@ We have a function, ``integrate_f``, which we want to apply row-wise across a Da
        s += f(a + i * dx)
    return s * dx
 
-In pure pandas we might achieve this using a row-wise ``apply``:
+We achieve our result by using ``apply`` (row-wise):
 
 .. ipython:: python
 
    %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
 
-Clearly this isn't fast enough for us, so let's take a look and see where the time is spent performing this operation (limited to the most time consuming four calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_:
+But clearly this isn't fast enough for us. Let's take a look and see where the
+time is spent during this operation (limited to the four most time-consuming
+calls) using the `prun ipython magic function <http://ipython.org/ipython-doc/stable/api/generated/IPython.core.magics.execution.html#IPython.core.magics.execution.ExecutionMagics.prun>`_:
 
 .. ipython:: python
 
    %prun -l 4 df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
 
-By far the majority of time is spend inside either ``integrate_f`` or ``f``, hence we concentrate our efforts cythonizing these two functions.
+By far the majority of the time is spent inside either ``integrate_f`` or ``f``,
+so we'll concentrate our efforts on cythonizing these two functions.
 
 .. note::
 
-   In python 2 replacing the ``range`` with its generator counterpart (``xrange``) would mean the ``range`` line would vanish. In python 3 range is already a generator.
+   In python 2, replacing ``range`` with its generator counterpart (``xrange``)
+   would make the ``range`` line vanish. In python 3, ``range`` is already a generator.
 
-First, let's simply just copy our function over to cython as is (here the ``_plain`` suffix stands for "plain cython", allowing us to distinguish between our cython functions):
+.. _enhancingperf.plain:
+
+Plain cython
+~~~~~~~~~~~~
+
+First we're going to need to import the cython magic function into ipython:
 
 .. ipython:: python
 
    %load_ext cythonmagic
 
+
+Now, let's simply copy our functions over to cython as is (the suffix
+is there to distinguish between function versions):
+
 .. ipython::
 
    In [2]: %%cython
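As a runnable illustration of the row-wise ``apply`` pattern this hunk times with ``%timeit`` (columns shrunk from 1000 to 10 rows here, and the doc's bare ``DataFrame``/``randn``/``randint`` names written with ``pd.``/``np.`` prefixes):

```python
import numpy as np
import pandas as pd

def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

# a small stand-in for the DataFrame built in the doc
np.random.seed(0)
df = pd.DataFrame({'a': np.random.randn(10),
                   'b': np.random.randn(10),
                   'N': np.random.randint(100, 1000, 10),
                   'x': 'x'})

# row-wise apply, one integrate_f call per row
result = df.apply(lambda row: integrate_f(row['a'], row['b'], row['N']), axis=1)
print(result.head())
```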
@@ -88,12 +115,24 @@ First, let's simply just copy our function over to cython as is (here the ``_pla
    ...:     return s * dx
    ...:
 
+.. note::
+
+   If you're having trouble pasting the above into your ipython, you may need
+   to be using bleeding edge ipython for paste to play well with cell magics.
+
+
 .. ipython:: python
 
    %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)
 
+Already this has shaved a third off, not too bad for a simple copy and paste.
+
+.. _enhancingperf.type:
+
+Adding type
+~~~~~~~~~~~
 
-We're already shaved a third off, not too bad for a simple copy and paste. We'll get another huge improvement simply by providing type information:
+We get another huge improvement simply by providing type information:
 
 .. ipython::
 
@@ -114,30 +153,42 @@ We're already shaved a third off, not too bad for a simple copy and paste. We'll
 
    %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
 
-Now, we're talking! Already we're over ten times faster than the original python version, and we haven't *really* modified the code. Let's go back and have another look at what's eating up time now:
+Now, we're talking! It's now over ten times faster than the original python
+implementation, and we haven't *really* modified the code. Let's have another
+look at what's eating up time:
 
 .. ipython:: python
 
    %prun -l 4 df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1)
 
-It's calling series and frames... a lot, in fact they're getting called for every row in the DataFrame. Function calls are expensive in python, so maybe we should cythonize the apply part and see if we can minimise these.
+.. _enhancingperf.ndarray:
+
+Using ndarray
+~~~~~~~~~~~~~
+
+It's calling series... a lot! It's creating a Series from each row, and getting from both
+the index and the series (three times for each row). Function calls are expensive
+in python, so maybe we could minimise these by cythonizing the apply part.
+
+.. note::
 
-   We are now passing ndarrays into the cython function, fortunately cython plays very nicely with numpy. TODO mention the ``Py_ssize_t``.
+   We are now passing ndarrays into the cython function; fortunately cython plays
+   very nicely with numpy.
 
 .. ipython::
 
    In [4]: %%cython
    ...: cimport numpy as np
    ...: import numpy as np
    ...: cdef double f_typed(double x) except? -2:
-   ...:     return x**2-x
+   ...:     return x * (x - 1)
    ...: cpdef double integrate_f_typed(double a, double b, int N):
    ...:     cdef int i
    ...:     cdef double s, dx
    ...:     s = 0
-   ...:     dx = (b-a)/N
+   ...:     dx = (b - a) / N
    ...:     for i in range(N):
-   ...:         s += f_typed(a+i*dx)
+   ...:         s += f_typed(a + i * dx)
    ...:     return s * dx
    ...: cpdef np.ndarray[double] apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
    ...:     assert (col_a.dtype == np.float and col_b.dtype == np.float and col_N.dtype == np.int)
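A pure-python sketch of what ``apply_integrate_f`` does may help here; the names mirror the diff, but this untyped stand-in omits the cython static types and typed ndarray return:

```python
import numpy as np

def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

def apply_integrate_f(col_a, col_b, col_N):
    # create an array of zeros and fill it row by row,
    # mirroring the loop in the cython version
    n = len(col_N)
    res = np.zeros(n)
    for i in range(n):
        res[i] = integrate_f(col_a[i], col_b[i], int(col_N[i]))
    return res
```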
@@ -150,7 +201,14 @@ We are now passing ndarrays into the cython function, fortunately cython plays v
    ...:
 
 
-We create an array of zeros and loop over the rows, applying our ``integrate_f_typed`` function to fill it up. It's worth mentioning here that although a loop like this would be extremely slow in python (TODO: "as we saw" considerably slower than the apply?) while looping over a numpy array in cython is *fast*.
+The implementation is simple: it creates an array of zeros, loops over
+the rows applying our ``integrate_f_typed``, and puts the results in the zeros array.
+
+
+.. note::
+
+   A loop like this would be *extremely* slow in python, but in cython looping over
+   numpy arrays is *fast*.
 
 .. ipython:: python
 
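The note above contrasts slow python loops with fast cython loops. For comparison (not part of the original text), the same integral can also be vectorized entirely in numpy, which is the kind of refactoring the introduction recommends trying first:

```python
import numpy as np

def integrate_f_vec(a, b, N):
    # vectorized left-rectangle rule for f(x) = x * (x - 1)
    dx = (b - a) / N
    x = a + np.arange(N) * dx
    return np.sum(x * (x - 1)) * dx
```

``integrate_f_vec(0.0, 1.0, 1000)`` should agree with the loop version, close to -1/6.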
@@ -162,9 +220,17 @@ We've gone another three times faster! Let's check again where the time is spent
 
    %prun -l 4 apply_integrate_f(df['a'], df['b'], df['N'])
 
-As on might expect, the majority of the time is now spent in ``apply_integrate_f``, so if we wanted to make anymore efficiencies we must continue to concentrate our efforts here...
+As one might expect, the majority of the time is now spent in ``apply_integrate_f``,
+so if we wanted to make any more efficiencies we must continue to concentrate our
+efforts here.
+
+.. _enhancingperf.boundswrap:
 
-TODO explain decorators, and why they make it so fast!
+More advanced techniques
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+There is still scope for improvement; here's an example of using some more
+advanced cython techniques:
 
 .. ipython::
 
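Outside ipython, the ``%prun -l 4`` calls used throughout this section can be reproduced with the stdlib profiler. A minimal sketch (profiling the pure-python ``integrate_f`` rather than the compiled cython build, which would need the extension module):

```python
import cProfile
import io
import pstats

def f(x):
    return x * (x - 1)

def integrate_f(a, b, N):
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

pr = cProfile.Profile()
pr.enable()
integrate_f(0.0, 1.0, 100000)
pr.disable()

buf = io.StringIO()
# sort by cumulative time and keep the four most expensive calls, like %prun -l 4
pstats.Stats(pr, stream=buf).sort_stats('cumulative').print_stats(4)
print(buf.getvalue())
```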
@@ -173,14 +239,14 @@ TODO explain decorators, and why they make it so fast!
    ...: cimport numpy as np
    ...: import numpy as np
    ...: cdef double f_typed(double x) except? -2:
-   ...:     return x**2-x
+   ...:     return x * (x - 1)
    ...: cpdef double integrate_f_typed(double a, double b, int N):
    ...:     cdef int i
    ...:     cdef double s, dx
    ...:     s = 0
-   ...:     dx = (b-a)/N
+   ...:     dx = (b - a) / N
    ...:     for i in range(N):
-   ...:         s += f_typed(a+i*dx)
+   ...:         s += f_typed(a + i * dx)
    ...:     return s * dx
    ...: @cython.boundscheck(False)
    ...: @cython.wraparound(False)
@@ -197,23 +263,11 @@
 
    %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
 
-Again we've shaved another third off, so let's have a look at where the time is spent:
-
-.. ipython:: python
-
-   %prun -l 4 apply_integrate_f_wrap(df['a'], df['b'], df['N'])
-
-We can see that now all the time appears to be spent in ``apply_integrate_f_wrap`` and not much anywhere else. It would make sense to continue looking here for efficiencies...
-
-TODO more? Have a 2D ndarray example?
-
-Using cython has made our calculation around 100 times faster than the original python only version, and yet we're left with something which doesn't look too dissimilar.
-
-TODO some warning that you don't need to cythonize every function (!)
+This shaves another third off!
 
-Further topics:
+Further topics
+~~~~~~~~~~~~~~
 
-- One can also load in functions from other C modules you've already written.
-- More??
+- Loading C modules into cython.
 
 Read more in the `cython docs <http://docs.cython.org/>`_.

doc/source/index.rst (+1 −1)

@@ -126,11 +126,11 @@ See the package overview for more detail about what's in the library.
    visualization
    rplot
    io
+   performance
    sparse
    gotchas
    r_interface
    related
    comparison_with_r
-   cython
    api