Using cython in pandas tutorial #3923

hayd · 2013-06-16T12:25:46Z

Please see #3965 for current draft.

Re: this topic https://groups.google.com/forum/?fromgroups#!topic/pydata/aLxALYqosOU (cc @jreback).

I think this is one of the killer features of pandas, so it merits being in the docs. (I'd definitely be interested in reading it!) Maybe something structured like this, with a good example:

1. Write it in python first (unit-test it, and check its speed: it may be good enough!)
2. Try to rewrite it in python to be more efficient (now it may be good enough!)
3. Profile to work out which part is slow (and needs cython love)
4. Write and call a cython function (to do that slow bit faster)

I'm not sure what would make a good toy example for this (and I think that choosing a good one is crucial).

Thoughts?

cpcloud · 2013-06-16T16:49:24Z

fyi it's not entirely true that you must "manually dispatch" in the "f_float64", "f_int32", etc. way: you can use fused types, which do the dispatch for you, but they can be very difficult to debug and kind of awkward to use.
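To make the "manual dispatch" pattern concrete, here is a minimal pure-Python sketch, assuming hypothetical kernels named `f_float64` and `f_int32` (one specialised function per dtype) plus a lookup on the array's dtype. In Cython each kernel would be compiled with different C types; fused types generate that family of specialisations, and the runtime dispatch, for you.

```python
import numpy as np

# Hypothetical dtype-specialised kernels. In Cython each body would be
# compiled against different C types; in Python they just look alike.
def f_float64(arr):
    return arr * arr - arr

def f_int32(arr):
    return arr * arr - arr

# The manual dispatch table: dtype -> specialised implementation.
_DISPATCH = {
    np.dtype(np.float64): f_float64,
    np.dtype(np.int32): f_int32,
}

def f(arr):
    """Pick the specialised kernel based on the array's dtype."""
    try:
        impl = _DISPATCH[arr.dtype]
    except KeyError:
        raise TypeError("no implementation for dtype %s" % arr.dtype)
    return impl(arr)
```

Every new dtype means another kernel and another table entry, which is the bookkeeping fused types take over.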

hayd · 2013-06-16T20:40:43Z

I propose using the example from the cython docs:

def f(x):
    return x**2-x

def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(int(N)):  # annoyingly int seems to be required here:  #3928
        s += f(a+i*dx)
    return s * dx

We want to apply that to a DataFrame:

import pandas as pd
from numpy.random import randn, randint

df = pd.DataFrame({'a': randn(100), 'b': randn(100), 'N': randint(10, 100, (100))})
df.head()
    N         a         b
0  93 -0.017216  0.329569
1  84  0.354537  0.314897
2  39  2.948030 -0.263055
3  57  0.751853  1.753032
4  42 -0.378684  2.685732

df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1).head()
0   -0.041781
1    0.008825
2   -4.461920
3    0.386990
4    2.797359
dtype: float64

It has int and float columns, so it may require the blocks trick...
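Step 3 of the plan (profiling) can be done with the standard-library cProfile module; IPython's %prun gives similar output. A minimal sketch, regenerating the example frame with random data; the printed table shows how the time splits between the user functions and pandas' own per-row overhead.

```python
import cProfile
import pstats

import numpy as np
import pandas as pd

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(int(N)):
        s += f(a + i * dx)
    return s * dx

df = pd.DataFrame({'a': np.random.randn(100),
                   'b': np.random.randn(100),
                   'N': np.random.randint(10, 100, 100)})

# Profile only the apply call.
profiler = cProfile.Profile()
profiler.enable()
result = df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)
profiler.disable()

# Sort by time spent inside each function itself and show the top 10.
stats = pstats.Stats(profiler).sort_stats('tottime')
stats.print_stats(10)
```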

hayd · 2013-06-16T22:07:51Z

I think this is a good example of using cython (I can put something together for this): it shows a big speed improvement, but I'm not sure it's a good example of leveraging numpy arrays...?
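For comparison, step 2 of the plan (a more efficient pure-Python rewrite) could vectorise the inner loop of integrate_f with NumPy. This is a sketch rather than the approach taken in the thread, and `integrate_f_numpy` is a hypothetical name: it computes the same left Riemann sum by evaluating f at all N sample points at once. For N between 10 and 100 the gain may be modest, since the per-row overhead of apply remains.

```python
import numpy as np

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    # Original pure-Python loop: left Riemann sum with N steps.
    s = 0
    dx = (b - a) / N
    for i in range(int(N)):
        s += f(a + i * dx)
    return s * dx

def integrate_f_numpy(a, b, N):
    # Same sum, but NumPy does the loop over i: build all N sample
    # points a, a + dx, ..., a + (N - 1) * dx and evaluate f once.
    dx = (b - a) / N
    x = a + np.arange(int(N)) * dx
    return f(x).sum() * dx
```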

jreback · 2013-06-16T22:12:53Z

Ideally, have an extended example of solving this problem using apply, then maybe one using a cython function that operates on and returns ndarrays (which are then wrapped in frames), kind of like the cython ndarray example.

hayd · 2013-06-16T22:18:53Z

So essentially do the apply yourself (all in cython)?

jreback · 2013-06-16T22:24:08Z

I think that would be a nice non-trivial example:
maybe pass in the floats and ints,
supply integrate_f and f as cython functions and return the final ndarray,
and provide a wrapping frame
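The structure described here can be sketched in plain Python before any Cython is written: a function that takes the columns as ndarrays, loops over them, and returns an ndarray, with the result wrapped back into a pandas object afterwards. This pure-Python `apply_integrate_f` is only a stand-in; the loop inside it is the part one would then move into Cython with typed arguments.

```python
import numpy as np
import pandas as pd

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(int(N)):
        s += f(a + i * dx)
    return s * dx

def apply_integrate_f(col_a, col_b, col_N):
    # Takes plain ndarrays and returns an ndarray: no pandas inside.
    n = len(col_N)
    assert len(col_a) == len(col_b) == n
    res = np.empty(n)  # float64 by default
    for i in range(n):
        res[i] = integrate_f(col_a[i], col_b[i], col_N[i])
    return res

df = pd.DataFrame({'a': np.random.randn(100),
                   'b': np.random.randn(100),
                   'N': np.random.randint(10, 100, 100)})

# Pass the underlying arrays in, then wrap the result with the frame's
# index so it lines up with df again.
result = pd.Series(apply_integrate_f(df['a'].values, df['b'].values, df['N'].values),
                   index=df.index)
```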

hayd · 2013-06-16T22:54:39Z

Created working cython f and integrate_f (plain and typed); they work great.

Any ideas why this might compile but not import (is this the kind of thing you meant?):

import numpy as np
cimport numpy as np

cpdef apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
    assert (col_a.dtype == np.float and col_b.dtype == np.float and col_N.dtype == np.int)
    assert (len(col_a) == len(col_b) == len(col_N))
    cdef np.ndarray res = np.zeros(len(col_a), dtype=np.float)
    # cdef np.ndarray dx = col_a * col_b / col_N
    for i in range(len(col_a)):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res

It comes up with a lovely message partway through the stack trace :)

# XXX -- this is a Vile HACK!

...lots like this
Users/234BroadWalk/.pyxbld/temp.macosx-10.6-intel-2.7/pyrex/integrate.c:4069: error: ‘PyUFuncObject’ undeclared (first use in this function)
lipo: can't figure out the architecture type of: /var/folders/hc/qwq7bjd535xgr4_vl7kjkxsw0000gp/T//cc3pLpao.out
---------------------------------------------------------------------------
... can post all if helpful?
ImportError: Building module integrate failed: ["CompileError: command 'gcc-4.2' failed with exit status 1\n"]

jreback · 2013-06-16T23:03:08Z

never seen that one
can u show integrate_f_typed?

hayd · 2013-06-16T23:22:02Z

cdef double f_typed(double x) except? -2:
    return x**2-x

cpdef integrate_f_typed(double a, double b, int N):
    cdef int i
    cdef double s, dx
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f_typed(a+i*dx)
    return s * dx

These are direct copies from the cython example. :s

cpcloud · 2013-06-16T23:23:28Z

@hayd is that error the first one out of the compiler?

hayd · 2013-06-16T23:28:39Z

So I compile it just like this

[~/pandas]$ cython integrate.pyx
[~/pandas]$

and import in ipython like this:

In [3]: import pyximport; pyximport.install()
Out[3]: (None, <pyximport.pyximport.PyxImporter at 0x1042a5dd0>)

In [4]: import integrate
# ImportError: Building module integrate failed: ["CompileError: command 'gcc-4.2' failed with exit status 1\n"]

This method works for the other functions (when apply_integrate_f is not in the pyx file)...

cpcloud · 2013-06-16T23:34:20Z

why not just paste into ipython?

hayd · 2013-06-16T23:54:33Z

@cpcloud ? I think I'm missing something fundamental here.

I just tried using %%cython_inline, but I get a CompilerCrash on the cimport numpy line, from AssertionError: Not yet supporting any cimports/includes from string code snippets. :S

cpcloud · 2013-06-17T00:05:41Z

oh i used %%cython and then copy-pasted each function separately

hayd · 2013-06-17T00:27:08Z

Somewhat confusingly this worked first time...!! :) So, I guess nothing was wrong with the functions!

In [13]: %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)  # python
10 loops, best of 3: 37.6 ms per loop

In [14]: %timeit df.apply(lambda x: integrate_f_plain(x['a'], x['b'], x['N']), axis=1)  # cythonised
100 loops, best of 3: 11.8 ms per loop

In [15]: %timeit df.apply(lambda x: integrate_f_typed(x['a'], x['b'], x['N']), axis=1) # cythonised with type
100 loops, best of 3: 3.57 ms per loop

In [16]: %timeit apply_integrate_f(df['a'], df['b'], df['N']) # cythonised apply
1000 loops, best of 3: 1.2 ms per loop

I think this makes quite an OK example: it doesn't make use of a 2D ndarray (only 1D), but nonetheless I think it's not too bad. It definitely shows the benefits!

Oh... maybe I can grab the float blocks using .blocks; I'll see if that makes it even faster.
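The %timeit comparison above can also be reproduced outside IPython with the standard-library timeit module. A sketch; `best_of` and `run_apply` are hypothetical helpers, with `best_of` mimicking %timeit's "best of 3" reporting. Any of the variants, including the compiled ones, can be passed to it as a zero-argument callable.

```python
import timeit

import numpy as np
import pandas as pd

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(int(N)):
        s += f(a + i * dx)
    return s * dx

df = pd.DataFrame({'a': np.random.randn(100),
                   'b': np.random.randn(100),
                   'N': np.random.randint(10, 100, 100)})

def run_apply():
    return df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

def best_of(func, repeat=3, number=10):
    # Run `number` calls `repeat` times, keep the fastest batch,
    # and report seconds per single call.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

print("apply: %.2f ms per loop" % (best_of(run_apply) * 1e3))
```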

cpcloud · 2013-06-17T00:32:38Z

this

cpdef apply_integrate_f(np.ndarray col_a, np.ndarray col_b, np.ndarray col_N):
    assert (col_a.dtype == np.float and col_b.dtype == np.float and col_N.dtype == np.int)
    assert (len(col_a) == len(col_b) == len(col_N))
    cdef np.ndarray res = np.zeros(len(col_a), dtype=np.float)
    # cdef np.ndarray dx = col_a * col_b / col_N
    for i in range(len(col_a)):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res

could be changed to

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[double] apply_integrate_f(np.ndarray[double] col_a, np.ndarray[double] col_b, np.ndarray[Py_ssize_t] col_N):
    cdef Py_ssize_t i, n = len(col_N)
    assert len(col_a) == len(col_b) == n  # needed because boundscheck(False) turns off cython's index checks
    cdef np.ndarray[double] res = np.empty(n)  # np.empty gives float64 by default
    for i in range(n):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res

for some more speedup

hayd · 2013-06-17T00:42:30Z

Wowza, that's looking swish! For me, though, I get: Cdef functions/classes cannot take arbitrary decorators. :s

cpcloud · 2013-06-17T00:43:42Z

u need to do cimport cython, sorry. i will correct.

cpcloud · 2013-06-17T00:44:38Z

fyi make sure u give the loop variable a type if it makes sense, since i think cython will use an object if u don't

hayd · 2013-06-17T00:47:10Z

ah of course! It's getting quite late, my brain has stopped working.

What's the Py_ssize_t i stuff about?

cpcloud · 2013-06-17T00:49:59Z

python indexing type (the signed integer type CPython uses for sizes and indices)
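From Python, the range of `Py_ssize_t` can be checked through `sys.maxsize`, which is documented as the largest value a Py_ssize_t can take (and so the upper bound on `len()`). On mainstream platforms NumPy's `np.intp` has the same width, which is the kind of integer array an `np.ndarray[Py_ssize_t]` argument needs; on builds where NumPy's default integer was narrower (historically 64-bit Windows) the N column may need converting first. A small sketch:

```python
import sys

import numpy as np

# sys.maxsize is the largest Py_ssize_t value on this build.
print(sys.maxsize)

# Py_ssize_t is signed, so its maximum is 2**(bits - 1) - 1; derive the
# width from that rather than hard-coding 32 or 64.
bits = sys.maxsize.bit_length() + 1
print("Py_ssize_t is %d bits on this build" % bits)

# NumPy's pointer-sized integer dtype spans the same range here.
print(np.iinfo(np.intp).max == sys.maxsize)
```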

hayd · 2013-06-17T00:52:13Z

Wow!

In [35]: %timeit apply_integrate_f_wrap(df['a'], df['b'], df['N'])
1000 loops, best of 3: 354 us per loop

cpcloud · 2013-06-17T00:52:54Z

u might be able to squeeze even more out if u use cython memoryviews

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[double] apply_integrate_f(double[:] col_a, double[:] col_b, Py_ssize_t[:] col_N):
    cdef Py_ssize_t i, n = len(col_N)
    assert len(col_a) == len(col_b) == n  # needed because boundscheck(False) turns off cython's index checks
    cdef np.ndarray[double] res = np.empty(n)  # np.empty gives float64 by default
    for i in range(n):
        res[i] = integrate_f_typed(col_a[i], col_b[i], col_N[i])
    return res
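Typed memoryviews such as `double[:]` bind to any object that exposes a one-dimensional buffer of C doubles through the buffer protocol, and Python's builtin `memoryview` shows what NumPy hands over. One point worth flagging as an assumption about newer versions: since pandas 0.13 a Series is no longer an ndarray subclass, so it is safer to pass `df['a'].values` than the Series itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(100),
                   'b': np.random.randn(100),
                   'N': np.random.randint(10, 100, 100)})

# Make sure we have a contiguous float64 array, then look at the buffer
# it exports: this is what a double[:] argument would be bound to.
col_a = np.ascontiguousarray(df['a'].values, dtype=np.float64)
view = memoryview(col_a)

# format 'd' (C double), itemsize 8, ndim 1, shape (100,)
print(view.format, view.itemsize, view.ndim, view.shape)
```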

hayd · 2013-06-17T01:10:21Z

Was wondering if we could have had an example using the float block, but I can only make it twice as slow as yours...

apply_integrate_f_wrap_blocks(df.blocks['float64'].values, df['N'])

cimport cython
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef np.ndarray[double] apply_integrate_f_wrap_blocks(np.ndarray[double, ndim=2] cols_ab, np.ndarray[Py_ssize_t] col_N):
    cdef Py_ssize_t i, n = len(col_N)
    # assert shape
    assert len(cols_ab) == n  # only because of above decorators
    cdef np.ndarray[double] res = np.empty(n)  # does float by default
    for i in range(n):
        res[i] = integrate_f_typed(cols_ab[i, 0], cols_ab[i, 1], col_N[i])  # [i, 0] indexes the 2D buffer directly; [i][0] goes through an intermediate row object
    return res

Barking up the wrong tree here?

(I think this is already looking like it's going to be a nice thing to write up!)

cpcloud · 2013-06-17T01:12:38Z

i suppose. but u could also just do

apply_integrate_f(*df.blocks['float64'].values.T, col_N=df['N'])

a bit terse, but it gets the job done.
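`DataFrame.blocks` was later deprecated and is not available in current pandas. Assuming a recent pandas (`to_numpy` needs 0.24 or later), the same trick can be sketched with `select_dtypes`. The `apply_integrate_f` below is a pure-Python stand-in with the same signature as the Cython function, so the sketch runs without compiling anything.

```python
import numpy as np
import pandas as pd

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(int(N)):
        s += f(a + i * dx)
    return s * dx

def apply_integrate_f(col_a, col_b, col_N):
    # Stand-in for the compiled version: same arguments, same result.
    res = np.empty(len(col_N))
    for i in range(len(col_N)):
        res[i] = integrate_f(col_a[i], col_b[i], col_N[i])
    return res

df = pd.DataFrame({'a': np.random.randn(100),
                   'b': np.random.randn(100),
                   'N': np.random.randint(10, 100, 100)})

# All float64 columns as one 2D array (rows x columns). Transposing and
# unpacking passes each column as its own 1D argument; this relies on the
# float columns being exactly a, b, in that order.
floats = df.select_dtypes(include=['float64'])
result = apply_integrate_f(*floats.to_numpy().T, col_N=df['N'].to_numpy())
```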

hayd · 2013-06-17T10:53:43Z

Is there a neat way to get the stdout from things like prun and timeit into the docs?

.. ipython:: python
   :verbatim

   %timeit df.apply(lambda x: integrate_f(x['a'], x['b'], x['N']), axis=1)

seems to only capture printed things... (i.e. nothing in this case)

hayd · 2013-06-17T19:06:03Z

very WIP, but here's an initial draft (see PR)

hayd · 2013-06-17T19:26:46Z

@cpcloud Thanks for catching that, now I can see all the other errors... :s

cpcloud · 2013-06-17T19:28:59Z

no prob. doc builds are finicky...

hayd · 2013-06-19T23:40:06Z

@cpcloud it somehow came together magically in the end; it is insanely sensitive to spacing, etc. Thanks for your help and for being so patient!

cpcloud · 2013-06-19T23:40:39Z

glad it worked out!

jtratner · 2013-06-20T00:48:17Z

@hayd what editor do you use? I use vim and it makes it eas(ier) to see restructuredtext errors (not perfect though, it can be frustratingly sensitive).

jtratner · 2013-06-20T00:54:58Z

@hayd @cpcloud Does pandas do anything special if you pass a cythonized function to things like groupby or apply? It'd be cool to be able to stay on the other side of the C ABI if you can pass a cythonized function.

jtratner · 2013-06-20T00:55:18Z

(and clearly I know very little about cython right now...)

cpcloud · 2013-06-20T00:55:57Z

toctree isn't that complicated :) it's basically an index that allows you to refer to other documents without having to use paths and whatnot, and it also allows customization of the table of contents output. @hayd u should put cython.rst in the toctree if you want it to show up in the navbar in the docs

cpcloud · 2013-06-20T01:00:37Z

@jtratner i don't think so. i'm not sure if there's any extra metadata in a cython function that would allow u to tell the difference between it and a python function. @jreback probably knows more. you can pass a cythonized function anyway, but if in fact there's a cython function being called at some lower level that would call your cythonized function it will be typed as object and might not give much of a performance gain. assuming your cythonized function doesn't have all sorts of loops, e.g., a polynomial, then you'll probably gain some constant factor which of course may still be useful.

cpcloud · 2013-06-20T01:03:45Z

@jtratner cython is roughly python with types. it's useful for 2 things: making array looping faster and interfacing with other C code in a sane way. it does all the refcounting for u and also has some limited generic typing abilities, among other things. the loops are actually rewritten almost exactly as u would hand-code the C loops, which really gives a lot of speedup. it also has the ability to execute some code in parallel and bypass the GIL, which so far i've only found to be useful in one situation (unrelated to pandas). @hayd's tutorial is a nice starting point, and then if u want more u can read the cython docs :)

hayd · 2013-06-20T11:01:18Z

@cpcloud The toctree issues I think I've had are with to_pickle and read_pickle: I'm sure I replaced all uses of save/load with to_pickle/read_pickle (and removed the deprecated ways of calling them). Guess I missed something...

I've added cython at the end of the toctree (I think it warrants its own section?).

@jtratner Once we worked out the correct syntax (and what it was caring about) it came out ok (I went through a whack-a-mole of indentation choices before that though). :(

hayd mentioned this issue Jun 19, 2013

ENH add cython tutorial #3965

Merged

hayd closed this as completed Jun 21, 2013
