PERF: GH10213 kth_smallest GIL release #10927

kawochen · 2015-08-29T06:37:53Z

One tiny part of #10213

import timeit

setup_seq = '''
import pandas
import numpy
numpy.random.seed(1234)
x = numpy.random.randn(10000000)
a = x.copy()
b = x.copy()

def f(s):
    pandas.algos.kth_smallest(s, 500000)

def seq():
    f(a)
    f(b)
'''

setup_parallel = '''
import pandas
import numpy
import threading
numpy.random.seed(1234)
x = numpy.random.randn(10000000)
a = x.copy()
b = x.copy()

def f(s):
    pandas.algos.kth_smallest(s, 500000)

def parallel():
    thread1 = threading.Thread(target=f, args=(a,))
    thread2 = threading.Thread(target=f, args=(b,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
'''

print(min(timeit.repeat(stmt='seq()', setup=setup_seq, repeat=100, number=1)))
print(min(timeit.repeat(stmt='parallel()', setup=setup_parallel, repeat=100, number=1)))

On master

0.15268295999976544
0.15424678300041705

On branch

0.1544521670002723
0.08813380599985976

Testing of nsmallest/nlargest/median (I don't think median calls kth_smallest though)

import pandas, numpy
from pandas.util.testing import test_parallel
n = 1000000
k = 50000
numpy.random.seed(1234)
s = pandas.Series(numpy.random.randn(n))

def f():
    s.nlargest(k)

def seq():
    f()
    f()

@test_parallel(num_threads=2)
def g():
    f()

Master nsmallest

In [2]: %timeit f()
10 loops, best of 3: 42.4 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 79.9 ms per loop

In [4]: %timeit seq()
10 loops, best of 3: 84.9 ms per loop

Branch nsmallest

In [10]: %timeit f()
10 loops, best of 3: 42.8 ms per loop

In [11]: %timeit g()
10 loops, best of 3: 68.6 ms per loop

In [12]: %timeit seq()
10 loops, best of 3: 91.2 ms per loop

Master nlargest

In [2]: %timeit f()
10 loops, best of 3: 47.5 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 86 ms per loop

In [4]: %timeit seq()

Branch nlargest

In [10]: %timeit f()
10 loops, best of 3: 48.1 ms per loop

In [11]: %timeit g()
10 loops, best of 3: 71 ms per loop

In [12]: %timeit seq()
10 loops, best of 3: 95.7 ms per loop

Master median

In [2]: %timeit f()
100 loops, best of 3: 15.4 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 20.7 ms per loop

In [4]: %timeit seq()
10 loops, best of 3: 30.7 ms per loop

Branch median

In [12]: %timeit f()
100 loops, best of 3: 15 ms per loop

In [13]: %timeit g()
10 loops, best of 3: 21.5 ms per loop

In [14]: %timeit seq()
10 loops, best of 3: 30 ms per loop

Results are pretty in line with expectations -- nsmallest does quite a bit more than kth_smallest, such as copying and indexing.

jreback · 2015-08-29T13:44:18Z

can you add a asv benchmark (note that #10928 ) changes this so will merge this after.

This should affect median, nlargest, nsmallest. So those would be nice to test.

Further use the test_parallel decorator to show the perf changes.

You can also add this issue to the GIL release issues in the whatsnew.

jreback · 2015-08-29T13:48:10Z

pandas/algos.pyx

+
+            if j < k: l = i
+            if k < i: m = j
+        return a[k]


 cdef inline kth_smallest_c(float64_t* a, Py_ssize_t k, Py_ssize_t n):


I don't think kth_smallest_c is called anywhere in the codebase, see if you can take it out

this group_median
calls
group_median here,
which calls _median_linear, which in turn calls kth_smallest_c

ok, good start on this.....

kawochen · 2015-08-29T14:59:01Z

OK, I will update after 10928. I had looked at test_parallel, but wasn't sure how to make it work for this function.

The function alters the state of the array, so I needed one set-up per iteration (hence repeat=n and number=1), but obviously didn't want that to be included in the timing. In-place sorting and functions that drop stuff should have the same problem, so maybe there's machinery to deal with this that I don't know about. Plus test_parallel has a function level import, which I thought might skew the timing for this one.

I will check what asv has.

kawochen · 2015-08-30T16:16:39Z

pandas/util/testing.py

@@ -2078,6 +2079,48 @@ def inner(*args, **kwargs):
    return wrapper


+def test_parallel_diff_args(num_threads=2, args_list=None, kwargs_list=None):


Something like this is often needed for testing functions that alter the state of the input, such as some sorting algos and operations like drop_duplicates. This could be combined with test_parallel to reduce code duplication, but having it in a separate function saves some if-else that might affect timing.

As a side note I think import threading should be done somewhere else, because a second function call would otherwise incur a lower cost, although I haven't really observed an appreciable difference yet.

I would just add onto test_parallel

jreback · 2015-09-01T12:16:09Z

@kawochen can you update

kawochen · 2015-09-01T13:11:40Z

Will do. I haven't been able to get the asv benchmarks to match the improvement I saw, even when I add class attributes like number and repeat. asv does use a different default timer, but it looked more like number and repeat are ignored. I will update tonight.

kawochen · 2015-09-02T11:42:09Z

Updated. asv has a million gotchas!

jreback · 2015-09-02T11:46:54Z

@kawochen ahh ok, when you have time can you expand the doc section on asv would be appreciated.thxs.

PERF: GH10213 kth_smallest GIL release

jreback · 2015-09-02T11:49:33Z

thanks!

pv · 2015-09-03T17:50:10Z

asv repeat does not actually re-run setup each time, which probably should be considered a bug. (If you want to elaborate on the other million gotchas, that could also be useful)

kawochen · 2015-09-03T18:56:56Z

@pv forgive my hyperbole! It's a great package 😄 I looked at the code before and saw that repeat is passed into timeit, but you are saying it actually doesn't quite work? Is there an open issue for this? I'll be more constructive and translate some of what I think are gotchas into feature requests.

pv · 2015-09-03T19:14:47Z

Repeat is indeed passed to timeit, but the setup function is not run by timeit, but is done separately.
cf airspeed-velocity/asv#316

jreback added the Performance Memory or execution speed performance label Aug 29, 2015

jreback added this to the 0.17.0 milestone Aug 29, 2015

jreback reviewed Aug 29, 2015
View reviewed changes

kawochen force-pushed the PERF-10213-kth branch from 186df16 to e970381 Compare August 30, 2015 16:00

kawochen reviewed Aug 30, 2015
View reviewed changes

kawochen force-pushed the PERF-10213-kth branch from e970381 to 56358f6 Compare September 2, 2015 07:52

PERF: GH10213 kth_smallest GIL release

6db459a

kawochen force-pushed the PERF-10213-kth branch from 56358f6 to 6db459a Compare September 2, 2015 07:56

jreback added a commit that referenced this pull request Sep 2, 2015

Merge pull request #10927 from kawochen/PERF-10213-kth

3db4c4b

PERF: GH10213 kth_smallest GIL release

jreback merged commit 3db4c4b into pandas-dev:master Sep 2, 2015

pv mentioned this pull request Sep 3, 2015

Run benchmark setup on every repeat, similarly as timeit does airspeed-velocity/asv#316

Merged

jreback mentioned this pull request Aug 10, 2016

Unified common dtype discovery #13947

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: GH10213 kth_smallest GIL release #10927

PERF: GH10213 kth_smallest GIL release #10927

kawochen commented Aug 29, 2015

jreback commented Aug 29, 2015

jreback Aug 29, 2015

kawochen Aug 29, 2015

jreback Aug 31, 2015

kawochen commented Aug 29, 2015

kawochen Aug 30, 2015

jreback Aug 30, 2015

jreback commented Sep 1, 2015

kawochen commented Sep 1, 2015

kawochen commented Sep 2, 2015

jreback commented Sep 2, 2015

jreback commented Sep 2, 2015

pv commented Sep 3, 2015

kawochen commented Sep 3, 2015

pv commented Sep 3, 2015

		@@ -2078,6 +2079,48 @@ def inner(args, *kwargs):
		return wrapper


		def test_parallel_diff_args(num_threads=2, args_list=None, kwargs_list=None):

PERF: GH10213 kth_smallest GIL release #10927

PERF: GH10213 kth_smallest GIL release #10927

Conversation

kawochen commented Aug 29, 2015

jreback commented Aug 29, 2015

jreback Aug 29, 2015

Choose a reason for hiding this comment

kawochen Aug 29, 2015

Choose a reason for hiding this comment

jreback Aug 31, 2015

Choose a reason for hiding this comment

kawochen commented Aug 29, 2015

kawochen Aug 30, 2015

Choose a reason for hiding this comment

jreback Aug 30, 2015

Choose a reason for hiding this comment

jreback commented Sep 1, 2015

kawochen commented Sep 1, 2015

kawochen commented Sep 2, 2015

jreback commented Sep 2, 2015

jreback commented Sep 2, 2015

pv commented Sep 3, 2015

kawochen commented Sep 3, 2015

pv commented Sep 3, 2015