Skip to content

PERF: GH10213 kth_smallest GIL release #10927

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 2, 2015

Conversation

kawochen
Copy link
Contributor

One tiny part of #10213

import timeit

setup_seq = '''
import pandas
import numpy
numpy.random.seed(1234)
x = numpy.random.randn(10000000)
a = x.copy()
b = x.copy()

def f(s):
    pandas.algos.kth_smallest(s, 500000)

def seq():
    f(a)
    f(b)
'''

setup_parallel = '''
import pandas
import numpy
import threading
numpy.random.seed(1234)
x = numpy.random.randn(10000000)
a = x.copy()
b = x.copy()

def f(s):
    pandas.algos.kth_smallest(s, 500000)

def parallel():
    thread1 = threading.Thread(target=f, args=(a,))
    thread2 = threading.Thread(target=f, args=(b,))
    thread1.start()
    thread2.start()
    thread1.join()
    thread2.join()
'''

print(min(timeit.repeat(stmt='seq()', setup=setup_seq, repeat=100, number=1)))
print(min(timeit.repeat(stmt='parallel()', setup=setup_parallel, repeat=100, number=1)))

On master

0.15268295999976544
0.15424678300041705

On branch

0.1544521670002723
0.08813380599985976

Testing of nsmallest/nlargest/median (I don't think median calls kth_smallest though)

import pandas, numpy
from pandas.util.testing import test_parallel
n = 1000000
k = 50000
numpy.random.seed(1234)
s = pandas.Series(numpy.random.randn(n))

def f():
    s.nlargest(k)

def seq():
    f()
    f()

@test_parallel(num_threads=2)
def g():
    f()

Master nsmallest

In [2]: %timeit f()
10 loops, best of 3: 42.4 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 79.9 ms per loop

In [4]: %timeit seq()
10 loops, best of 3: 84.9 ms per loop

Branch nsmallest

In [10]: %timeit f()
10 loops, best of 3: 42.8 ms per loop

In [11]: %timeit g()
10 loops, best of 3: 68.6 ms per loop

In [12]: %timeit seq()
10 loops, best of 3: 91.2 ms per loop

Master nlargest

In [2]: %timeit f()
10 loops, best of 3: 47.5 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 86 ms per loop

In [4]: %timeit seq()

Branch nlargest

In [10]: %timeit f()
10 loops, best of 3: 48.1 ms per loop

In [11]: %timeit g()
10 loops, best of 3: 71 ms per loop

In [12]: %timeit seq()
10 loops, best of 3: 95.7 ms per loop

Master median

In [2]: %timeit f()
100 loops, best of 3: 15.4 ms per loop

In [3]: %timeit g()
10 loops, best of 3: 20.7 ms per loop

In [4]: %timeit seq()
10 loops, best of 3: 30.7 ms per loop

Branch median

In [12]: %timeit f()
100 loops, best of 3: 15 ms per loop

In [13]: %timeit g()
10 loops, best of 3: 21.5 ms per loop

In [14]: %timeit seq()
10 loops, best of 3: 30 ms per loop

Results are pretty in line with expectations -- nsmallest does quite a bit more than kth_smallest, such as copying and indexing.

@jreback jreback added the Performance Memory or execution speed performance label Aug 29, 2015
@jreback jreback added this to the 0.17.0 milestone Aug 29, 2015
@jreback
Copy link
Contributor

jreback commented Aug 29, 2015

can you add a asv benchmark (note that #10928 ) changes this so will merge this after.

This should affect median, nlargest, nsmallest. So those would be nice to test.

Further use the test_parallel decorator to show the perf changes.

You can also add this issue to the GIL release issues in the whatsnew.


if j < k: l = i
if k < i: m = j
return a[k]


cdef inline kth_smallest_c(float64_t* a, Py_ssize_t k, Py_ssize_t n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think kth_smallest_c is called anywhere in the codebase, see if you can take it out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this group_median
calls
group_median here,
which calls _median_linear, which in turn calls kth_smallest_c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, good start on this.....

@kawochen
Copy link
Contributor Author

OK, I will update after 10928. I had looked at test_parallel, but wasn't sure how to make it work for this function.

The function alters the state of the array, so I needed one set-up per iteration (hence repeat=n and number=1), but obviously didn't want that to be included in the timing. In-place sorting and functions that drop stuff should have the same problem, so maybe there's machinery to deal with this that I don't know about. Plus test_parallel has a function level import, which I thought might skew the timing for this one.

I will check what asv has.

@@ -2078,6 +2079,48 @@ def inner(*args, **kwargs):
return wrapper


def test_parallel_diff_args(num_threads=2, args_list=None, kwargs_list=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this is often needed for testing functions that alter the state of the input, such as some sorting algos and operations like drop_duplicates. This could be combined with test_parallel to reduce code duplication, but having it in a separate function saves some if-else that might affect timing.

As a side note I think import threading should be done somewhere else, because a second function call would otherwise incur a lower cost, although I haven't really observed an appreciable difference yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just add onto test_parallel

@jreback
Copy link
Contributor

jreback commented Sep 1, 2015

@kawochen can you update

@kawochen
Copy link
Contributor Author

kawochen commented Sep 1, 2015

Will do. I haven't been able to get the asv benchmarks to match the improvement I saw, even when I add class attributes like number and repeat. asv does use a different default timer, but it looked more like number and repeat are ignored. I will update tonight.

@kawochen
Copy link
Contributor Author

kawochen commented Sep 2, 2015

Updated. asv has a million gotchas!

@jreback
Copy link
Contributor

jreback commented Sep 2, 2015

@kawochen ahh ok, when you have time can you expand the doc section on asv would be appreciated.thxs.

jreback added a commit that referenced this pull request Sep 2, 2015
PERF: GH10213 kth_smallest GIL release
@jreback jreback merged commit 3db4c4b into pandas-dev:master Sep 2, 2015
@jreback
Copy link
Contributor

jreback commented Sep 2, 2015

thanks!

@pv
Copy link
Contributor

pv commented Sep 3, 2015

asv repeat does not actually re-run setup each time, which probably should be considered a bug. (If you want to elaborate on the other million gotchas, that could also be useful)

@kawochen
Copy link
Contributor Author

kawochen commented Sep 3, 2015

@pv forgive my hyperbole! It's a great package 😄 I looked at the code before and saw that repeat is passed into timeit, but you are saying it actually doesn't quite work? Is there an open issue for this? I'll be more constructive and translate some of what I think are gotchas into feature requests.

@pv
Copy link
Contributor

pv commented Sep 3, 2015

Repeat is indeed passed to timeit, but the setup function is not run by timeit, but is done separately.
cf airspeed-velocity/asv#316

@jreback jreback mentioned this pull request Aug 10, 2016
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants