asv bench cleanup - groupby #10998

jorisvandenbossche · 2015-09-05T11:37:01Z

Clean-up of the groupby benchmarks.

This is an example of how they can be cleaned up (grouping benchmarks with the same setup in common classes, removing a lot of the setup functions in this way)
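As a rough illustration of the idea (not the code from this PR), a class-based asv benchmark shares one `setup()` across several `time_*` methods. The class name, data sizes, and the pure-Python grouping below are assumptions chosen so the sketch runs without pandas:

```python
import random
from collections import defaultdict


class GroupbySharedSetup(object):
    """Hypothetical benchmark class: one setup(), several time_* methods."""

    def setup(self):
        # asv calls setup() before each time_* method, so the data
        # construction is written once instead of once per benchmark.
        rng = random.Random(1234)
        self.n = 10000
        self.labels = [rng.randrange(100) for _ in range(self.n)]
        self.values = [rng.random() for _ in range(self.n)]

    def _group(self):
        # Stand-in for a pandas groupby: collect values per label.
        groups = defaultdict(list)
        for label, value in zip(self.labels, self.values):
            groups[label].append(value)
        return groups

    def time_groupby_sum(self):
        {key: sum(vals) for key, vals in self._group().items()}

    def time_groupby_first(self):
        {key: vals[0] for key, vals in self._group().items()}
```

Adding another benchmark on the same data then only needs one more `time_*` method in the class.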

jorisvandenbossche · 2015-09-09T22:18:52Z

@jreback @qwhelan suppose this is OK to merge?
This was one of the larger files, but as you can see, quite some line reduction is possible with some manual clean-up (also makes it clearer to add benchmarks)

jreback · 2015-09-09T23:06:11Z

lgtm. I did it slightly differently when I added some benchmarks here,
as I subclassed. But both approaches make sense (and shorten the boilerplate code)

jreback · 2015-09-09T23:08:12Z

a couple fail: asv dev --bench groupby

[ 15.38%] ··· Running groupby.groupby_first_last.time_groupby_nth_float32_any    failed
[ 15.38%] ····· Traceback (most recent call last):
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 774, in <module>
                    commands[mode](args)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 751, in main_run
                    result = benchmark.do_run()
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 443, in do_run
                    return self.run(*self._current_params)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 510, in run
                    timing = timer.timeit(number)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 201, in timeit
                    timing = self.inner(it, self.timer)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 100, in inner
                    _func()
                  File "/Users/jreback/pandas/asv_bench/benchmarks/groupby.py", line 64, in time_groupby_nth_float32_any
                    self.data2.groupby(self.labels).nth(0, dropna='all')
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 903, in nth
                    level=self.level, sort=self.sort)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2158, in _get_grouper
                    level=level, sort=sort, in_axis=in_axis)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 1879, in __init__
                    self.grouper = _convert_grouper(index, grouper)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2185, in _convert_grouper
                    raise AssertionError('Grouper and axis must be same length')
                AssertionError: Grouper and axis must be same length


[ 16.92%] ··· Running groupby.groupby_first_last.time_groupby_nth_float64_any    failed
[ 16.92%] ····· Traceback (most recent call last):
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 774, in <module>
                    commands[mode](args)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 751, in main_run
                    result = benchmark.do_run()
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 443, in do_run
                    return self.run(*self._current_params)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 510, in run
                    timing = timer.timeit(number)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 201, in timeit
                    timing = self.inner(it, self.timer)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 100, in inner
                    _func()
                  File "/Users/jreback/pandas/asv_bench/benchmarks/groupby.py", line 70, in time_groupby_nth_float64_any
                    self.data.groupby(self.labels).nth(0, dropna='all')
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 903, in nth
                    level=self.level, sort=self.sort)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2158, in _get_grouper
                    level=level, sort=sort, in_axis=in_axis)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 1879, in __init__
                    self.grouper = _convert_grouper(index, grouper)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2185, in _convert_grouper
                    raise AssertionError('Grouper and axis must be same length')
                AssertionError: Grouper and axis must be same length

jorisvandenbossche · 2015-09-09T23:10:44Z

yes, I know, but I didn't yet address that here (only restructuring the file, not fixing the tests themselves)

The subclassing is also fine, but there you had the same test function but different creation setup, so in that case it's more awkward to put it in one test class

jorisvandenbossche · 2015-09-09T23:11:44Z

although, probably even the easier option is to use params in this case, as only the value for N changes, http://asv.readthedocs.org/en/latest/writing_benchmarks.html#parameterized-benchmarks
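A minimal sketch of what such a parameterized benchmark could look like, assuming only `N` varies. asv reads the `params` and `param_names` class attributes and passes each parameter value to `setup()` and to every `time_*` method. The class name, the sizes, and the pure-Python stand-in for the groupby call are hypothetical:

```python
import random


class GroupbyNth(object):
    # Hypothetical sizes; asv runs every time_* method once per entry.
    params = [1000, 100000]
    param_names = ["N"]

    def setup(self, N):
        # The parameter value arrives as an argument, so one setup()
        # covers all sizes instead of one subclass per size.
        rng = random.Random(0)
        self.labels = [rng.randrange(100) for _ in range(N)]
        self.values = [rng.random() for _ in range(N)]

    def time_first_per_label(self, N):
        # Stand-in for groupby(...).nth(0): keep the first value per label.
        first = {}
        for label, value in zip(self.labels, self.values):
            first.setdefault(label, value)
```

Compared with subclassing, this keeps the data creation and the timed function in one place and lets asv report the timing per value of `N`.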

jreback · 2015-09-09T23:20:52Z

oh, all these cool features. Ok, once the suite is cleaned up we prob need to put some docs in place on when/how to do various things.

qwhelan · 2015-09-10T00:11:20Z

It'll be a few hours until I can take a look (onsite + bunch of meetings
this evening), but I'm all for refactoring provided we prevent future
changes to the vbench groupby benchmarks (delete/comment out/scary warning).

jorisvandenbossche · 2015-09-10T08:12:51Z

Yes, I think we should certainly ensure we don't change the vbenches anymore. Otherwise there is no point in starting to adapt the asv benches.

Will do a PR with a notice. I think we can leave them in for some time for those we didn't switch yet

qwhelan · 2015-09-10T23:38:23Z

asv_bench/benchmarks/groupby.py

@@ -1535,7 +674,9 @@ def setup(self):
        self.secid_max = int('F0000000', 16)
        self.step = ((self.secid_max - self.secid_min) // (self.n_securities - 1))
        self.security_ids = map((lambda x: hex(x)[2:10].upper()), range(self.secid_min, (self.secid_max + 1), self.step))
-        self.data_index = MultiIndex(levels=[self.dates.values, self.security_ids], labels=[[i for i in range(self.n_dates) for _ in xrange(self.n_securities)], (range(self.n_securities) * self.n_dates)], names=['date', 'security_id'])
+        self.data_index = MultiIndex(levels=[self.dates.values, self.security_ids],
+                                     labels=[[i for i in range(self.n_dates) for _ in xrange(self.n_securities)], (range(self.n_securities) * self.n_dates)],


@jorisvandenbossche Looks like I missed this in the first pass, but this xrange call will break things on Py3. Compatibility is being handled by overwriting range in pandas_vb_common.

jorisvandenbossche · 2015-09-12

thanks, fixed!
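For reference, a small sketch of building the two label lists from the diff above in a way that behaves the same on Python 2 and Python 3. The helper name `make_labels` is hypothetical, and this does not reproduce what `pandas_vb_common` actually does; it only avoids `xrange` and wraps `range` in `list` before repeating it:

```python
def make_labels(n_dates, n_securities):
    """Build the two label lists for a (date, security) MultiIndex.

    Hypothetical helper: uses only `range`, and wraps it in `list` before
    repeating, so it behaves the same on Python 2 and Python 3.
    """
    # Each date index repeated once per security: 0, 0, ..., 1, 1, ...
    date_labels = [i for i in range(n_dates) for _ in range(n_securities)]
    # The full run of security indices repeated once per date.
    security_labels = list(range(n_securities)) * n_dates
    return date_labels, security_labels


date_labels, security_labels = make_labels(3, 2)
```

On Python 3, `range(n) * m` raises a `TypeError` because `range` no longer returns a list, which is why the explicit `list(...)` is needed when `range` has not been overwritten.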

qwhelan · 2015-09-10T23:38:56Z

@jorisvandenbossche Other than the xrange call that's my fault, looks good to me.

jreback · 2015-09-12T20:13:05Z

FYI, I have run the benchmarks since 0.14 (not exactly on the tags, as there is no really easy way to do this) and pushed the results to gh-pages

http://pydata.github.io/pandas/

qwhelan · 2015-09-12T20:18:26Z

Yeah, I've got some local patches to fix the run-against-tag issues that
result from trying to linearize history by using the first parent. I'll
chat with the asv devs to see if they're dead-set on that or whether that
can be configurable.

jorisvandenbossche · 2015-09-12T23:54:56Z

@jreback there seem to be some regressions since the last release (e.g. http://pydata.github.io/pandas/#plotting.plot_timeseries_period.time_plot_timeseries_period). Maybe start listing them in the 0.17 release issue?

jorisvandenbossche · 2015-09-12T23:56:04Z

@jreback @qwhelan another question: how do you handle the fact that some tests will fail on older pandas versions? (with vbench there was the start date you could give with the benchmark)

jreback · 2015-09-12T23:59:40Z

@jorisvandenbossche I put that one on the 0.17.0 RLS issue. How do you 'find' regressions though (other than by looking at the charts directly)? Did I not do something?

qwhelan · 2015-09-13T00:01:45Z

Raising NotImplementedError is equivalent to SkipTest. Probably best done
in setup().
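A hedged sketch of that pattern: `setup()` checks whether the feature under test exists and raises `NotImplementedError` otherwise, which asv reports as a skipped benchmark instead of a failure. The helper `require_attr`, the class name, and the use of `math` as a stand-in for an older pandas are all assumptions made so the example runs on its own:

```python
import math


def require_attr(obj, name):
    """Raise NotImplementedError when `obj` lacks attribute `name`.

    asv treats NotImplementedError raised in setup() as "skip this benchmark".
    """
    if not hasattr(obj, name):
        raise NotImplementedError("%r is not available in this version" % name)


class TimeOptionalFeature(object):
    # Hypothetical benchmark; `math` stands in for pandas so the sketch
    # runs without extra dependencies.
    feature = "fsum"

    def setup(self):
        # Skip (rather than fail) when the installed version lacks the feature.
        require_attr(math, self.feature)
        self.values = [0.1] * 1000

    def time_feature(self):
        getattr(math, self.feature)(self.values)
```

This plays the role that the start date played in vbench: benchmarks for newer functionality are simply skipped when run against older commits.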

qwhelan · 2015-09-13T00:04:41Z

In the "all benchmarks" view, there's a "Regressions" button in the upper
left. It's not spitting out anything due to the tag issue I mentioned earlier.

jreback · 2015-09-13T00:18:54Z

ahh I c

ok I did it with the mark I eyeball :)

jorisvandenbossche · 2015-09-13T00:19:30Z

@jreback In Christopher's version it works, so you can see there how it looks: http://qwhelan.github.io/pandas_asv/#regressions?sort=3&dir=desc

But for checking a release, I think running asv continuous between the previous release and the current release is also a good option to get an overview of possible regressions

jreback · 2015-09-13T00:20:51Z

@qwhelan also maybe I am using it wrongly

but asv gh-pages crashed for me

I just did: asv publish

then copied the html dir

qwhelan · 2015-09-13T00:29:16Z

No, the regression page not working is definitely a bug in master. Also, I
did not use asv gh-pages (didn't realize it was there). I did the pages
repo setup manually but I can investigate the crash tomorrow.

jorisvandenbossche added the Performance and Clean labels Sep 5, 2015

jorisvandenbossche force-pushed the asv-cleanup branch from c651f87 to fad6484 Compare September 5, 2015 15:40

jorisvandenbossche added this to the 0.17.0 milestone Sep 5, 2015

qwhelan reviewed Sep 10, 2015

asv bench cleanup - groupby

788217f

jorisvandenbossche force-pushed the asv-cleanup branch from fad6484 to 788217f Compare September 12, 2015 23:46

jorisvandenbossche added a commit that referenced this pull request Sep 12, 2015

Merge pull request #10998 from jorisvandenbossche/asv-cleanup

f476000

asv bench cleanup - groupby

jorisvandenbossche merged commit f476000 into pandas-dev:master Sep 12, 2015

jorisvandenbossche mentioned this pull request Aug 27, 2016

CLN/PERF: clean-up of the benchmarks #14099

Merged
