Skip to content

asv bench cleanup - groupby #10998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Clean-up of the groupby benchmarks.

This is an example of how they can be cleaned up (grouping benchmarks with the same setup in common classes, removing a lot of the setup functions in this way)

@jorisvandenbossche jorisvandenbossche added Performance Memory or execution speed performance Clean labels Sep 5, 2015
@jorisvandenbossche jorisvandenbossche added this to the 0.17.0 milestone Sep 5, 2015
@jorisvandenbossche
Copy link
Member Author

@jreback @qwhelan suppose this is OK to merge?
This was one of the larger files, but as you can see, quite some line reduction is possible with some manual clean-up (also makes it clearer to add benchmarks)

@jreback
Copy link
Contributor

jreback commented Sep 9, 2015

lgtm. I did sligthly different when I added some benchmarks here
as I subclassed. But both approaches make sense (and shorter boilerplate code)

@jreback
Copy link
Contributor

jreback commented Sep 9, 2015

a couple fail: asv dev --bench groupby

[ 15.38%] ··· Running groupby.groupby_first_last.time_groupby_nth_float32_any                                                                                                                                                     fed
[ 15.38%] ····· Traceback (most recent call last):
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 774, in <module>
                    commands[mode](args)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 751, in main_run
                    result = benchmark.do_run()
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 443, in do_run
                    return self.run(*self._current_params)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 510, in run
                    timing = timer.timeit(number)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 201, in timeit
                    timing = self.inner(it, self.timer)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 100, in inner
                    _func()
                  File "/Users/jreback/pandas/asv_bench/benchmarks/groupby.py", line 64, in time_groupby_nth_float32_any
                    self.data2.groupby(self.labels).nth(0, dropna='all')
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 903, in nth
                    level=self.level, sort=self.sort)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2158, in _get_grouper
                    level=level, sort=sort, in_axis=in_axis)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 1879, in __init__
                    self.grouper = _convert_grouper(index, grouper)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2185, in _convert_grouper
                    raise AssertionError('Grouper and axis must be same length')
                AssertionError: Grouper and axis must be same length


[ 16.92%] ··· Running groupby.groupby_first_last.time_groupby_nth_float64_any                                                                                                                                                     fed
[ 16.92%] ····· Traceback (most recent call last):
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 774, in <module>
                    commands[mode](args)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 751, in main_run
                    result = benchmark.do_run()
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 443, in do_run
                    return self.run(*self._current_params)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/asv-0.2.dev795+c27e60f7-py2.7.egg/asv/benchmark.py", line 510, in run
                    timing = timer.timeit(number)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 201, in timeit
                    timing = self.inner(it, self.timer)
                  File "/Users/jreback/miniconda/lib/python2.7/timeit.py", line 100, in inner
                    _func()
                  File "/Users/jreback/pandas/asv_bench/benchmarks/groupby.py", line 70, in time_groupby_nth_float64_any
                    self.data.groupby(self.labels).nth(0, dropna='all')
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 903, in nth
                    level=self.level, sort=self.sort)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2158, in _get_grouper
                    level=level, sort=sort, in_axis=in_axis)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 1879, in __init__
                    self.grouper = _convert_grouper(index, grouper)
                  File "/Users/jreback/miniconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 2185, in _convert_grouper
                    raise AssertionError('Grouper and axis must be same length')
                AssertionError: Grouper and axis must be same length

@jorisvandenbossche
Copy link
Member Author

yes, I know, but I didn't yet address that here (only restructuring the file, not fixing the tests themselves)

The subclassing is also fine, but there you had the same test function but different creation setup, so in that case it's more awkward to put it in one test class

@jorisvandenbossche
Copy link
Member Author

although, probably even the easier option is to use params in this case, as only the value for N changes, http://asv.readthedocs.org/en/latest/writing_benchmarks.html#parameterized-benchmarks

@jreback
Copy link
Contributor

jreback commented Sep 9, 2015

oh, all these cool features. Ok, once the suite is cleaned up prob need to put a some docs in place when/how to do various things.

@qwhelan
Copy link
Contributor

qwhelan commented Sep 10, 2015

It'll be a few hours until I can take a look (onsite + bunch of meetings
this evening), but I'm all for refactoring provided we prevent future
changes to the vbench groupby benchmarks (delete/comment out/scary warning).
On Sep 9, 2015 4:20 PM, "Jeff Reback" [email protected] wrote:

oh, all these cool features. Ok, once the suite is cleaned up prob need to
put a some docs in place when/how to do various things.


Reply to this email directly or view it on GitHub
#10998 (comment).

@jorisvandenbossche
Copy link
Member Author

Yes, I think we should certainly ensure we don't change the vbenches anymore. Otherwise it has no point in starting to adapt the asv benches.

Will do a PR with a notice. I think we can leave them in for some time for those we didn't switch yet

@@ -1535,7 +674,9 @@ def setup(self):
self.secid_max = int('F0000000', 16)
self.step = ((self.secid_max - self.secid_min) // (self.n_securities - 1))
self.security_ids = map((lambda x: hex(x)[2:10].upper()), range(self.secid_min, (self.secid_max + 1), self.step))
self.data_index = MultiIndex(levels=[self.dates.values, self.security_ids], labels=[[i for i in range(self.n_dates) for _ in xrange(self.n_securities)], (range(self.n_securities) * self.n_dates)], names=['date', 'security_id'])
self.data_index = MultiIndex(levels=[self.dates.values, self.security_ids],
labels=[[i for i in range(self.n_dates) for _ in xrange(self.n_securities)], (range(self.n_securities) * self.n_dates)],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Looks like I missed this in the first pass, but this xrange call will break things on Py3. Compatability is being handled by overwriting range in pandas_vb_common.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed!

@qwhelan
Copy link
Contributor

qwhelan commented Sep 10, 2015

@jorisvandenbossche Other than the xrange call that's my fault, looks good to me.

@jreback
Copy link
Contributor

jreback commented Sep 12, 2015

FYI, I have run benchmarks since 0.14 (not exactly on the tag as not a real easy way to do this). and pushed to gh-pages

http://pydata.github.io/pandas/

@qwhelan
Copy link
Contributor

qwhelan commented Sep 12, 2015

Yeah, I've got some local patches to fix the run-against-tag issues that
result from trying to linearize history by using the first parent. I'll
chat with the asv devs to see if they're dead-set on that or whether that
can be configurable.
On Sep 12, 2015 1:13 PM, "Jeff Reback" [email protected] wrote:

FYI, I have run benchmarks since 0.14 (not exactly on the tag as not a
real easy way to do this). and pushed to gh-pages

http://pydata.github.io/pandas/


Reply to this email directly or view it on GitHub
#10998 (comment).

jorisvandenbossche added a commit that referenced this pull request Sep 12, 2015
@jorisvandenbossche jorisvandenbossche merged commit f476000 into pandas-dev:master Sep 12, 2015
@jorisvandenbossche
Copy link
Member Author

@jreback there seems to be some regressions since the last release (eg http://pydata.github.io/pandas/#plotting.plot_timeseries_period.time_plot_timeseries_period) maybe start listing them in the 0.17 release issue?

@jorisvandenbossche
Copy link
Member Author

@jreback @qwhelan another question: how do you handle the fact that some tests will fail on older pandas versions? (with vbench there was the start date you could give with the benchmark)

@jreback
Copy link
Contributor

jreback commented Sep 12, 2015

@jorisvandenbossche I put that one on the 0.17.0 RLS issue. How do you 'find' regressions though (e.g. except for looking at charts directly). Did I not do something?

@qwhelan
Copy link
Contributor

qwhelan commented Sep 13, 2015

Raising NotImplementedError is equivalent to SkipTest. Probably best done
in setup().
On Sep 12, 2015 4:56 PM, "Joris Van den Bossche" [email protected]
wrote:

@jreback https://github.com/jreback @qwhelan
https://github.com/qwhelan another question: how do you handle the fact
that some tests will fail on older pandas versions? (with vbench there was
the start date you could give with the benchmark)


Reply to this email directly or view it on GitHub
#10998 (comment).

@qwhelan
Copy link
Contributor

qwhelan commented Sep 13, 2015

In the "all benchmarks" view, there's a "Regressions" button in the upper
left. It's not spitting out anything due the tag issue I mentioned earlier.
On Sep 12, 2015 4:59 PM, "Jeff Reback" [email protected] wrote:

@jorisvandenbossche https://github.com/jorisvandenbossche I put that
one on the 0.17.0 RLS issue. How do you 'find' regressions though (e.g.
except for looking at charts directly). Did I not do something?


Reply to this email directly or view it on GitHub
#10998 (comment).

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

ahh I c

ok I did it with the mark I eyeball :)

@jorisvandenbossche
Copy link
Member Author

@jreback In Christophers version it works, so you can see there how it looks: http://qwhelan.github.io/pandas_asv/#regressions?sort=3&dir=desc

But for checking a release, I think the asv continuous between the previous release and the current release is also an good option to have an overview of possible regressions

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

@qwhelan also maybe I am using it wrongly

but asv gh-pages crashed for me

I just did: asv publish

the copied the html dir

@qwhelan
Copy link
Contributor

qwhelan commented Sep 13, 2015

No, the regression page not working is definitely a bug in master. Also, I
did not use asv gh-pages (didn't realize it was there). I did the pages
repo setup manually but I can investigate the crash tomorrow.
On Sep 12, 2015 5:20 PM, "Jeff Reback" [email protected] wrote:

@qwhelan https://github.com/qwhelan also maybe I am using it wrongly

but asv gh-pages crashed for me

I just did: asv publish

the copied the html dir


Reply to this email directly or view it on GitHub
#10998 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants