PERF: Some asv tests take a long time to setup #16803
One of the things I have already thought about before is that we can probably do less in the setup. I mean, is it needed to create the DataFrames in setup (which gets repeated a lot), or would it be sufficient to just create them once in the file (as long as the benchmarks don't modify them)? E.g. in https://github.com/pandas-dev/pandas/blob/master/asv_bench/benchmarks/groupby.py#L399 we could do the actual creation of the frame outside setup, and in setup only pick the right one based on the parameters.
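A minimal sketch of that idea, with made-up data, names, and sizes (not an actual pandas benchmark): the frames are built once at module import time, and setup only selects the one matching the current parameter.

```python
import numpy as np
import pandas as pd

# Built once when the benchmark module is imported, not on every setup call.
_frames = {
    'int': pd.DataFrame({'key': np.random.randint(0, 100, size=100_000),
                         'value': np.random.randn(100_000)}),
    'float': pd.DataFrame({'key': np.random.randn(100_000),
                           'value': np.random.randn(100_000)}),
}


class GroupByMethods:
    params = ['int', 'float']
    param_names = ['dtype']

    def setup(self, dtype):
        # Only pick the pre-built frame; no per-repeat construction.
        self.df = _frames[dtype]

    def time_groupby_sum(self, dtype):
        self.df.groupby('key')['value'].sum()
```

The trade-off is that the data is constructed whenever the module is imported, even if only a subset of its benchmarks is run, and it must never be mutated by a benchmark.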
There's also the setup_cache method, which executes once.
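For reference, a rough sketch of how setup_cache is used in asv (an illustrative benchmark, not one from the pandas suite): the expensive construction runs only once per benchmark, its return value is pickled by asv, and it is passed back as the first argument to setup and the timing methods.

```python
import numpy as np
import pandas as pd


class ReindexCached:
    def setup_cache(self):
        # Runs once per benchmark rather than once per repeat; the returned
        # object is cached and passed as the first argument below.
        return pd.DataFrame(np.random.randn(100_000, 10))

    def setup(self, df):
        # Cheap per-repeat work only.
        self.indexer = np.random.permutation(len(df))

    def time_take(self, df):
        df.take(self.indexer)
```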
Is this actually a big deal?
Is there a measure of the relative importance of these benchmarks? I.e. if I find improvements in subset A and slowdowns in subset B, when can this count as a win?
@jbrockmendel there is no 'measure' for that, apart from common sense. But note that this issue is not about that; it is about the fact that some tests take a long time to run (not because pandas is too slow, but because of the set-up of the test).
For some of these, e.g. reindex.Reindexing, there is a lot of unnecessary setup being done.
@TomAugspurger can you post the hack you used to measure setup time? I'm poking at this instead of doing real work.
Happy to encourage productive procrastination :) This is the diff in my local copy (which is based on a pretty old commit now):

```diff
diff --git a/asv/benchmarks.py b/asv/benchmarks.py
index 60442b9..b3b96ba 100644
--- a/asv/benchmarks.py
+++ b/asv/benchmarks.py
@@ -89,6 +89,8 @@ def run_benchmark(benchmark, root, env, show_stderr=False,
         - `errcode`: Error return code.
     """
+    import time
+    t0 = time.time()
     name = benchmark['name']
     result = {'stderr': '', 'profile': None, 'errcode': 0}
     bench_results = []
@@ -230,6 +232,10 @@ def run_benchmark(benchmark, root, env, show_stderr=False,
         with log.indent():
             log.error(result['stderr'])
+    t1 = time.time()
+    line = '{},{}\n'.format(name, t1 - t0)
+    with open("log.csv", "a") as f:
+        f.write(line)
     return result
@@ -535,7 +541,6 @@ class Benchmarks(dict):
           and be a byte string containing the cProfile data.
         """
         log.info("Benchmarking {0}".format(env.name))
-
         with log.indent():
             benchmarks = sorted(list(six.iteritems(self)))
```
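As a follow-up (my own snippet, not part of the patch above), the resulting log.csv can be summarized to find the benchmarks with the largest total time, assuming the two-column name,seconds format written by the patched run:

```python
import pandas as pd

# log.csv has no header; each row is "<benchmark name>,<elapsed seconds>".
log = pd.read_csv("log.csv", names=["benchmark", "seconds"])
print(log.sort_values("seconds", ascending=False).head(20).to_string(index=False))
```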
Thanks, worked like a charm. What didn't work was my attempt to short-cut
I think using 'global' data works, i.e. data defined at the module level (as long as it is only used by benchmarks that don't modify it). In some cases (if the module is not too long and something generic can be reused) I think that approach can be used.
I think this has been superseded by #44450, so closing.
I hacked asv to log the total execution time, including setup, for each benchmark. Some of these are parametrized over several cases, so they may not actually be slow.
Time is in seconds. Ideally we could optimize the setup time on these as well. We could maybe modify the benchmark to do less work / run faster, but I'd like to avoid that if possible.
Link to the full CSV: https://gist.github.com/9d80aa45750224d7453863f2f754160d