PERF: Some asv tests take a long time to setup #16803
One of the things I have already thought about before is that we can probably do less in the setup. I mean, is it needed to create the DataFrames in setup (which gets repeated a lot), or would it be sufficient to just create them once in the file (as long as the benchmarks don't modify them)? E.g. in https://github.com/pandas-dev/pandas/blob/master/asv_bench/benchmarks/groupby.py#L399 we could do the actual creation of the frame outside setup, and in setup only pick the right one based on the parameters.
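A minimal sketch of that idea, with made-up data, names, and sizes (not an actual pandas benchmark): the frames are built once at module import time, and setup only selects the one matching the current parameter.

```python
import numpy as np
import pandas as pd

# Built once when the benchmark module is imported, not on every setup call.
_frames = {
    'int': pd.DataFrame({'key': np.random.randint(0, 100, size=100_000),
                         'value': np.random.randn(100_000)}),
    'float': pd.DataFrame({'key': np.random.randn(100_000),
                           'value': np.random.randn(100_000)}),
}


class GroupByMethods:
    params = ['int', 'float']
    param_names = ['dtype']

    def setup(self, dtype):
        # Only pick the pre-built frame; no per-repeat construction.
        self.df = _frames[dtype]

    def time_groupby_sum(self, dtype):
        self.df.groupby('key')['value'].sum()
```

The trade-off is that the data is constructed whenever the module is imported, even if only a subset of its benchmarks is run, and it must never be mutated by a benchmark.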
There's also the setup_cache method, which executes once.
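For reference, a rough sketch of how setup_cache is used in asv (an illustrative benchmark, not one from the pandas suite): the expensive construction runs only once per benchmark, its return value is pickled by asv, and it is passed back as the first argument to setup and the timing methods.

```python
import numpy as np
import pandas as pd


class ReindexCached:
    def setup_cache(self):
        # Runs once per benchmark rather than once per repeat; the returned
        # object is cached and passed as the first argument below.
        return pd.DataFrame(np.random.randn(100_000, 10))

    def setup(self, df):
        # Cheap per-repeat work only.
        self.indexer = np.random.permutation(len(df))

    def time_take(self, df):
        df.take(self.indexer)
```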
Is this actually a big deal?
Is there a measure of the relative importance of these benchmarks? I.e. if I find improvements in subset A and slowdowns in subset B, when can this count as a win?
@jbrockmendel there is no 'measure' for that, apart from common sense. But note that this issue is not about that; it is about the fact that some tests take a long time to run (not because pandas is too slow, but because of the set-up of the test).
For some of these, e.g. reindex.Reindexing, there is a lot of unnecessary setup being done.
@TomAugspurger can you post the hack you used to measure setup time? I'm poking at this instead of doing real work.
Happy to encourage productive procrastination :) This is the diff in my local copy (which is based on a pretty old commit now):

```diff
diff --git a/asv/benchmarks.py b/asv/benchmarks.py
index 60442b9..b3b96ba 100644
--- a/asv/benchmarks.py
+++ b/asv/benchmarks.py
@@ -89,6 +89,8 @@ def run_benchmark(benchmark, root, env, show_stderr=False,
         - `errcode`: Error return code.
     """
+    import time
+    t0 = time.time()
     name = benchmark['name']
     result = {'stderr': '', 'profile': None, 'errcode': 0}
     bench_results = []
@@ -230,6 +232,10 @@ def run_benchmark(benchmark, root, env, show_stderr=False,
         with log.indent():
             log.error(result['stderr'])
+    t1 = time.time()
+    line = '{},{}\n'.format(name, t1 - t0)
+    with open("log.csv", "a") as f:
+        f.write(line)
     return result
@@ -535,7 +541,6 @@ class Benchmarks(dict):
           and be a byte string containing the cProfile data.
         """
         log.info("Benchmarking {0}".format(env.name))
-
         with log.indent():
             benchmarks = sorted(list(six.iteritems(self)))
```
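As a follow-up (my own snippet, not part of the patch above), the resulting log.csv can be summarized to find the benchmarks with the largest total time, assuming the two-column name,seconds format written by the patched run:

```python
import pandas as pd

# log.csv has no header; each row is "<benchmark name>,<elapsed seconds>".
log = pd.read_csv("log.csv", names=["benchmark", "seconds"])
print(log.sort_values("seconds", ascending=False).head(20).to_string(index=False))
```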
Thanks, worked like a charm. What didn't work was my attempt to short-cut
I think using 'global' data works, i.e. data defined at the module level (as long as it is only used by benchmarks that don't modify it). In some cases (if the module is not too long and something generic can be reused) I think that approach can be used.
I think this has been superseded by #44450, so closing.
I hacked asv to log the total execution time, including setup, for each benchmark. Some of these are parametrized over several cases, so they may not actually be slow.
Time is in seconds. Ideally we could optimize the setup time on these as well. We could maybe modify the benchmark to do less work / run faster, but I'd like to avoid that if possible.
Link to the full CSV: https://gist.github.com/9d80aa45750224d7453863f2f754160d