ASV: docs, runtime, stability, build acceleration, PR integration? #23412
Do you have a way of figuring out how long individual benchmarks are running? IIRC there is a
@h-vetinari what you've written matches my experience.

@WillAyd my understanding (grain of salt) of

Most of our benchmarks are both fairly fast and non-stateful, i.e. if there were an option to only run

Another intuition I have no evidence for:

For the noise, the idea I've pitched to the asv folks is to interleave the benchmarks instead of running all-benchmarks-for-commit-A followed by all-benchmarks-for-commit-B. IIRC they said that would require a pretty serious re-working of asv to implement.

One bright spot: there has been some discussion within asv (I haven't checked on it in a while, no idea how it went) about an option to aggregate results across runs. So if I run
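The interleaving idea can be illustrated with a toy simulation (a sketch of the intuition only, not asv's actual scheduler; the 20% drift and 2% noise figures are made-up assumptions): if the machine slowly drifts during the session, running all benchmarks for commit A and then all for commit B bakes the drift into the A-vs-B comparison, whereas interleaving exposes both commits to nearly the same conditions.

```python
import random

random.seed(0)

N_BENCH = 200          # toy number of benchmarks
TRUE_TIME = 1.0        # commits A and B are actually equally fast

def measured(true_time, step, total_steps):
    """Toy model: the machine gradually slows down over the session (thermal drift)."""
    drift = 1.0 + 0.2 * step / total_steps        # up to 20% slower by the end
    return true_time * drift * random.uniform(0.98, 1.02)

total = 2 * N_BENCH

# Sequential order: all benchmarks for commit A, then all for commit B.
seq_a = [measured(TRUE_TIME, i, total) for i in range(N_BENCH)]
seq_b = [measured(TRUE_TIME, N_BENCH + i, total) for i in range(N_BENCH)]

# Interleaved order: A and B are timed back to back for each benchmark.
int_a, int_b = [], []
for i in range(N_BENCH):
    int_a.append(measured(TRUE_TIME, 2 * i, total))
    int_b.append(measured(TRUE_TIME, 2 * i + 1, total))

print("sequential  B/A:", round(sum(seq_b) / sum(seq_a), 3))   # ~1.10: spurious "regression"
print("interleaved B/A:", round(sum(int_b) / sum(int_a), 3))   # ~1.00: drift cancels out
```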
@jbrockmendel: the interleaved benchmarks were already implemented in 0.3, ditto for the option to aggregate results.
On a laptop I'd recommend at least pinning the CPU frequency/performance settings to get stable results: https://asv.readthedocs.io/en/stable/tuning.html#tuning-machines-for-benchmarking (and on Linux, disabling thermald, which also tunes the frequencies on the fly).

How much collecting statistics across multiple runs helps depends on how large the CPU frequency fluctuations are. The default setting for asv is to collect statistics over k=2 reruns of the whole suite (and of course multiple samples for each individual benchmark, but those are strongly correlated). Suppose e.g. that each benchmark run has probability p of executing at the "slow" CPU frequency and otherwise runs at the "fast" frequency. Pandas has N=2618 benchmarks (counting parameterized ones separately); suppose the suite is interleaved k times. Then Pr[each benchmark gets sampled at the "fast" CPU frequency] = (1 - p^k)^N. For k=2, N=2618 and e.g. p=0.05 this is ~0; for k=5 it is already close to 1.

The choices are that either the benchmark run takes a long time (i.e. not asv's default settings), or the system is configured to produce stable timings.

On Linux, the forkserver mode pushes benchmark overheads down to <~50 ms, so most of the time is taken by running the user code; on Windows the overhead is bigger, as forkserver is not available there. Pandas has 2618 benchmarks (counting parameter combinations), so theoretically the suite should take ~45 min, maybe double that on Windows. This doesn't count the time taken by setup methods, and pandas probably has several slow ones, as well as benchmarks whose runtime is several seconds.

Finding slow setup() routines: https://gist.github.com/pv/9f51381ec8a488c34920749d0f00c43e

Pandas also seems to set
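A quick way to check the numbers above (same simplified model as in the comment, i.e. independent runs that each land on the "slow" frequency with probability p; this is not asv's actual statistics):

```python
# Probability that *every* benchmark gets at least one sample at the "fast"
# CPU frequency, under the simplified model above.
N = 2618   # pandas benchmarks, counting parameterized cases separately
p = 0.05   # chance that a single run lands on the throttled "slow" frequency

for k in (2, 3, 5, 10):               # k = number of interleaved rounds
    prob_all_fast = (1 - p ** k) ** N
    print(f"k={k:2d}: Pr[every benchmark has a fast sample] = {prob_all_fast:.4f}")
# k=2 gives ~0.001, while k=5 is already ~0.999.
```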
Here's the output of @pv's script, if anyone was curious.
Currently, https://pandas.pydata.org/pandas-docs/stable/contributing.html says:
On a standard laptop with 8GB RAM and 4 cores, this was more like 6.5h last night.
I recently updated the ASV code (as recommended by contributing.html) with `pip install git+https://github.com/spacetelescope/asv`, and it seems that in v0.4, ASV runs each commit/benchmark in 2 rounds, effectively doubling the runtime? (It may well be that I don't understand what the rounds are supposed to do exactly, but ASV ran much faster before.)

Quite a lot of time is also spent on the environment builds, and I was wondering if it wouldn't be possible to reuse the logic from `python setup.py build_ext --inplace -j 4` to only cythonize the modules for which the code has changed (probably more an asv issue).
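A minimal sketch of that incremental idea (a hypothetical helper, not pandas' or asv's actual build machinery): only trigger an in-place rebuild when some `.pyx` source is newer than its compiled artifact.

```python
# Hypothetical incremental-build helper: rebuild in place only when a .pyx
# source file is newer than its compiled .so/.pyd artifact.
from pathlib import Path
import subprocess
import sys

def stale_extensions(root="pandas"):
    """Yield .pyx files whose compiled artifact is missing or older than the source."""
    for pyx in Path(root).rglob("*.pyx"):
        built = list(pyx.parent.glob(pyx.stem + "*.so")) + list(pyx.parent.glob(pyx.stem + "*.pyd"))
        if not built or any(b.stat().st_mtime < pyx.stat().st_mtime for b in built):
            yield pyx

if __name__ == "__main__":
    if any(True for _ in stale_extensions()):
        # Same command as above; setuptools/Cython will typically skip modules
        # whose generated C and object files are already up to date.
        subprocess.check_call(
            [sys.executable, "setup.py", "build_ext", "--inplace", "-j", "4"]
        )
    else:
        print("all Cython extensions up to date; nothing to cythonize")
```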
Finally, the runs are annoyingly noisy. For example, after running `asv continuous -f 1.1 upstream/master HEAD` overnight, with nothing else running on the machine (all other applications closed), I got something like this. However, I didn't trust the results, because there were equally strong divergences in the other direction.
Upon rerunning `asv continuous -f 1.1 upstream/master HEAD -b "^(re)?index"`, all those divergences vanished and got replaced by the following (with other divergences):

This is a larger point: letting people run the ASVs on their private machines is not the most thorough approach; it is prone to bias (or even manipulation) and exposed to whatever else is running on their machines at the time.
Finally, a lot of the divergences are not shown if the results are too noisy according to the ASV internals. This is a general point to keep in mind, because IMO it can mask real regressions just because the runs are noisy. I've opened airspeed-velocity/asv#752 for that.
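To make the noise point concrete, here is a toy Monte Carlo sketch (the 5% relative noise and the single-measurement-per-side comparison are assumptions for illustration, not how asv actually samples): even with identical performance on both sides, a fair number of ratios cross the `-f 1.1` threshold in each direction.

```python
import random

random.seed(0)

N_BENCH = 2618       # pandas benchmarks, counting parameterized cases separately
NOISE = 0.05         # assumed ~5% relative timing noise per measurement
THRESHOLD = 1.1      # the factor passed via `asv continuous -f 1.1`

slower = faster = 0
for _ in range(N_BENCH):
    base = random.gauss(1.0, NOISE)   # "upstream/master" timing
    head = random.gauss(1.0, NOISE)   # "HEAD" timing -- same true performance
    ratio = head / base
    if ratio > THRESHOLD:
        slower += 1
    elif ratio < 1 / THRESHOLD:
        faster += 1

# With these assumptions, a couple hundred benchmarks typically land on each
# side of the threshold despite zero real change.
print(f"spurious 'regressions':  {slower}")
print(f"spurious 'improvements': {faster}")
```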
Summing up, I think that:

- running only a relevant subset of the benchmarks should be recommended (e.g. with `-b "some_regex"`)