ASV Benchmark - Time Standards #29165
TLDR - I think we need to cap our benchmarks at a maximum of 0.2 seconds. That's a long way off though, so I think we should start with a cap of 1 second per benchmark.

Right now we have some very long running benchmarks:
https://pandas.pydata.org/speed/pandas/#summarylist?sort=1&dir=desc

I haven't seen a definitive answer, but I think ASV leverages the builtin timeit functionality to figure out how long a given benchmark should run:
https://docs.python.org/3.7/library/timeit.html#command-line-interface

Quoting what I think is important:

> If -n is not given, a suitable number of loops is calculated by trying successive powers of 10 until the total time is at least 0.2 seconds.

So IIUC a particular statement is executed n times (where n is a power of 10) to the point where it reaches 0.2 seconds to run, and then is repeated `repeat` times to get a reading. asv continuous would do this 4 times (2 runs for each commit being compared). In Python 3.6 `repeat` is 3 (we currently pin ASVs to 3.6), but in future versions that gets bumped to 5.

We have a handful of benchmarks that take 20s apiece to run, so if we stick to the 3.6 timing these statements would run n=1 times, repeated 3 times per benchmark session * 4 sessions per continuous run. 20s * 3 repeats * 4 sessions = 4 minutes for one benchmark alone.

rolling.Apply.time_rolling is a serious offender here, so I think we can start with that. It would take community PRs to improve performance of any of these, though maybe we should prioritize anything currently taking over 1 second.

cc @qwhelan and @pv who may have additional insights
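For reference, a minimal stdlib sketch of the calibration quoted above; this is illustrative only, and as the next comment notes, asv does not follow timeit's command-line behavior exactly:

```python
# Sketch of timeit's calibration: grow the loop count until one batch of
# loops takes at least 0.2 seconds, then measure `repeat` batches.
import timeit

timer = timeit.Timer("sum(range(1000))")

# autorange() returns (loop_count, elapsed) once elapsed >= 0.2 s.
# (Timer.autorange uses a 1-2-5 sequence; the CLI text quoted above
# describes powers of 10.)
number, elapsed = timer.autorange()
print(f"number={number}, calibration took {elapsed:.3f}s")

# Python 3.6's default repeat is 3; newer versions use 5.
totals = timer.repeat(repeat=3, number=number)
print(f"best per-loop time: {min(totals) / number:.2e} s")

# For a 20 s statement the calibration loop is moot (number=1), and a
# continuous run still costs 20s * 3 repeats * 4 sessions = 4 minutes.
```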
The timeit documentation does not apply as is; see here for details: https://asv.readthedocs.io/en/stable/benchmarks.html#timing-benchmarks
Thanks for the link - reading through it definitely gives more guidance. So if we track something that itself takes more than 10 milliseconds to run, do you know what happens? Alternately, do you have thoughts here on general best practices? Right now our benchmarks are pretty slow (ex: running the groupby module alone takes over an hour).
If it takes longer than …
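For concreteness, a hedged sketch of the per-benchmark knobs asv exposes; `number`, `repeat`, `sample_time`, and `timeout` are documented asv class attributes, but the class and workload here are invented for illustration:

```python
# Invented example showing how asv timing settings can be pinned per class.
import numpy as np
import pandas as pd

class ExpensiveOp:
    number = 1         # iterations per sample; asv uses 1 anyway once a
                       # single iteration exceeds sample_time (~10 ms default)
    repeat = 3         # how many samples to collect per run
    sample_time = 0.2  # target wall-clock time per sample
    timeout = 120      # seconds before asv gives up on the benchmark

    def setup(self):
        # Sized so one iteration stays well under the 1 s cap proposed above.
        self.df = pd.DataFrame(np.random.randn(100_000, 4))

    def time_rolling_sum(self):
        self.df.rolling(1000).sum()
```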
@WillAyd It appears there are a few issues:

- The benchmark takes 20s per run; there's not much asv can do as long as that's the case.
- It's being run over 48 parameter combinations (half fast/half slow); see the sketch below.
  - These two factors mean 8 minutes for an n=1 run (24 * 20s), so it's slow and noisy.
- The pydata speed site is using an older version of asv that includes memory addresses in run names:
  https://pandas.pydata.org/speed/pandas/#rolling.Apply.time_rolling?p-function=%3Cbuilt-in%20function%20sum%3E&p-function=%3Cfunction%20sum%20at%200x7f39b3ee6bf8%3E&p-function=%3Cfunction%20Apply.%3Clambda%3E%20at%200x7f399f5f0510%3E&p-window=1000&p-contructor='DataFrame'&p-raw=True&p-dtype='float'
  - This means history is being lost, as they probably don't match across runs.
  - You can tell because the number of plotted lines is far less than the number in the legend; those probably were all produced in a single run.
  - Upgrade asv to a version that includes airspeed-velocity/asv#771.

I'll submit a PR shortly that pares down the test size so each iteration runs in under a second.
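To make the parameter-grid arithmetic concrete, here is an illustrative benchmark in the style of the pandas asv suite; the grid is reconstructed from the URL in the comment above and the data sizes are invented, so treat it as a sketch rather than the real rolling.Apply definition:

```python
# asv runs the full cross product of `params`, here
# 2 * 2 * 2 * 3 * 2 = 48 combinations, so a 20 s iteration balloons
# into minutes per run even with n=1 sampling.
import numpy as np
import pandas as pd

class Apply:
    param_names = ["constructor", "window", "dtype", "function", "raw"]
    params = (
        ["DataFrame", "Series"],
        [10, 1000],
        ["int", "float"],
        [sum, np.sum, lambda x: np.sum(x) + 1],
        [True, False],
    )

    def setup(self, constructor, window, dtype, function, raw):
        # Smaller N than the real benchmark so each iteration finishes
        # in well under a second.
        arr = (100 * np.random.random(10**5)).astype(dtype)
        self.roll = getattr(pd, constructor)(arr).rolling(window)

    def time_rolling(self, constructor, window, dtype, function, raw):
        self.roll.apply(function, raw=raw)

# asv labels each combination with repr(param); for the function params
# that repr embeds a memory address (e.g. "<function <lambda> at 0x7f39...>")
# which changes between processes, so results fail to match across runs
# until the asv version in use includes airspeed-velocity/asv#771.
```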
I'll get the asv updated in the env running these.
@TomAugspurger lmk if you need help with that; might not be a bad idea to refresh knowledge on that env