CI: Continuous benchmarking #36860


Closed

dsaxton opened this issue Oct 4, 2020 · 3 comments
Labels
Benchmark (Performance (ASV) benchmarks), CI (Continuous Integration), Performance (Memory or execution speed performance)

Comments

dsaxton (Member) commented on Oct 4, 2020

I think it would be helpful if pandas had a set of performance benchmarks that run automatically during every CI run (it looks like there is something for the asvs, but they seem to almost never actually run?). It would reduce some of the friction involved in manually running the asv suite and pasting results as comments, and would also help prevent regressions from slipping through simply because nobody thought to run the benchmarks.

There is a pytest plugin, pytest-benchmark, which seems even more lightweight than asv and appears to be what RAPIDS uses for their own benchmark tool: https://github.com/rapidsai/benchmark. Could it be an option to have a GitHub Action that runs pytest-benchmark on every PR? I don't know a lot about the plugin, but I'm assuming it could be configured to take a delta either between master and the given branch, or between that branch and some baseline commit on master such as a major release (see the sketch below). It may also be possible to cache the result from master somewhere so it isn't rerun every time.
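For illustration, a minimal sketch of what a pytest-benchmark test might look like (the test case itself is made up; the `benchmark` fixture is the plugin's standard entry point):

```python
import pandas as pd


def test_concat_frames(benchmark):
    # pytest-benchmark injects the `benchmark` fixture, which calls the
    # target repeatedly and records timing statistics for the run.
    df = pd.DataFrame({"a": range(10_000)})
    result = benchmark(pd.concat, [df] * 10)
    # The fixture returns the target's return value, so we can sanity-check it.
    assert len(result) == 100_000
```

Comparing against master could then presumably use the plugin's built-in save/compare flags, e.g. `pytest --benchmark-autosave` on master followed by `pytest --benchmark-compare --benchmark-compare-fail=mean:10%` on the branch (the 10% threshold is just an example).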

What constitutes "failure" is another question; maybe it's best to configure things to only warn on failure instead of making the whole run red (which would also help with flaky benchmarks). We would also presumably need fine-grained control over the hardware GitHub runs the jobs on to make sure timings don't drift, and I'm not sure whether that's possible.
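On the warn-only idea: GitHub Actions can let a step fail without failing the whole job via `continue-on-error`, so a hypothetical benchmark step (step name and paths made up) could look like:

```yaml
- name: Run benchmarks
  # A benchmark regression fails this step, but continue-on-error keeps
  # the overall job green, so it surfaces as a warning rather than a red run.
  run: pytest benchmarks/ --benchmark-compare --benchmark-compare-fail=mean:10%
  continue-on-error: true
```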

dsaxton added the Performance, CI, and Benchmark labels on Oct 4, 2020
jreback (Contributor) commented on Oct 4, 2020

this is nearly impossible to do because you actually want to run the entire benchmark suite and not just a few benchmarks

if you could configure it to run a subset then it could at least be manually triggered (arrow does this)

but -1 on rewriting benchmarks themselves

dsaxton (Member, Author) commented on Oct 4, 2020

> if you could configure it to run a subset then it could at least be manually triggered (arrow does this)

Might be possible using workflow_dispatch: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/
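As a rough sketch (workflow name, input, and paths are all hypothetical), a manually triggered benchmark workflow might look like:

```yaml
name: Manual benchmarks

on:
  workflow_dispatch:
    inputs:
      pattern:
        description: "pytest -k expression selecting which benchmarks to run"
        required: true

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Run only the requested subset, along the lines of what Arrow does.
      - name: Run selected benchmarks
        run: pytest benchmarks/ -k "${{ github.event.inputs.pattern }}"
```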

TomAugspurger (Contributor) commented

> (it looks like there is something for the asvs, but they seem to almost never actually run?)

There's a machine in my basement that runs these. It crashed a while back and I haven't had time to debug it. Hopefully this week some time.
