
Reduce Benchmark Times of stat_ops, particularly kendall correlation #29270


Closed
WillAyd opened this issue Oct 29, 2019 · 2 comments · Fixed by #29401
Labels
Benchmark Performance (ASV) benchmarks good first issue

Comments

@WillAyd
Member

WillAyd commented Oct 29, 2019

ref #29165: the stat_ops benchmarks add a significant amount of time to the ASV runs. The "kendall" correlation method in particular takes over 10 seconds per run in some cases, and there are at least 4 occurrences of it, so that method alone accounts for roughly 40 seconds per pass; at 4 runs per benchmark suite, that works out to close to 3 minutes just for those benchmarks.

There are a few others mixed in there as well that can be tuned. I think a 0.2 ms runtime would be ideal:

https://pandas.pydata.org/speed/pandas/#summarylist?sort=1&dir=desc

Of course, making that method faster so it is in line with the rest of the correlation methods is an option as well; this is mentioned by @dsaxton in #28329. Depending on how much effort that is, it might be OK to adjust the benchmark first and align it back if the performance is ever improved.
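
For context, here is a minimal sketch of what an ASV benchmark parametrized over correlation methods typically looks like. The class name, parameters, and data sizes below are illustrative assumptions, not the exact contents of asv_bench/benchmarks/stat_ops.py; the point is that the per-run time is driven almost entirely by the frame size created in `setup()`, which is the simplest knob to turn down.

```python
# Hypothetical sketch of an ASV benchmark parametrized over correlation
# methods, roughly in the shape of asv_bench/benchmarks/stat_ops.py.
# Class name and sizes are illustrative, not the actual pandas code.
import numpy as np
import pandas as pd


class Correlation:
    # "kendall" is far slower than the other methods, so it dominates the
    # runtime when it shares a parametrization (and data size) with them.
    params = [["spearman", "kendall", "pearson"]]
    param_names = ["method"]

    def setup(self, method):
        # Shrinking this frame in setup() is the most direct way to pull the
        # per-run time down toward the ~0.2 ms target mentioned above.
        self.df = pd.DataFrame(np.random.randn(1000, 30))

    def time_corr(self, method):
        self.df.corr(method=method)
```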

@WillAyd added the good first issue and Benchmark Performance (ASV) labels Oct 29, 2019
@deepandas11

Hello! I would like to work on this. I understand that we can change the number of times it is called in the ASV setup, but I am not really sure what you mean by moving the benchmark.

Sorry for the extra trouble. I am really new at this.

@WillAyd
Member Author

WillAyd commented Nov 1, 2019

kendall is much slower than all of the other methods, so it might make sense to break it off from the parametrization if that makes the runtime more reasonable, at least until its performance gets improved.
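
A hedged sketch of what that split could look like (names and sizes are illustrative only): keep the fast methods parametrized together, and give kendall its own unparametrized class with a smaller frame until its performance improves.

```python
# Sketch of "breaking kendall off from the parametrization": the fast
# methods keep the shared parametrization, while kendall gets its own
# class with a deliberately smaller input. Illustrative names/sizes only.
import numpy as np
import pandas as pd


class CorrelationFast:
    params = [["spearman", "pearson"]]
    param_names = ["method"]

    def setup(self, method):
        self.df = pd.DataFrame(np.random.randn(1000, 30))

    def time_corr(self, method):
        self.df.corr(method=method)


class CorrelationKendall:
    # If kendall's performance is later improved, this can be folded back
    # into the shared parametrization above.
    def setup(self):
        self.df = pd.DataFrame(np.random.randn(100, 30))

    def time_corr(self):
        self.df.corr(method="kendall")
```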
