BENCH: fix noisy asv benchmarks that were running on exhausted generators #26772


Merged: 1 commit into pandas-dev:master on Jun 21, 2019

Conversation


@qwhelan (Contributor) commented Jun 10, 2019

Generators are consumed on first use, yielding abnormally fast benchmark times on every iteration after the first. Fortunately, we can instruct asv to call setup() before every sample by setting number = 1 and choosing repeat appropriately. My local runs suggest the typical number of samples is ~150-250, so an upper limit of 250 appears to be a good fit.
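
To illustrate the failure mode, here is a minimal hypothetical repro (not code from this PR): once the generator is exhausted, re-running the timed statement constructs an empty Series, so all but the first sample measure essentially nothing.

```python
import timeit

import pandas as pd

gen = (str(x) for x in range(100_000))

# The first call consumes the generator and measures the real construction cost.
print(timeit.timeit(lambda: pd.Series(gen), number=1))

# The generator is now exhausted: this call builds an empty Series and reports
# a misleadingly tiny time, which is exactly the noise seen in the asv runs.
print(timeit.timeit(lambda: pd.Series(gen), number=1))
```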

Here is current master with old benchmark version:

[ 75.00%] ··· Running (ctors.SeriesConstructors.time_series_constructor--).
[100.00%] ··· ctors.SeriesConstructors.time_series_constructor                                                                                                   4/40 failed
[100.00%] ··· ===================================== =============== ============= ============== =============
              --                                                        with_index / dtype
              ------------------------------------- ----------------------------------------------------------
                             data_fmt                False / float   False / int   True / float    True / int
              ===================================== =============== ============= ============== =============
                      <function gen_of_str>            130±0.9μs       116±6μs        failed         failed
                     <function gen_of_tuples>           113±3μs        112±20μs       failed         failed
              ===================================== =============== ============= ============== =============

And comparing against the fixed benchmarks, we can see the old ones were falsely reporting a ~40x speedup:

[ 50.00%] ··· ===================================== =============== ============= ============== ============
              --                                                        with_index / dtype
              ------------------------------------- ---------------------------------------------------------
                             data_fmt                False / float   False / int   True / float   True / int
              ===================================== =============== ============= ============== ============
                      <function gen_of_str>            4.61±0.1ms    4.21±0.07ms       n/a           n/a
                     <function gen_of_tuples>          3.39±0.1ms    3.43±0.07ms       n/a           n/a
              ===================================== =============== ============= ============== ============

Additionally, this PR resolves a few failing asv tests that I introduced in the first iteration.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@WillAyd added the Benchmark Performance (ASV) benchmarks label on Jun 10, 2019
@@ -55,7 +55,14 @@ class SeriesConstructors:
              [False, True],
              ['float', 'int']]

    # Generators get exhausted on use, so run setup before every call
    number = 1
    repeat = (3, 250, 10)

Member:

Why should we care about setting repeat for these? Can't we just use the default max_time?

@qwhelan (Contributor, Author):

The default repeat is (min_n=1, max_n=10, max_time=20.0), which assumes number can be much greater than 1. We're not trying to modify the total number of samples (number * repeat), just how they're collected, and setting repeat is required to do that.
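
For intuition, here is a rough, hypothetical sketch of how an asv-style runner interprets these settings (simplified; run_one_benchmark, bench, and setup are illustrative names, and asv treats a repeat tuple as (min_repeat, max_repeat, max_time)). With number = 1, setup() runs before every timed sample, so each sample starts from a fresh generator:

```python
import time

def run_one_benchmark(bench, setup, number=1, repeat=(3, 250, 10)):
    """Simplified, illustrative sampling loop in the spirit of asv."""
    min_repeat, max_repeat, max_time = repeat
    samples, start = [], time.perf_counter()
    for i in range(max_repeat):
        setup()  # with number=1, a fresh setup() precedes every sample
        t0 = time.perf_counter()
        for _ in range(number):  # number > 1 would reuse an exhausted generator
            bench()
        samples.append((time.perf_counter() - t0) / number)
        # Stop early once min_repeat samples exist and max_time has elapsed.
        if i + 1 >= min_repeat and time.perf_counter() - start > max_time:
            break
    return samples
```

Under this reading, repeat = (3, 250, 10) collects between 3 and 250 samples, capping total sampling time at roughly 10 seconds.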

Member:

I think I'd still prefer if we just used the default max_time here (assuming that to be the bottleneck). Understood it may look weird historically with these, but they were wrong anyway.

Contributor:

IIUC, @qwhelan's approach is preferred since we should get a more stable benchmark. With the default repeat, we'd observe higher variance in the timings.

codecov bot commented Jun 10, 2019

Codecov Report

Merging #26772 into master will decrease coverage by <.01%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #26772      +/-   ##
==========================================
- Coverage   91.71%   91.71%   -0.01%     
==========================================
  Files         178      178              
  Lines       50771    50771              
==========================================
- Hits        46567    46563       -4     
- Misses       4204     4208       +4
Flag        Coverage      Δ
#multiple   90.3% <ø>     (ø) ⬆️
#single     41.19% <ø>    (-0.09%) ⬇️

Impacted Files         Coverage      Δ
pandas/io/gbq.py       78.94% <0%>   (-10.53%) ⬇️
pandas/core/frame.py   96.88% <0%>   (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d47fc0c...bd5b433. Read the comment docs.

1 similar comment

@jreback jreback added this to the 0.25.0 milestone Jun 21, 2019
@jreback jreback merged commit 984514e into pandas-dev:master Jun 21, 2019
@jreback (Contributor) commented Jun 21, 2019

thanks @qwhelan

@qwhelan deleted the asv_generators branch on August 14, 2019 07:00
Labels
Benchmark Performance (ASV) benchmarks
4 participants