Skip to content

BENCH: fix noisy asv benchmarks that were running on exhausted generators #26772

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 21, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions asv_bench/benchmarks/ctors.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,14 @@ class SeriesConstructors:
[False, True],
['float', 'int']]

# Generators get exhausted on use, so run setup before every call
number = 1
repeat = (3, 250, 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we care about setting repeat for these? Can't we just use the default max_time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default repeat is (min_n=1, max_n=10, max_time=20.0), which assumes number can be much greater than 1. We're not trying to modify the number of samples (number * repeat), just how they're collected and setting repeat is required to do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd still prefer if we just used the default max_time here (assuming that to be bottleneck). Understood it may look weird historically with these but they were wrong anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, @qwhelan's approach is preferred since we should get more stable benchmark. With the default repeat, we'd observe higher variance in the timings.


def setup(self, data_fmt, with_index, dtype):
if data_fmt in (gen_of_str, gen_of_tuples) and with_index:
raise NotImplementedError('Series constructors do not support '
'using generators with indexes')
N = 10**4
if dtype == 'float':
arr = np.random.randn(N)
Expand Down
4 changes: 4 additions & 0 deletions asv_bench/benchmarks/frame_ctor.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,10 @@ class FromRecords:
params = [None, 1000]
param_names = ['nrows']

# Generators get exhausted on use, so run setup before every call
number = 1
repeat = (3, 250, 10)

def setup(self, nrows):
N = 100000
self.gen = ((x, (x * 20), (x * 100)) for x in range(N))
Expand Down