
REF/BENCH: tslibs-specific parts of asvs #29292


Merged: 6 commits merged into pandas-dev:master from the zbench branch on Oct 31, 2019
Conversation

jbrockmendel (Member)

Motivation: asv runs take way too long. One way around that is to make it easier to identify* which benchmarks to run for a given PR. To that end, if we can identify benchmarks that don't rely on anything outside of tslibs**, then a PR that doesn't touch that code can skip those benchmarks. We can do the same thing for other low-dependency modules like _libs.algos.

* Ideally in an automated way, but I'm not there yet.
** Here and elsewhere, "tslibs" informally includes tseries.offsets and parts of tseries.frequencies (until we can get all of that into the tslibs directory)
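For footnote *, a rough sketch of what automated selection could look like (a hypothetical helper, not part of this PR; the module-to-benchmark map is made up and a real one would come from auditing what each benchmark file imports):

```python
import subprocess

# Hypothetical mapping from source prefixes to asv -b selections.
BENCH_FOR_MODULE = {
    "pandas/_libs/tslibs/": "tslibs",
    "pandas/_libs/algos": "algorithms",
}

def benchmarks_for_pr(base="master"):
    """Return asv -b patterns covering the modules this branch touches."""
    changed = subprocess.check_output(
        ["git", "diff", "--name-only", base], text=True
    ).splitlines()
    selected = {
        bench
        for path in changed
        for prefix, bench in BENCH_FOR_MODULE.items()
        if path.startswith(prefix)
    }
    # No mapping hit means we can't prove anything is safe to skip,
    # so fall back to running the full suite.
    return sorted(selected) or None
```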

@TomAugspurger will moving benchmarks cause asv to start new time series, or will it successfully link up the new location to the old?

TomAugspurger (Contributor) commented Oct 30, 2019

Not sure, but I imagine that renaming will start a new timeseries.

I don't think that's a huge problem though (assuming this benchmark doesn't introduce a slowdown).

WillAyd (Member) left a review:


Do you know which benchmarks are taking the longest? I have an unverified suspicion that parametrized benchmarks like PeriodProperties might be taking longer than you would expect. I've seen something like that with the GroupBy benchmarks, where the parametrized ones alone take close to an hour but don't report a runtime anywhere near that; I just haven't dug deep enough yet to confirm.
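For context on why parametrized benchmarks add up: asv times every combination in ``params`` as a separate run. A minimal illustrative class (hypothetical, not the actual PeriodProperties):

```python
import pandas as pd

class PeriodLikeProperties:
    # asv runs every combination separately, so 3 freqs x 3 attributes
    # here already means 9 timed benchmarks.
    params = (["D", "H", "T"], ["year", "month", "day"])
    param_names = ["freq", "attr"]

    def setup(self, freq, attr):
        self.per = pd.Period("2012-06-01", freq=freq)

    def time_attr(self, freq, attr):
        getattr(self.per, attr)
```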



class TimedeltaProperties:
    def setup_cache(self):
WillAyd (Member):

Orthogonal, but maybe easy to change here if you want: using setup_cache here seems like overkill. It's mostly used for longer-running setup calls, which I can't imagine this is.
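For reference, the distinction being drawn, as a minimal sketch of asv's two setup hooks (the Timedelta contents are illustrative):

```python
import pandas as pd

class PlainSetup:
    # ``setup`` runs before the timings in each benchmark process;
    # fine for cheap construction like this.
    def setup(self):
        self.td = pd.Timedelta(days=1, seconds=1)

    def time_days(self):
        self.td.days

class CachedSetup:
    # ``setup_cache`` runs once per benchmark suite; its return value is
    # pickled and passed to every benchmark as the first argument.
    # Intended for genuinely expensive setup.
    def setup_cache(self):
        return pd.Timedelta(days=1, seconds=1)

    def time_days(self, td):
        td.days
```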

jbrockmendel (Member, Author):

Generally agreed, but for now I prefer to keep this move-only.

Longer-term I hope to get asv to support something like setup_class, so that setup only runs once instead of many times.

@WillAyd WillAyd added the Benchmark Performance (ASV) benchmarks label Oct 30, 2019
jbrockmendel (Member, Author)

Do you know what tests are taking the longest? I have an unverified suspicion that parametrized benchmarks like PeriodProperties might be taking longer than you would expect

I don't have a good answer for that, but I can tell you:

  • I just did a time asv continuous -E virtualenv -f 1.1 master HEAD -b algorithms and it took 55 minutes.
  • A few weeks ago I checked and found that a full asv run for a single commit had just over 5000 subprocesses import pandas (a rough cost estimate is sketched below).
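A back-of-the-envelope way to see what that import count costs (an illustrative sketch, not from the thread):

```python
import subprocess
import time

# Time a single fresh "import pandas" in its own interpreter.
start = time.perf_counter()
subprocess.run(["python", "-c", "import pandas"], check=True)
elapsed = time.perf_counter() - start
print(f"one subprocess 'import pandas': {elapsed:.2f}s")
# At on the order of a second per import, 5000+ subprocess imports
# alone would account for well over an hour of a full run.
```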

WillAyd (Member) commented Oct 30, 2019 via email:

Can you post the JSON result file? I think it should have start/end timestamps in it.

jbrockmendel (Member, Author)

Can you post the JSON result file? I think it should have start/end timestamps in it.

asv_log.txt (attached). This was produced with -b algorithms, so not a full run.

jbrockmendel (Member, Author)

For kicks I tried running asv continuous without selecting any benchmarks, and with environments already built, and it took 15 minutes.

WillAyd (Member) commented Oct 31, 2019

For kicks I tried running asv continuous without selecting any benchmarks, and with environments already built, and it took 15 minutes.

It would be interesting if you ran with the --verbose flag to see where all of that time was spent. Also, do you have ccache installed?

jbrockmendel (Member, Author)

It would be interesting if you ran with the --verbose flag to see where all of that time was spent.

Will give this a go.

Also do you have ccache installed?

I don't know. Is this a pip thing or an apt/brew thing? Is it relevant that I make heavy use of git worktree?

WillAyd (Member) commented Oct 31, 2019

It is a homebrew thing (ccache caches compiler output, which speeds up the repeated C-extension rebuilds asv does; on Linux it is available through apt).

@jreback jreback added this to the 1.0 milestone Oct 31, 2019
jreback (Contributor) commented Oct 31, 2019

lgtm.

WillAyd (Member) commented Oct 31, 2019

Yeah, lgtm as well. We can continue figuring out the ASV nuances outside of this PR.

@WillAyd WillAyd merged commit 3c0cf22 into pandas-dev:master Oct 31, 2019
@jbrockmendel jbrockmendel deleted the zbench branch October 31, 2019 17:53
jbrockmendel (Member, Author)

FWIW, it looks like ccache helped quite a bit on OSX but not on Ubuntu (maybe it needs configuration after install).

jbrockmendel (Member, Author)

For kicks I tried running asv continuous without selecting any benchmarks, and with environments already built, and it took 15 minutes.

Update: upgrading asv took this down to a few seconds.

Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019