add a page on optimizations and profiling #45

Merged (2 commits) on Feb 18, 2023

@the8472 the8472 commented Oct 3, 2022

@jyn514 requested a guide on how to benchmark std changes.

@jyn514 jyn514 left a comment

This is fantastic, thank you so much!! I wouldn't have thought of half of these ideas :)

e.g. to run it under `perf stat` or cachegrind.

Build and link the [stage1](https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html#creating-a-rustup-toolchain)
compiler as rustup toolchain and then use that to build the standalone benchmark with a modified standard library.

Thanks for linking this! I want to reland that PR but haven't had time.
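
For readers following along, a standalone benchmark of the kind the quoted passage describes might look roughly like the sketch below. It is not taken from the PR: `workload`, `time_samples`, and `median` are hypothetical names, and `workload` only stands in for whichever std API was actually modified. The sketch uses only `std`, so it can be compiled with a locally built toolchain and then run under an external tool such as `perf stat` or cachegrind.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `samples` times and returns the wall-clock duration of each run.
fn time_samples<F: FnMut()>(samples: usize, mut f: F) -> Vec<Duration> {
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Returns the middle sample after sorting (the upper middle for even counts).
/// The median is less sensitive to outliers than the mean.
fn median(mut samples: Vec<Duration>) -> Duration {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort();
    samples[samples.len() / 2]
}

/// Hypothetical workload: a placeholder for the std code path under test.
fn workload(input: &[u64]) -> u64 {
    input.iter().copied().filter(|x| x % 3 == 0).sum()
}

fn main() {
    let input: Vec<u64> = (0..1_000_000).collect();

    // One warm-up run so first-touch page faults and cold caches are not measured.
    black_box(workload(black_box(&input)));

    let samples = time_samples(31, || {
        // black_box keeps the optimizer from deleting or hoisting the call.
        black_box(workload(black_box(&input)));
    });
    println!("median: {:?}", median(samples));
}
```

Assuming the locally built compiler was linked under the toolchain name `stage1` (the name is whatever was passed to `rustup toolchain link`), something like `rustc +stage1 -O bench.rs` followed by `perf stat ./bench` would exercise the modified standard library.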

* [disable ASLR](https://man7.org/linux/man-pages/man8/setarch.8.html)
* [pinning](https://man7.org/linux/man-pages/man1/taskset.1.html) the benchmark process to a specific core
* [disable clock boosts](https://wiki.archlinux.org/title/CPU_frequency_scaling#Configuring_frequency_boosting),
especially on thermal-limited systems such as laptops

@the8472 the8472 Oct 4, 2022

Some of those things may not be relevant to std benchmarks, which are mostly CPU- or memory-bandwidth-bound and single-threaded. They shouldn't suffer much from swap, IRQs or SMT siblings if you have ensured the system is mostly idle, since those effects depend on system activity (well, it depends on how many cores one has... maybe core isolation is still worth it).

Scheduling and throttling have the biggest impact in my experience. If we had a benchmark that tried to do a parallel sort on a huge dataset that would be a different story.

Adjusting the scaling governor is a good point.
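
As a rough illustration of the governor point, the sketch below reads the Linux sysfs and procfs files that expose the active cpufreq scaling governor and the system-wide ASLR setting before a benchmark run. It is an assumption-laden example, not part of the PR: the helper names are hypothetical, CPU 0 is an arbitrary choice, and treating every governor other than `performance` as worth a warning is a simplification made for this sketch. It only reports the state; changing it is left to tools such as `setarch`, `taskset`, or the distribution's cpufreq utilities.

```rust
use std::fs;

/// sysfs file holding the active cpufreq scaling governor for one CPU (Linux only).
fn governor_path(cpu: usize) -> String {
    format!("/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor")
}

/// Reads a small sysfs/procfs file and trims surrounding whitespace.
/// Returns None if the file is missing or unreadable.
fn read_trimmed(path: &str) -> Option<String> {
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}

/// Sketch policy (an assumption): warn for any governor other than `performance`.
fn governor_warning(governor: &str) -> Option<String> {
    match governor {
        "performance" => None,
        other => Some(format!(
            "governor is `{other}`; CPU frequency may vary between runs"
        )),
    }
}

fn main() {
    // Assumption: the benchmark will be pinned to this core.
    let cpu = 0;

    match read_trimmed(&governor_path(cpu)) {
        Some(governor) => match governor_warning(&governor) {
            Some(warning) => println!("cpu{cpu}: {warning}"),
            None => println!("cpu{cpu}: governor is `performance`"),
        },
        None => println!("cpu{cpu}: no cpufreq information available"),
    }

    // 0 means ASLR is disabled system-wide; a per-process override via
    // `setarch -R` is not visible in this file.
    if let Some(value) = read_trimmed("/proc/sys/kernel/randomize_va_space") {
        println!("randomize_va_space = {value}");
    }
}
```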

- mention scaling governors
- linking stage0 as rustup toolchain is now supported
@jyn514 jyn514 merged commit b61d0a2 into rust-lang:master Feb 18, 2023

jyn514 commented Feb 18, 2023

Thank you!

github-actions bot pushed a commit that referenced this pull request Feb 18, 2023

3 participants