
Commit 722cb2f

- reword vectorization section
- mention scaling governors
- linking stage0 as rustup toolchain is now supported
1 parent fbb1d07 commit 722cb2f


1 file changed (+10 -8 lines)


src/development/perf-benchmarking.md

@@ -13,10 +13,11 @@ and those about code size as [I-heavy T-libs](https://github.com/rust-lang/rust/
 
 ## Vectorization
 
-Currently explicit SIMD features can't be used in alloc or core because runtime feature-detection is only available in std
-and they are compiled with each target's baseline feature set.
-
-Vectorization can only be achieved by shaping code in a way that the compiler backend's auto-vectorization passes can understand.
+Currently only baseline target features (e.g. SSE2 on x86_64-unknown-linux-gnu) can be used in core and alloc because
+runtime feature-detection is only available in std.
+Where possible the preferred way to achieve vectorization is by shaping code in a way that the compiler
+backend's auto-vectorization passes can understand. This benefits user crates compiled with additional target features
+when they instantiate generic library functions, e.g. iterators.
 
 ## rustc-perf
 
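The reworded vectorization paragraph can be illustrated with a small sketch. The function `count_eq` below is hypothetical and not part of core or alloc: it uses no SIMD intrinsics and no runtime feature detection, only fixed-size chunks with a branch-free inner loop. That is the kind of shape LLVM's loop vectorizer can often handle. Because the function is generic, it is instantiated in the calling crate, so a crate built with extra target features (for example `-C target-cpu=native`) may get wider vectors. Whether vectorization actually happens depends on the target, optimization level and compiler version, so inspect the generated assembly rather than assuming it.

```rust
/// Hypothetical helper (not part of core/alloc): counts elements equal to `needle`.
///
/// No intrinsics or runtime feature detection are used, so this compiles with
/// only the baseline target features. The chunked, branch-free inner loop is
/// a shape LLVM's loop vectorizer can often turn into SIMD compares.
pub fn count_eq<T: Copy + PartialEq>(haystack: &[T], needle: T) -> usize {
    const CHUNK: usize = 32;
    let mut chunks = haystack.chunks_exact(CHUNK);
    let mut total = 0usize;
    for chunk in &mut chunks {
        // At most CHUNK (32) matches per chunk, so a u8 counter cannot overflow;
        // a narrow accumulator lets more lanes fit in one vector register.
        let mut n: u8 = 0;
        for &x in chunk {
            n += (x == needle) as u8;
        }
        total += n as usize;
    }
    // Scalar loop over the fewer than CHUNK leftover elements.
    total += chunks.remainder().iter().filter(|&&x| x == needle).count();
    total
}
```

Splitting the loop into a fixed-width body plus a scalar tail keeps the hot loop free of bounds checks and early exits, which are common reasons the auto-vectorizer gives up.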
@@ -47,6 +48,8 @@ reproducible:
 * ensure the system is as idle as possible
 * [disable ASLR](https://man7.org/linux/man-pages/man8/setarch.8.html)
 * [pinning](https://man7.org/linux/man-pages/man1/taskset.1.html) the benchmark process to a specific core
+* change the CPU [scaling governor](https://wiki.archlinux.org/title/CPU_frequency_scaling#Scaling_governors)
+to a fixed-frequency one (`performance` or `powersave`)
 * [disable clock boosts](https://wiki.archlinux.org/title/CPU_frequency_scaling#Configuring_frequency_boosting),
 especially on thermal-limited systems such as laptops
 
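To confirm that the governor change described in this hunk took effect before a run, a benchmark script could check sysfs. The sketch below assumes the usual Linux cpufreq layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the helper names are hypothetical, and the code only reads the setting, it does not change it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The two governors the text above treats as fixed-frequency.
fn is_fixed_frequency(governor: &str) -> bool {
    matches!(governor, "performance" | "powersave")
}

/// Reads `cpuN/cpufreq/scaling_governor` for every `cpuN` directory under
/// `cpu_root` (normally `/sys/devices/system/cpu`, assumed layout).
/// CPUs without a cpufreq directory are skipped. Result is sorted by name.
fn read_governors(cpu_root: &Path) -> io::Result<Vec<(String, String)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(cpu_root)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        // Only cpu0, cpu1, ...; skip siblings such as cpufreq/ or cpuidle/.
        let is_cpu = name
            .strip_prefix("cpu")
            .map_or(false, |n| !n.is_empty() && n.bytes().all(|b| b.is_ascii_digit()));
        if !is_cpu {
            continue;
        }
        let path = entry.path().join("cpufreq").join("scaling_governor");
        match fs::read_to_string(&path) {
            Ok(g) => out.push((name, g.trim().to_string())),
            Err(e) if e.kind() == io::ErrorKind::NotFound => continue,
            Err(e) => return Err(e),
        }
    }
    out.sort();
    Ok(out)
}

/// Returns the CPUs whose governor is not one of the fixed-frequency ones.
fn non_fixed_cpus(governors: &[(String, String)]) -> Vec<&str> {
    governors
        .iter()
        .filter(|(_, g)| !is_fixed_frequency(g))
        .map(|(cpu, _)| cpu.as_str())
        .collect()
}
```

On a real machine this would be called as `read_governors(Path::new("/sys/devices/system/cpu"))`; taking the root as a parameter keeps the function testable against a fake directory tree.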
@@ -55,11 +58,10 @@ reproducible:
 If `x` or the cargo benchmark harness get in the way it can be useful to extract the benchmark into a separate crate,
 e.g. to run it under `perf stat` or cachegrind.
 
-Build and link the [stage1](https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html#creating-a-rustup-toolchain)
-compiler as rustup toolchain and then use that to build the standalone benchmark with a modified standard library.
+Build the standard library and link [stage0-sysroot](https://rustc-dev-guide.rust-lang.org/building/how-to-build-and-run.html#creating-a-rustup-toolchain)
+as rustup toolchain and then use that to build the standalone benchmark with a modified standard library.
 
-[Currently](https://github.com/rust-lang/rust/issues/101691) there is no convenient way to invoke a stage0 toolchain with
-a modified standard library. To avoid the compiler rebuild it can be useful to not only extract the benchmark but also
+If the std rebuild times are too long for fast iteration it can be useful to not only extract the benchmark but also
 the code under test into a separate crate.
 
 ## Running under perf-record
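An extracted benchmark crate can be as small as a single `main.rs` built with the linked toolchain, selected with `cargo +<name>`, where `<name>` is whatever was passed to `rustup toolchain link`. The following is a minimal sketch under that assumption: `code_under_test` is a hypothetical stand-in for library code copied into the crate, and `bench` runs it in a loop through `std::hint::black_box` so the optimizer cannot discard the work. The resulting binary is what would be run under `perf stat` or cachegrind.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the library code copied into this crate.
/// Editing it here avoids rebuilding std for every change.
fn code_under_test(data: &[u64]) -> u64 {
    data.iter().copied().fold(0, u64::wrapping_add)
}

/// Calls `f` `iters` times and returns the total wall-clock time.
/// Each result goes through `black_box` so the calls are not optimized away;
/// callers should also wrap inputs in `black_box` inside `f`.
fn bench<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}
```

A `main` would build the input once, call something like `bench(1_000, || code_under_test(black_box(&data)))` and print the returned `Duration`. Under `perf stat` the hardware counters matter more than the printed time, so the iteration count mainly needs to be large enough that process startup does not dominate.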
