Commit 7068677

michaelwoerister, mark-i-m
authored and committed
Add documentation about profile-guided optimization.
1 parent 443668b commit 7068677

2 files changed (+133, -0 lines)

src/SUMMARY.md

+1
@@ -85,6 +85,7 @@
- [Debugging LLVM](./codegen/debugging.md)
- [Emitting Diagnostics](./diag.md)
- [JSON diagnostic format](./diag/json-format.md)
+- [Profile-guided Optimization](./profile-guided-optimization.md)

---

src/profile-guided-optimization.md

+132
@@ -0,0 +1,132 @@
# Profile-Guided Optimization

`rustc` supports doing profile-guided optimization (PGO).
This chapter describes what PGO is and how the support for it is
implemented in `rustc`.
## What Is Profile-Guided Optimization?

The basic concept of PGO is to collect data about the typical execution of
a program (e.g. which branches it is likely to take) and then use this data
to inform optimizations such as inlining, machine-code layout,
register allocation, etc.

There are different ways of collecting data about a program's execution.
One is to run the program inside a profiler (such as `perf`) and another
is to create an instrumented binary, that is, a binary that has data
collection built into it, and run that.
The latter usually provides more accurate data, because the instrumented
binary records exact execution counts instead of statistical samples.
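As a rough illustration of what "data collection built into the binary" means, the following hand-written C++ analogy counts how often a function is entered and which side of its branch is taken. It is only a sketch under simplifying assumptions: with IR-level PGO, LLVM inserts the counter updates automatically and decides where to place them, whereas `g_counters`, `readCounter`, and `classify` are hypothetical names invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counter layout: one slot per instrumented location.
enum Counter : std::size_t { kEntry, kNegative, kNonNegative, kNumCounters };

// All counters start at zero.
std::array<std::uint64_t, kNumCounters> g_counters{};

// Original logic: return -1 for negative inputs, 1 otherwise.
// The `++g_counters[...]` lines play the role of the instrumentation.
int classify(int x) {
    ++g_counters[kEntry];
    if (x < 0) {
        ++g_counters[kNegative];
        return -1;
    }
    ++g_counters[kNonNegative];
    return 1;
}

// Read back one counter, e.g. when dumping the profile at exit.
std::uint64_t readCounter(Counter c) {
    assert(c < kNumCounters);
    return g_counters[c];
}
```

After a run with representative input, the counters record how often each side of the `if` executed; this is the kind of data the optimizer later uses, for example to lay out the more frequently taken path first.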
## How Is PGO Implemented in `rustc`?

`rustc`'s current PGO implementation relies entirely on LLVM.
LLVM actually [supports multiple forms][clang-pgo] of PGO:

[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization

- Sampling-based PGO, where an external profiling tool like `perf` is used
  to collect data about a program's execution.
- GCOV-based profiling, where code coverage infrastructure is used to collect
  profiling information.
- Front-end based instrumentation, where the compiler front-end (e.g. Clang)
  inserts instrumentation intrinsics into the LLVM IR it generates.
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics
  itself during optimization passes.

`rustc` supports only the last approach, IR-level instrumentation, mainly
because it is almost exclusively implemented in LLVM and needs little
maintenance on the Rust side. Fortunately, it is also the most modern approach,
and it generally yields the best results.

So, we are dealing with an instrumentation-based approach, i.e. profiling data
is generated by a specially instrumented version of the program that's being
optimized. Instrumentation-based PGO has two components: a compile-time
component and a run-time component, and one needs to understand the overall
workflow to see how they interact.
### Overall Workflow

Generating a PGO-optimized program involves the following four steps:

1. Compile the program with instrumentation enabled (e.g. `rustc -Cprofile-generate main.rs`)
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool
   (e.g. `llvm-profdata merge -o merged.profdata default-<id>.profraw`)
4. Compile the program again, this time making use of the profiling data
   (e.g. `rustc -Cprofile-use=merged.profdata main.rs`)
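Step 3 also covers the common case where the instrumented program is run several times and leaves several `.profraw` files behind: `llvm-profdata merge` accepts multiple inputs and, by default, adds up their counters. The C++ sketch below models only that summing step. It is a simplification that treats a profile as a flat vector of counters; `RawProfile` and `mergeProfiles` are hypothetical names, and the real `.profraw`/`.profdata` formats also carry function names, hashes, and more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, flattened view of one run's raw profile: counter i of every
// run refers to the same instrumented location.
using RawProfile = std::vector<std::uint64_t>;

// Sum the counters of several runs into a single merged profile.
RawProfile mergeProfiles(const std::vector<RawProfile> &runs) {
    if (runs.empty()) {
        return {};
    }
    RawProfile merged(runs.front().size(), 0);
    for (const RawProfile &run : runs) {
        // All runs come from the same binary, so their layouts must match.
        assert(run.size() == merged.size());
        for (std::size_t i = 0; i < run.size(); ++i) {
            merged[i] += run[i];
        }
    }
    return merged;
}
```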
### Compile-Time Aspects

Depending on which step in the above workflow we are in, two different things
can happen at compile time:

#### Create Binaries with Instrumentation

As mentioned above, the profiling instrumentation is added by LLVM.
`rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager]
flags when creating LLVM `PassManager`s:

```cpp
// `PMBR` is an `LLVMPassManagerBuilderRef`; `unwrap()` converts it to the
// underlying C++ `PassManagerBuilder`.
// Enable LLVM's IR-level PGO instrumentation:
unwrap(PMBR)->EnablePGOInstrGen = true;
// Instrumented binaries have a default output path for the `.profraw` file
// hard-coded into them:
unwrap(PMBR)->PGOInstrGen = PGOGenPath;
```

`rustc` also has to make sure that some of the symbols from LLVM's profiling
runtime are not removed [by marking them with the right export level][pgo-gen-symbols].

[pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416
[pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225
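Instrumenting (step 1 of the workflow) and consuming a profile (step 4) happen in separate compiler invocations. The following self-contained C++ sketch models that choice; it is hypothetical and not `rustc`'s or LLVM's actual code. `PgoOptions` and `configurePgo` are made-up names, and the struct only mirrors the three `PassManagerBuilder` fields that appear in this chapter.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the PGO-related fields of LLVM's
// `PassManagerBuilder` (the real class has many more members).
struct PgoOptions {
    bool EnablePGOInstrGen = false; // step 1: insert instrumentation
    std::string PGOInstrGen;        // default `.profraw` output path
    std::string PGOInstrUse;        // step 4: path to the `.profdata` file
};

// Hypothetical helper: at most one path may be non-empty, because a single
// compilation either instruments the program or consumes a profile.
PgoOptions configurePgo(const std::string &genPath, const std::string &usePath) {
    assert(genPath.empty() || usePath.empty());
    PgoOptions opts;
    if (!genPath.empty()) {
        opts.EnablePGOInstrGen = true;
        opts.PGOInstrGen = genPath;
    }
    if (!usePath.empty()) {
        opts.PGOInstrUse = usePath;
    }
    return opts;
}
```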
#### Compile Binaries Where Optimizations Make Use of Profiling Data

In the final step of the workflow described above, the program is compiled
again, with the compiler using the gathered profiling data in order to drive
optimization decisions. `rustc` again leaves most of the work to LLVM here,
basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder`
where the profiling data can be found:

```cpp
// `PGOUsePath` is the path to the `.profdata` file (e.g. `merged.profdata`).
unwrap(PMBR)->PGOInstrUse = PGOUsePath;
```

[pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420

LLVM does the rest (e.g. setting branch weights, marking functions with
`cold` or `inlinehint`, etc.).
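To give a rough idea of what "the rest" involves, here is a deliberately simplified C++ sketch; it is not LLVM's algorithm. LLVM decides what is hot or cold based on a profile summary of the whole program's counts, whereas the hypothetical `classifyFunction` below just compares a function's entry count with made-up fractions (1% and 50%) of the hottest entry count. `BranchWeights` and `takenProbability` illustrate the idea that the recorded taken/not-taken counts of a two-way branch describe which side is likely.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hints mirroring the LLVM function attributes
// `cold` and `inlinehint` mentioned above.
enum class Hint { None, Cold, InlineHint };

// Simplified: the weights of a two-way branch are the observed counts.
struct BranchWeights {
    std::uint64_t taken;
    std::uint64_t notTaken;
};

// Probability (0.0 to 1.0) that the branch is taken, according to the profile.
double takenProbability(const BranchWeights &w) {
    const std::uint64_t total = w.taken + w.notTaken;
    if (total == 0) {
        return 0.5; // no data: treat both sides as equally likely
    }
    return static_cast<double>(w.taken) / static_cast<double>(total);
}

// Toy rule (NOT LLVM's): functions entered less than 1% as often as the
// hottest one are `cold`; those at 50% or more get `inlinehint`.
Hint classifyFunction(std::uint64_t entryCount, std::uint64_t maxEntryCount) {
    assert(entryCount <= maxEntryCount);
    const double ratio = maxEntryCount == 0
        ? 0.0
        : static_cast<double>(entryCount) / static_cast<double>(maxEntryCount);
    if (ratio < 0.01) {
        return Hint::Cold;
    }
    if (ratio >= 0.5) {
        return Hint::InlineHint;
    }
    return Hint::None;
}
```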
### Runtime Aspects

Instrumentation-based approaches always have a runtime component as well, i.e.
once we have an instrumented program, that program needs to be run in order
to generate profiling data, and collecting and persisting this profiling
data needs some infrastructure in place.

In the case of LLVM, these runtime components are implemented in
[compiler-rt][compiler-rt-profile] and statically linked into any instrumented
binaries.
The `rustc` version of this can be found in `src/libprofiler_builtins`, which
basically packs the C code from `compiler-rt` into a Rust crate.

In order for `libprofiler_builtins` to be built, `profiler = true` must be set
in `rustc`'s `config.toml`.

[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/master/compiler-rt/lib/profile
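The runtime's job can be pictured with a toy model: keep the counters in memory while the program runs, then write them to the output path when it finishes. The C++ below is a hypothetical sketch, not compiler-rt's API or file format; `writeRawProfile`, `readRawProfile`, and the one-counter-per-line text format are invented for this example. In compiler-rt, the profile is written automatically when the program exits, and the hard-coded default path can be overridden with the `LLVM_PROFILE_FILE` environment variable.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write one counter per line (hypothetical text format).
// Returns false if the file could not be written.
bool writeRawProfile(const std::string &path,
                     const std::vector<std::uint64_t> &counters) {
    assert(!path.empty());
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    for (std::uint64_t c : counters) {
        out << c << '\n';
    }
    out.flush();
    return static_cast<bool>(out);
}

// Read the counters back, e.g. as a stand-in for the `llvm-profdata` step.
std::vector<std::uint64_t> readRawProfile(const std::string &path) {
    std::vector<std::uint64_t> counters;
    std::ifstream in(path);
    std::uint64_t c = 0;
    while (in >> c) {
        counters.push_back(c);
    }
    return counters;
}
```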
## Testing PGO

Since the PGO workflow spans multiple compiler invocations, most testing happens
in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name).
There is also a [codegen test][codegen-test] that checks that some expected
instrumentation artifacts show up in LLVM IR.

[rmake-tests]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps
[codegen-test]: https://github.com/rust-lang/rust/blob/master/src/test/codegen/pgo-instrumentation.rs
## Additional Information

Clang's documentation contains a good overview of PGO in LLVM:
https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
