| 1 | +# Profile Guided Optimization |
| 2 | + |
| 3 | +`rustc` supports profile-guided optimization (PGO). |
| 4 | +This chapter describes what PGO is and how the support for it is |
| 5 | +implemented in `rustc`. |
| 6 | + |
| 7 | +## What Is Profile-Guided Optimization? |
| 8 | + |
| 9 | +The basic concept of PGO is to collect data about the typical execution of |
| 10 | +a program (e.g. which branches it is likely to take) and then use this data |
| 11 | +to inform optimizations such as inlining, machine-code layout, |
| 12 | +register allocation, etc. |
| 13 | + |
| 14 | +There are different ways of collecting data about a program's execution. |
| 15 | +One is to run the program inside a profiler (such as `perf`) and another |
| 16 | +is to create an instrumented binary, that is, a binary that has data |
| 17 | +collection built into it, and run that. |
| 18 | +The latter usually provides more accurate data. |
| 19 | + |
| 20 | +## How is PGO implemented in `rustc`? |
| 21 | + |
| 22 | +`rustc`'s current PGO implementation relies entirely on LLVM. |
| 23 | +LLVM actually [supports multiple forms][clang-pgo] of PGO: |
| 24 | + |
| 25 | +[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization |
| 26 | + |
| 27 | +- Sampling-based PGO where an external profiling tool like `perf` is used |
| 28 | + to collect data about a program's execution. |
| 29 | +- GCOV-based profiling, where code coverage infrastructure is used to collect |
| 30 | + profiling information. |
| 31 | +- Front-end based instrumentation, where the compiler front-end (e.g. Clang) |
| 32 | + inserts instrumentation intrinsics into the LLVM IR it generates. |
| 33 | +- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics |
| 34 | + itself during optimization passes. |
| 35 | + |
| 36 | +`rustc` supports only the last approach, IR-level instrumentation, mainly |
| 37 | +because it is almost exclusively implemented in LLVM and needs little |
| 38 | +maintenance on the Rust side. Fortunately, it is also the most modern approach, |
| 39 | +yielding the best results. |
| 40 | + |
| 41 | +So, we are dealing with an instrumentation-based approach, i.e. profiling data |
| 42 | +is generated by a specially instrumented version of the program that's being |
| 43 | +optimized. Instrumentation-based PGO has two components: a compile-time |
| 44 | +component and a run-time component, and one needs to understand the overall |
| 45 | +workflow to see how they interact. |
| 46 | + |
| 47 | +### Overall Workflow |
| 48 | + |
| 49 | +Generating a PGO-optimized program involves the following four steps: |
| 50 | + |
| 51 | +1. Compile the program with instrumentation enabled (e.g. `rustc -Cprofile-generate main.rs`). |
| 52 | +2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file. |
| 53 | +3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool (e.g. `llvm-profdata merge -o merged.profdata default-<id>.profraw`). |
| 54 | +4. Compile the program again, this time making use of the profiling data |
| 55 | + (e.g. `rustc -Cprofile-use=merged.profdata main.rs`). |
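The four steps can be strung together in a small script. The following is only a sketch under stated assumptions: the `pgo-data` directory name and the `RUN=1` switch are conventions invented for this example, and it relies on `-Cprofile-generate` accepting an output directory and on `llvm-profdata merge` accepting a directory of raw profiles. By default the script only prints the commands, so it can be tried without a profiler-enabled toolchain.

```sh
#!/bin/sh
# Sketch of the four-step PGO workflow. File and directory names are illustrative.
set -eu

# Print each command; execute it only when RUN=1 is set.
# (RUN is a convention of this sketch, not a rustc or LLVM feature.)
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

pgo_workflow() {
    src="$1"               # e.g. main.rs
    bin="./${src%.rs}"     # rustc names the binary after the source file
    profdir="pgo-data"     # hypothetical directory for the raw profiles

    # 1. Build with instrumentation; raw profiles will be written to $profdir.
    step rustc -Cprofile-generate="$profdir" "$src"
    # 2. Run the instrumented binary on a representative workload.
    step "$bin"
    # 3. Merge the .profraw files into a single .profdata file.
    step llvm-profdata merge -o merged.profdata "$profdir"
    # 4. Rebuild, letting LLVM use the collected profile.
    step rustc -Cprofile-use=merged.profdata "$src"
}

pgo_workflow main.rs
```

Running the script with `RUN=1` in the environment executes the commands instead of only printing them. Step 2 can be repeated with different inputs before merging, since every run adds another `.profraw` file to the directory.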
| 56 | + |
| 57 | +### Compile-Time Aspects |
| 58 | + |
| 59 | +Depending on which step in the above workflow we are in, two different things |
| 60 | +can happen at compile time: |
| 61 | + |
| 62 | +#### Create Binaries with Instrumentation |
| 63 | + |
| 64 | +As mentioned above, the profiling instrumentation is added by LLVM. |
| 65 | +`rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager] |
| 66 | +flags when creating LLVM `PassManager`s: |
| 67 | + |
| 68 | +```C |
| 69 | + // `PMBR` is an `LLVMPassManagerBuilderRef` |
| 70 | + unwrap(PMBR)->EnablePGOInstrGen = true; |
| 71 | + // Instrumented binaries have a default output path for the `.profraw` file |
| 72 | + // hard-coded into them: |
| 73 | + unwrap(PMBR)->PGOInstrGen = PGOGenPath; |
| 74 | +``` |
| 75 | + |
| 76 | +`rustc` also has to make sure that some of the symbols from LLVM's profiling |
| 77 | +runtime are not removed [by marking them with the right export level][pgo-gen-symbols]. |
| 78 | + |
| 79 | +[pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416 |
| 80 | +[pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225 |
| 81 | + |
| 82 | + |
| 83 | +#### Compile Binaries Where Optimizations Make Use Of Profiling Data |
| 84 | + |
| 85 | +In the final step of the workflow described above, the program is compiled |
| 86 | +again, with the compiler using the gathered profiling data in order to drive |
| 87 | +optimization decisions. `rustc` again leaves most of the work to LLVM here, |
| 88 | +basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder` |
| 89 | +where the profiling data can be found: |
| 90 | + |
| 91 | +```C |
| 92 | + unwrap(PMBR)->PGOInstrUse = PGOUsePath; |
| 93 | +``` |
| 94 | + |
| 95 | +[pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420 |
| 96 | + |
| 97 | +LLVM does the rest (e.g. setting branch weights, marking functions with |
| 98 | +`cold` or `inlinehint`, etc.). |
| 99 | + |
| 100 | + |
| 101 | +### Runtime Aspects |
| 102 | + |
| 103 | +Instrumentation-based approaches always also have a runtime component, i.e. |
| 104 | +once we have an instrumented program, that program needs to be run in order |
| 105 | +to generate profiling data, and collecting and persisting this profiling |
| 106 | +data needs some infrastructure in place. |
| 107 | + |
| 108 | +In the case of LLVM, these runtime components are implemented in |
| 109 | +[compiler-rt][compiler-rt-profile] and statically linked into any instrumented |
| 110 | +binaries. |
| 111 | +The `rustc` version of this can be found in `src/libprofiler_builtins`, which |
| 112 | +basically packs the C code from `compiler-rt` into a Rust crate. |
| 113 | + |
| 114 | +In order for `libprofiler_builtins` to be built, `profiler = true` must be set |
| 115 | +in `rustc`'s `config.toml`. |
| 116 | + |
| 117 | +[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/master/compiler-rt/lib/profile |
| 118 | + |
| 119 | +## Testing PGO |
| 120 | + |
| 121 | +Since the PGO workflow spans multiple compiler invocations, most testing happens |
| 122 | +in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name). |
| 123 | +There is also a [codegen test][codegen-test] that checks that some expected |
| 124 | +instrumentation artifacts show up in LLVM IR. |
| 125 | + |
| 126 | +[rmake-tests]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps |
| 127 | +[codegen-test]: https://github.com/rust-lang/rust/blob/master/src/test/codegen/pgo-instrumentation.rs |
| 128 | + |
| 129 | +## Additional Information |
| 130 | + |
| 131 | +Clang's documentation contains a good overview on PGO in LLVM here: |
| 132 | +https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization |