From 888688517eadbe37c646c2830bebc9697e7fca9b Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 11:30:06 -0500 Subject: [PATCH 01/13] LLVM has moved its development and issue tracking to github, so fix that URL to point there. --- src/backend/debugging.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 673660167..31313395c 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -164,7 +164,9 @@ create a minimal working example with Godbolt. Go to optimizations transform it. 5. Once you have a godbolt link demonstrating the issue, it is pretty easy to - fill in an LLVM bug. Just visit [bugs.llvm.org](https://bugs.llvm.org/). + fill in an LLVM bug. Just visit their [github issues page][llvm-issues]. + +[llvm-issues]: https://github.com/llvm/llvm-project/issues ### Porting bug fixes from LLVM From 44166aa9cc79493ccc4f7989333b23aeb2f3dfd2 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 11:35:12 -0500 Subject: [PATCH 02/13] Added some additional structure to the document by adding more headers. (This is useful in part to provide *me* guidance about where I will add additional material...) --- src/backend/debugging.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 31313395c..57b2b74ea 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -11,6 +11,8 @@ project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context: +### Minimize the example + As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to @@ -24,6 +26,8 @@ everything relevant to the new crate 3. 
further minimize the issue by making the code shorter (there are tools that help with this like `creduce`) +### Enable LLVM internal checks + The official compilers (including nightlies) have LLVM assertions disabled, which means that LLVM assertion failures can show up as compiler crashes (not ICEs but "real" crashes) and other sorts of weird behavior. If you are @@ -40,6 +44,8 @@ hard to replicate manually and means that LLVM is called multiple times in parallel. If you can get away with it (i.e. if it doesn't make your bug disappear), passing `-C codegen-units=1` to rustc will make debugging easier. +### Get your hands on raw LLVM input + For rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If you are building via cargo, use the `RUSTFLAGS` environment variable (e.g. `RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the @@ -56,6 +62,8 @@ different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to `.ll` files using `llvm-dis` which should be in the target local compilation of rustc. +### Investigate LLVM optimization passes + If you are seeing incorrect behavior due to an optimization pass, a very handy LLVM option is `-opt-bisect-limit`, which takes an integer denoting the index value of the highest pass to run. Index values for taken passes are stable @@ -80,6 +88,8 @@ $ OPT=./build/$TRIPLE/llvm/bin/opt $ $OPT -S -O2 < my-file.ll > my ``` +### Get your hands on raw LLVM input, part II + If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag `-C From 5c7223e06b2911249ca62f65ab0dcf87c98a87cb Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 11:39:12 -0500 Subject: [PATCH 03/13] I couldn't help myself: there's so much relevant stuff in my MCVE blog post that I had to include a link. 
--- src/backend/debugging.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 57b2b74ea..f74965c0e 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -26,6 +26,11 @@ everything relevant to the new crate 3. further minimize the issue by making the code shorter (there are tools that help with this like `creduce`) +For more discussion on methodology for steps 2 and 3 above, there is an +[epic blog post][mcve-blog] from pnkfelix specifically about Rust program minimization. + +[mcve-blog]: https://blog.pnkfx.org/blog/2019/11/18/rust-bug-minimization-patterns/ + ### Enable LLVM internal checks The official compilers (including nightlies) have LLVM assertions disabled, From fcabdce5270fe597ffc7517b78bb6d0f1d728c20 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 14:06:52 -0500 Subject: [PATCH 04/13] break paragraph into two parts; suggestion of using `opt` tool is orthogonal to the point about the details of LLVM IR generated by rustc itself. --- src/backend/debugging.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index f74965c0e..1e2a2e896 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -81,8 +81,9 @@ pass was run or skipped. Setting the limit to an index of -1 (e.g., their corresponding index values. If you want to play with the optimization pipeline, you can use the `opt` tool -from `./build//llvm/bin/` with the LLVM IR emitted by rustc. Note -that rustc emits different IR depending on whether `-O` is enabled, even +from `./build//llvm/bin/` with the LLVM IR emitted by rustc. + +Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should: From 29c3c4c51de7e831a1e4ec96b9afdf705dba78fc Mon Sep 17 00:00:00 2001 From: "Felix S. 
Klock II" Date: Tue, 25 Jan 2022 14:23:09 -0500 Subject: [PATCH 05/13] migrate details of varied LLVM IR generation into appropriate section on the matter. --- src/backend/debugging.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 1e2a2e896..998ada2da 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -83,6 +83,8 @@ their corresponding index values. If you want to play with the optimization pipeline, you can use the `opt` tool from `./build//llvm/bin/` with the LLVM IR emitted by rustc. +### Get your hands on raw LLVM input, part II + Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should: @@ -94,8 +96,6 @@ $ OPT=./build/$TRIPLE/llvm/bin/opt $ $OPT -S -O2 < my-file.ll > my ``` -### Get your hands on raw LLVM input, part II - If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag `-C From ee25c88b795c2fe7e7e2aa7a41756d41e056acbc Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 14:24:02 -0500 Subject: [PATCH 06/13] Move the "Investigate LLVM optimization passes" section beneath the now-adjacent "Get your hands on raw LLVM input" sections. --- src/backend/debugging.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 998ada2da..1bb6fdbe0 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -67,22 +67,6 @@ different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to `.ll` files using `llvm-dis` which should be in the target local compilation of rustc. 
-### Investigate LLVM optimization passes - -If you are seeing incorrect behavior due to an optimization pass, a very handy -LLVM option is `-opt-bisect-limit`, which takes an integer denoting the index -value of the highest pass to run. Index values for taken passes are stable -from run to run; by coupling this with software that automates bisecting the -search space based on the resulting program, an errant pass can be quickly -determined. When an `-opt-bisect-limit` is specified, all runs are displayed -to standard error, along with their index and output indicating if the -pass was run or skipped. Setting the limit to an index of -1 (e.g., -`RUSTFLAGS="-C llvm-args=-opt-bisect-limit=-1"`) will show all passes and -their corresponding index values. - -If you want to play with the optimization pipeline, you can use the `opt` tool -from `./build//llvm/bin/` with the LLVM IR emitted by rustc. - ### Get your hands on raw LLVM input, part II Note that rustc emits different IR depending on whether `-O` is enabled, even @@ -121,6 +105,22 @@ $ ./build/$TRIPLE/llvm/bin/llvm-extract \ > extracted.ll ``` +### Investigate LLVM optimization passes + +If you are seeing incorrect behavior due to an optimization pass, a very handy +LLVM option is `-opt-bisect-limit`, which takes an integer denoting the index +value of the highest pass to run. Index values for taken passes are stable +from run to run; by coupling this with software that automates bisecting the +search space based on the resulting program, an errant pass can be quickly +determined. When an `-opt-bisect-limit` is specified, all runs are displayed +to standard error, along with their index and output indicating if the +pass was run or skipped. Setting the limit to an index of -1 (e.g., +`RUSTFLAGS="-C llvm-args=-opt-bisect-limit=-1"`) will show all passes and +their corresponding index values. 
+ +If you want to play with the optimization pipeline, you can use the `opt` tool +from `./build//llvm/bin/` with the LLVM IR emitted by rustc. + ### Getting help and asking questions If you have some questions, head over to the [rust-lang Zulip] and From f929d6e938c46376ee005d71070363e0cfacd514 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 14:24:34 -0500 Subject: [PATCH 07/13] We don't need the part II distinction now that the sections are adjacent. --- src/backend/debugging.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 1bb6fdbe0..43ffe87bd 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -67,8 +67,6 @@ different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to `.ll` files using `llvm-dis` which should be in the target local compilation of rustc. -### Get your hands on raw LLVM input, part II - Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should: From 5bd062f14a1bc0a2ce87461ad0ec748756faf049 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 14:24:34 -0500 Subject: [PATCH 08/13] Some notes on the LLVM tools that are generated under `build/$TRIPLE/llvm/`. --- src/backend/debugging.md | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 43ffe87bd..9a65f403f 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -43,6 +43,21 @@ anything turns up. The rustc build process builds the LLVM tools into `./build//llvm/bin`. They can be called directly. +These tools include: + * [`llc`], which compiles bitcode (`.bc` files) to executable code; this can be used to + replicate LLVM backend bugs. + * [`opt`], a bitcode transformer that runs LLVM optimization passes. 
+ * [`bugpoint`], which reduces large test cases to small, useful ones. + * and many others, some of which are referenced in the text below. + +[`llc`]: https://llvm.org/docs/CommandGuide/llc.html +[`opt`]: https://llvm.org/docs/CommandGuide/opt.html +[`bugpoint`]: https://llvm.org/docs/Bugpoint.html + +By default, the Rust build system does not check for changes to the LLVM source code or +its build configuration settings. So, if you need to rebuild the LLVM that is linked +into `rustc`, first delete the file `llvm-finished-building`, which should be located +in `build//llvm/`. The default rustc compilation pipeline has multiple codegen units, which is hard to replicate manually and means that LLVM is called multiple times in @@ -116,7 +131,7 @@ pass was run or skipped. Setting the limit to an index of -1 (e.g., `RUSTFLAGS="-C llvm-args=-opt-bisect-limit=-1"`) will show all passes and their corresponding index values. -If you want to play with the optimization pipeline, you can use the `opt` tool +If you want to play with the optimization pipeline, you can use the [`opt`] tool from `./build//llvm/bin/` with the LLVM IR emitted by rustc. ### Getting help and asking questions From eed68311b665e358326161587ba98974a48f25bf Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 20:21:59 -0500 Subject: [PATCH 09/13] An important note that I wish I had heeded when I was trying to dissect issue 91671. --- src/backend/debugging.md | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 9a65f403f..2c705ff9b 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -78,9 +78,20 @@ other useful options. Also, debug info in LLVM IR can clutter the output a lot: `RUSTFLAGS="-C debuginfo=0"` is really useful. `RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (not the same as IR) at -different stages during compilation, which is sometimes useful. 
One just needs -to convert the bitcode files to `.ll` files using `llvm-dis` which should be in -the target local compilation of rustc. +different stages during compilation, which is sometimes useful. The output LLVM +bitcode will be in `.bc` files in the compiler's output directory, set via the +`--out-dir DIR` argument to `rustc`. + + * If you are hitting an assertion failure or segmentation fault from the LLVM + backend when invoking `rustc` itself, it is a good idea to try passing each + of these `.bc` files to the `llc` command, and see if you get the same + failure. (LLVM developers often prefer a bug reduced to a `.bc` file over one + that uses a Rust crate for its minimized reproduction.) + + * To get human readable versions of the LLVM bitcode, one just needs to convert + the bitcode (`.bc`) files to `.ll` files using `llvm-dis`, which should be in + the target local compilation of rustc. + Note that rustc emits different IR depending on whether `-O` is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, From 64f48a2d7f6ff2c9803a53f8f6f0a7170bdaeef2 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 20:43:15 -0500 Subject: [PATCH 10/13] Added notes on how to turn on and control LLVM debugging output. --- src/backend/debugging.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 2c705ff9b..b2b872b00 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -145,6 +145,27 @@ their corresponding index values. If you want to play with the optimization pipeline, you can use the [`opt`] tool from `./build//llvm/bin/` with the LLVM IR emitted by rustc. +When investigating the implementation of LLVM itself, you should be +aware of its [internal debug infrastructure][llvm-debug]. 
+This is provided in LLVM Debug builds, which you enable for rustc +LLVM builds by changing this setting in the config.toml: +``` +[llvm] +# Indicates whether the LLVM assertions are enabled or not +assertions = true + +# Indicates whether the LLVM build is a Release or Debug build +optimize = false +``` +The quick summary is: + * Setting `assertions=true` enables coarse-grain debug messaging. + * beyond that, setting `optimize=false` enables fine-grain debug messaging. + * `LLVM_DEBUG(dbgs() << msg)` in LLVM is like `debug!(msg)` in `rustc`. + * The `-debug` option turns on all messaging; it is like setting the environment variable `RUSTC_LOG=debug` in `rustc`. + * The `-debug-only=,` variant of that option is more selective; it is like setting the environment variable `RUSTC_LOG=path1,path2` in `rustc`. + +[llvm-debug]: https://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option + ### Getting help and asking questions If you have some questions, head over to the [rust-lang Zulip] and From b43426a073c2f163590ea6a307ec982e60c38240 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 25 Jan 2022 20:51:43 -0500 Subject: [PATCH 11/13] Added a caveat about the effectiveness of the `-print` family of options to LLVM. --- src/backend/debugging.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index b2b872b00..b4b3309a2 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -117,6 +117,18 @@ to some file. Also, if you are using neither `-filter-print-funcs` nor `-C codegen-units=1`, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything. + * One caveat to the aforementioned methodology: the `-print` family of options + to LLVM only prints the IR unit that the pass runs on (e.g., just a + function), and does not include any referenced declarations, globals, + metadata, etc. 
This means you cannot in general feed the ouptut of `-print` + into `llc` to reproduce a given problem. + + * Within LLVM itself, calling `F.getParent()->dump()` at the beginning of + `SafeStackLegacyPass::runOnFunction` will dump the whole module, which + may provide better basis for reproduction. (However, you + should be able to get that same dump from the `.bc` files dumped by + `-C save-temps`.) + If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize correctly), you can use `llvm-extract`, e.g. From 8492d5ff791bacd4edd0ef5393193d46cfb85e2c Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Wed, 26 Jan 2022 15:07:13 -0500 Subject: [PATCH 12/13] fix typo. --- src/backend/debugging.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index b4b3309a2..414fd3143 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -120,7 +120,7 @@ printouts will mix together and you won't be able to read anything. * One caveat to the aforementioned methodology: the `-print` family of options to LLVM only prints the IR unit that the pass runs on (e.g., just a function), and does not include any referenced declarations, globals, - metadata, etc. This means you cannot in general feed the ouptut of `-print` + metadata, etc. This means you cannot in general feed the output of `-print` into `llc` to reproduce a given problem. * Within LLVM itself, calling `F.getParent()->dump()` at the beginning of From 85db9f73a9482300292d34631c0d40d81b477189 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Wed, 26 Jan 2022 15:09:52 -0500 Subject: [PATCH 13/13] Address line length lints by, well, breaking lines. 
--- src/backend/debugging.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/backend/debugging.md b/src/backend/debugging.md index 414fd3143..791a61fbe 100644 --- a/src/backend/debugging.md +++ b/src/backend/debugging.md @@ -173,8 +173,10 @@ The quick summary is: * Setting `assertions=true` enables coarse-grain debug messaging. * beyond that, setting `optimize=false` enables fine-grain debug messaging. * `LLVM_DEBUG(dbgs() << msg)` in LLVM is like `debug!(msg)` in `rustc`. - * The `-debug` option turns on all messaging; it is like setting the environment variable `RUSTC_LOG=debug` in `rustc`. - * The `-debug-only=,` variant of that option is more selective; it is like setting the environment variable `RUSTC_LOG=path1,path2` in `rustc`. + * The `-debug` option turns on all messaging; it is like setting the + environment variable `RUSTC_LOG=debug` in `rustc`. + * The `-debug-only=,` variant is more selective; it is like + setting the environment variable `RUSTC_LOG=path1,path2` in `rustc`. [llvm-debug]: https://llvm.org/docs/ProgrammersManual.html#the-llvm-debug-macro-and-debug-option
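
The bisection that the "Investigate LLVM optimization passes" section describes for `-opt-bisect-limit` can be automated with a small script. The following is only a sketch under stated assumptions: `bisect_limit` and `is_bad` are hypothetical helper names (not part of rustc or LLVM), `my-file.rs` and `my-bin` are placeholders, the bug is assumed to be a miscompilation that makes the resulting binary exit with a non-zero status, and the bug is assumed to be absent at the lower limit and present at the upper limit.

```shell
#!/bin/sh
# Sketch only: binary-search the smallest -opt-bisect-limit that shows a bug.
# bisect_limit and is_bad are hypothetical names, not rustc or LLVM tooling.

# is_bad LIMIT: succeeds (status 0) when a build with that limit shows the bug.
# Assumption: my-file.rs is the reduced test case, and a miscompiled binary
# exits with a non-zero status. Adapt this check to your own bug.
is_bad() {
    # -opt-bisect-limit reports every pass on stderr, so discard that output.
    rustc -O -C codegen-units=1 \
        -C llvm-args=-opt-bisect-limit="$1" \
        my-file.rs -o my-bin 2>/dev/null || return 1
    ! ./my-bin
}

# bisect_limit PREDICATE LO HI: assumes PREDICATE fails at LO and succeeds at
# HI, then prints the smallest limit in (LO, HI] at which PREDICATE succeeds.
bisect_limit() {
    pred=$1 lo=$2 hi=$3
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$pred" "$mid"; then
            hi=$mid   # bug already present: the errant pass is at or below mid
        else
            lo=$mid   # bug absent: the errant pass is above mid
        fi
    done
    echo "$hi"
}
```

A run might look like `bisect_limit is_bad 0 1000`, where `1000` stands in for an upper bound taken from the pass listing that `-opt-bisect-limit=-1` prints. Under the assumptions above, the printed value is the index of the first pass whose inclusion makes the bug appear; that index can then be looked up in the same listing.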