|
| 1 | +# Debugging support in the Rust compiler |
| 2 | + |
| 3 | +This document explains the state of debugging tools support in the Rust compiler (rustc). |
| 4 | +It gives an overview of debugging tools such as GDB and LLDB, and of the infrastructure |
| 5 | +around the Rust compiler for debugging Rust code. If you want to learn how to debug the Rust compiler |
| 6 | +itself, see the [Debugging the Compiler] page. |
| 7 | + |
| 8 | +The material is gathered from the YouTube video [Tom Tromey discusses debugging support in rustc]. |
| 9 | + |
| 10 | +## Preliminaries |
| 11 | + |
| 12 | +### Debuggers |
| 13 | + |
| 14 | +According to Wikipedia |
| 15 | + |
| 16 | +> A [debugger or debugging tool] is a computer program that is used to test and debug |
| 17 | +> other programs (the "target" program). |
| 18 | +
|
| 19 | +Writing a debugger from scratch for a language requires a lot of work, especially if |
| 20 | +debuggers have to be supported on various platforms. GDB and LLDB, however, can be |
| 21 | +extended to support debugging a language. This is the path that Rust has chosen. |
| 22 | +The main goal of this document is to describe the support for these debuggers in the Rust compiler. |
| 23 | + |
| 24 | +### DWARF |
| 25 | + |
| 26 | +According to the [DWARF] standard website |
| 27 | + |
| 28 | +> DWARF is a debugging file format used by many compilers and debuggers to support source level |
| 29 | +> debugging. It addresses the requirements of a number of procedural languages, |
| 30 | +> such as C, C++, and Fortran, and is designed to be extensible to other languages. |
| 31 | +> DWARF is architecture independent and applicable to any processor or operating system. |
| 32 | +> It is widely used on Unix, Linux and other operating systems, |
| 33 | +> as well as in stand-alone environments. |
| 34 | +
|
| 35 | +A DWARF reader is a program that consumes the DWARF format and creates debugger-compatible output. |
| 36 | +This program may live in the compiler itself. DWARF uses a data structure called |
| 37 | +Debugging Information Entry (DIE), which stores information as "tags" that denote functions, |
| 38 | +variables, and so on, e.g., `DW_TAG_variable`, `DW_TAG_pointer_type`, and `DW_TAG_subprogram`. |
| 39 | +You can also invent your own tags and attributes. |
| 40 | + |
| 41 | +## Supported debuggers |
| 42 | + |
| 43 | +### GDB |
| 44 | + |
| 45 | +We have our own fork of GDB - [https://github.com/rust-dev-tools/gdb] |
| 46 | + |
| 47 | +#### Rust expression parser |
| 48 | + |
| 49 | +To be able to show debug output we need an expression parser. |
| 50 | +This (GDB) expression parser is written in [Bison] and can parse only a subset of |
| 51 | +Rust expressions. |
| 52 | +The GDB parser was written from scratch and has no relation to any other parser. |
| 53 | +For example, it is not related to rustc's parser. |
| 54 | + |
| 55 | +GDB has Rust-like value and type output. It can print values and types in a way |
| 56 | +that looks like Rust syntax. Likewise, when you print a type with [ptype] in GDB, |
| 57 | +it also looks like Rust source code. Check out the documentation in the [manual for GDB/Rust]. |
| 58 | + |
| 59 | +#### Parser extensions |
| 60 | + |
| 61 | +The expression parser has a couple of extensions to facilitate features that you cannot express |
| 62 | +in Rust. Some limitations are listed in the [manual for GDB/Rust]. There is some special |
| 63 | +code in the DWARF reader in GDB to support the extensions. |
| 64 | + |
| 65 | +A couple of examples of DWARF reader support needed are as follows - |
| 66 | + |
| 67 | +1. Enum: Needed for support for enum types. rustc writes information about enums into |
| 68 | +DWARF, and GDB reads the DWARF to determine whether there is a tag field, where it is located, |
| 69 | +and whether the tag slot is shared through the non-zero optimization. |
| 70 | + |
| 71 | +2. Dissect trait objects: DWARF extension where the trait object's description in the DWARF |
| 72 | +also points to a stub description of the corresponding vtable which in turn points to the |
| 73 | +concrete type for which this trait object exists. This means that you can do a `print *object` |
| 74 | +for that trait object, and GDB will understand how to find the correct type of the payload in |
| 75 | +the trait object. |
| 76 | + |
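As a rough illustration of the two cases above, the following Rust sketch shows an enum whose tag shares a slot through the non-zero (niche) optimization, and a trait object whose concrete payload type is reachable only through its vtable. The names `Shape` and `Circle` and the helper functions are hypothetical and are not taken from rustc or GDB. The sketch only shows the language-level layout facts that a debugger has to recover from DWARF; it does not show how GDB reads them.

```rust
use std::mem::size_of;

/// Hypothetical trait used only for this illustration.
trait Shape {
    fn area(&self) -> f64;
}

/// Hypothetical concrete type behind the trait object.
struct Circle {
    radius: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

/// `Option<&u8>` has no separate tag field: `None` is encoded as the null
/// pointer value, which a reference can never hold.
fn option_ref_shares_tag_slot() -> bool {
    size_of::<Option<&u8>>() == size_of::<&u8>()
}

/// `u32` has no spare bit pattern, so `Option<u32>` needs extra space for a tag.
fn option_u32_has_separate_tag() -> bool {
    size_of::<Option<u32>>() > size_of::<u32>()
}

/// A reference to a trait object is a "fat" pointer: a data pointer plus a
/// vtable pointer. The debugger needs the vtable to find the payload type.
fn trait_object_is_fat_pointer() -> bool {
    size_of::<&dyn Shape>() == 2 * size_of::<&Circle>()
}

fn main() {
    let object: Box<dyn Shape> = Box::new(Circle { radius: 1.0 });
    println!("area = {}", object.area());
    println!("Option<&u8> shares tag slot: {}", option_ref_shares_tag_slot());
    println!("Option<u32> has separate tag: {}", option_u32_has_separate_tag());
    println!("&dyn Shape is a fat pointer: {}", trait_object_is_fat_pointer());
}
```

With a value like `object` in scope, `print *object` in GDB is the kind of command that depends on the vtable description to locate the `Circle` payload.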
| 77 | +**TODO**: Figure out if the following should be mentioned in the GDB-Rust document rather than |
| 78 | +this guide page so there is no duplication. This is regarding the following comments: |
| 79 | + |
| 80 | +[This comment by Tom](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r284027340) |
| 81 | +> gdb's Rust extensions and limitations are documented in the gdb manual: |
| 82 | +https://sourceware.org/gdb/onlinedocs/gdb/Rust.html -- however, this neglects to mention that |
| 83 | +gdb convenience variables and registers follow the gdb $ convention, and that the Rust parser |
| 84 | +implements the gdb @ extension. |
| 85 | + |
| 86 | +[This question by Aman](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r285401353) |
| 87 | +> @tromey do you think we should mention this part in the GDB-Rust document rather than this |
| 88 | +document so there is no duplication etc.? |
| 89 | + |
| 90 | +#### Developer notes |
| 91 | + |
| 92 | +* This work is now upstream. Bugs can be reported in [GDB Bugzilla]. |
| 93 | + |
| 94 | +### LLDB |
| 95 | + |
| 96 | +We have our own fork of LLDB - [https://github.com/rust-lang/lldb] |
| 97 | + |
| 98 | +Fork of LLVM project - [https://github.com/rust-lang/llvm-project] |
| 99 | + |
| 100 | +LLDB currently only works on macOS because of a dependency issue. This issue was easier to |
| 101 | +solve for macOS as compared to Linux. However, Tom has a possible solution which can enable |
| 102 | +us to ship LLDB everywhere. |
| 103 | + |
| 104 | +#### Rust expression parser |
| 105 | + |
| 106 | +This expression parser is written in C++. It is a type of [Recursive Descent parser]. |
| 107 | +It implements slightly less of the Rust language than GDB's parser. LLDB has Rust-like value and type output. |
| 108 | + |
| 109 | +#### Parser extensions |
| 110 | + |
| 111 | +There is some special code in the DWARF reader in LLDB to support the extensions. |
| 112 | +A couple of examples of DWARF reader support needed are as follows - |
| 113 | + |
| 114 | +1. Enum: Needed for support for enum types. rustc writes information about |
| 115 | +enums into DWARF, and LLDB reads the DWARF to determine whether there is a tag field, |
| 116 | +where it is located, and whether the tag slot is shared through the non-zero optimization. |
| 117 | +In other words, LLDB has enum support as well. |
| 118 | + |
| 119 | +#### Developer notes |
| 120 | + |
| 121 | +* None of the LLDB work is upstream. This [rust-lang/lldb wiki page] explains a few details. |
| 122 | +* The reason for forking LLDB is that LLDB recently removed all the other language plugins |
| 123 | +due to lack of maintenance. |
| 124 | +* LLDB has a plugin architecture but that does not work for language support. |
| 125 | +* LLDB is available via Rust build (`rustup`). |
| 126 | +* GDB generally works better on Linux. |
| 127 | + |
| 128 | +## DWARF and Rustc |
| 129 | + |
| 130 | +[DWARF] is the standard way compilers generate debugging information that debuggers read. |
| 131 | +It is _the_ debugging format on macOS and Linux. It is a multi-language, extensible format |
| 132 | +and is mostly good enough for Rust's purposes. Hence, the current implementation reuses DWARF's |
| 133 | +concepts. This is true even if some of the concepts in DWARF do not align with Rust |
| 134 | +semantically because generally there can be some kind of mapping between the two. |
| 135 | + |
| 136 | +We have some DWARF extensions that the Rust compiler emits and the debuggers understand that |
| 137 | +are _not_ in the DWARF standard. |
| 138 | + |
| 139 | +* The Rust compiler emits DWARF for a virtual table, and this `vtable` object has a |
| 140 | + `DW_AT_containing_type` that points to the real type. This lets debuggers dissect a trait object |
| 141 | + pointer to correctly find the payload. E.g., here's such a DIE, from a test case in the gdb |
| 142 | + repository: |
| 143 | + |
| 144 | + ```asm |
| 145 | + <1><1a9>: Abbrev Number: 3 (DW_TAG_structure_type) |
| 146 | + <1aa> DW_AT_containing_type: <0x1b4> |
| 147 | + <1ae> DW_AT_name : (indirect string, offset: 0x23d): vtable |
| 148 | + <1b2> DW_AT_byte_size : 0 |
| 149 | + <1b3> DW_AT_alignment : 8 |
| 150 | + ``` |
| 151 | + |
| 152 | +* The other extension is that the Rust compiler can emit a tagless discriminated union. |
| 153 | + See [DWARF feature request] for this item. |
| 154 | + |
| 155 | +### Current limitations of DWARF |
| 156 | + |
| 157 | +* Traits: representing traits in DWARF requires a larger change to DWARF than usual. |
| 158 | +* DWARF provides no way to differentiate between structs and tuples. The Rust compiler emits |
| 159 | +tuple fields with names such as `__0`, and debuggers look for a sequence of such names to overcome this limitation. |
| 160 | +For example, without further support the debugger would access a field via `x.__0` instead of `x.0`. |
| 161 | +The Rust parser in the debugger resolves this, so you can now write `x.0`. |
| 162 | + |
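To make the struct versus tuple point concrete, here is a minimal Rust sketch. The types `Pair` and `Named` and the function `make_pair` are hypothetical and exist only for this illustration. The comments about `__0` and `__1` restate the naming scheme described above; the exact names should be checked against the DWARF that your compiler version emits.

```rust
/// Hypothetical tuple struct: its fields have positions, not names.
struct Pair(i32, &'static str);

/// Hypothetical regular struct with the same shape, for comparison.
struct Named {
    first: i32,
    second: &'static str,
}

/// Builds a sample value that could be inspected in a debugger.
fn make_pair() -> Pair {
    Pair(7, "seven")
}

fn main() {
    let x = make_pair();
    // In Rust source the fields are written `x.0` and `x.1`. In the debug
    // info they appear as named fields (assumed here: `__0`, `__1`), which
    // the debugger's Rust parser maps back to `x.0` and `x.1`.
    let y = Named { first: x.0, second: x.1 };
    println!("{} {}", x.0, x.1);
    println!("{} {}", y.first, y.second);
}
```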
| 163 | +DWARF relies on debuggers to know some information about the platform ABI. |
| 164 | +Rust does not always follow the platform ABI. |
| 165 | + |
| 166 | +## Developer notes |
| 167 | + |
| 168 | +This section covers certain aspects of development that were discussed in the talk. |
| 169 | + |
| 170 | +## What is missing |
| 171 | + |
| 172 | +### Shipping GDB in Rustup |
| 173 | + |
| 174 | +Tracking issue: [https://github.com/rust-lang/rust/issues/34457] |
| 175 | + |
| 176 | +Shipping GDB requires changes to the Rustup delivery system. To manage Rustup build size and |
| 177 | +times, we need to build GDB separately and somehow provide the resulting artifacts |
| 178 | +for inclusion in the final build. However, if we can ship GDB with rustup, it will simplify |
| 179 | +the development process, because the compiler can emit new debug info that can be readily consumed. |
| 180 | + |
| 181 | +The main issue in achieving this is setting up dependencies. One such dependency is Python. That |
| 182 | +is why we have our own fork of GDB: one of the drivers is patched on Rust's side to |
| 183 | +check for the correct version of Python (Python 2.7 in this case). *Note: Python 3 is not chosen |
| 184 | +for this purpose because Python's stable ABI is limited and is not sufficient for GDB's needs. |
| 185 | +See [https://docs.python.org/3/c-api/stable.html].* |
| 186 | + |
| 187 | +The fork also lets us keep updates to the debugger as fast as possible as we make changes to the debugging symbols. |
| 188 | +In essence, the goal is to ship the debugger as soon as new debugging info is added. GDB only releases |
| 189 | +every six months or so. However, changes that are |
| 190 | +not related to Rust itself should ideally be merged upstream first. |
| 191 | + |
| 192 | +### Code signing for LLDB debug server on macOS |
| 193 | + |
| 194 | +According to Wikipedia, [System Integrity Protection] is |
| 195 | + |
| 196 | +> System Integrity Protection (SIP, sometimes referred to as rootless) is a security feature |
| 197 | +> of Apple's macOS operating system introduced in OS X El Capitan. It comprises a number of |
| 198 | +> mechanisms that are enforced by the kernel. A centerpiece is the protection of system-owned |
| 199 | +> files and directories against modifications by processes without a specific "entitlement", |
| 200 | +> even when executed by the root user or a user with root privileges (sudo). |
| 201 | +
|
| 202 | +It prevents processes from using the `ptrace` syscall. If a process wants to use `ptrace`, it has to be |
| 203 | +code signed. The certificate that signs it has to be trusted on your machine. |
| 204 | + |
| 205 | +See [Apple developer documentation for System Integrity Protection]. |
| 206 | + |
| 207 | +We may need to sign up with Apple and get the keys to do this signing. Tom looked into whether |
| 208 | +Mozilla could do this, but it cannot because it is at the maximum number of |
| 209 | +keys it is allowed. Tom does not know if Mozilla could get more keys. |
| 210 | + |
| 211 | +Alternatively, Tom suggests that maybe a Rust legal entity is needed to get the keys via Apple. |
| 212 | +This problem is not technical in nature. If we had such a key we could sign GDB as well and |
| 213 | +ship that. |
| 214 | + |
| 215 | +### DWARF and Traits |
| 216 | + |
| 217 | +Rust traits are not emitted into DWARF at all. As a result, calling a method such as `x.method()` |
| 218 | +does not work as is, because the method is implemented by a trait rather than by |
| 219 | +a type. That information is not present in the debug info, so the debugger cannot find trait methods. |
| 220 | + |
| 221 | +DWARF has a notion of interface types (possibly added for Java). Tom's idea was to use this |
| 222 | +interface type to represent traits. |
| 223 | + |
| 224 | +DWARF only deals with concrete names, not the reference types. So, a given implementation of a |
| 225 | +trait for a type would be one of these interfaces (`DW_TAG_interface_type`). Also, the type for |
| 226 | +which it is implemented would describe all the interfaces this type implements. This requires a |
| 227 | +DWARF extension. |
| 228 | + |
| 229 | +Issue on GitHub: [https://github.com/rust-lang/rust/issues/33014] |
| 230 | + |
| 231 | +## Typical process for a Debug Info change (LLVM) |
| 232 | + |
| 233 | +LLVM has Debug Info (DI) builders. This is the primary thing that Rust calls into. |
| 234 | +Rust does not emit DWARF directly; it constructs a kind of metadata and hands it off to LLVM, |
| 235 | +which is why LLVM needs to change first. For the rustc/LLVM hand-off, |
| 236 | +some LLVM DI builder methods are called to construct the representation of a type. |
| 237 | + |
| 238 | +The steps of this process are as follows - |
| 239 | + |
| 240 | +1. LLVM needs changing. |
| 241 | + |
| 242 | + LLVM does not emit interface types at all, so this needs to be implemented in LLVM first. |
| 243 | + |
| 244 | + Get sign-off from the LLVM maintainers that this is a good idea. |
| 245 | + |
| 246 | +2. Change the DWARF extension. |
| 247 | + |
| 248 | +3. Update the debuggers. |
| 249 | + |
| 250 | + Update DWARF readers, expression evaluators. |
| 251 | + |
| 252 | +4. Update Rust compiler. |
| 253 | + |
| 254 | + Change it to emit this new information. |
| 255 | + |
| 256 | +### Procedural macro stepping |
| 257 | + |
| 258 | +A deep question is: how do you actually debug a procedural macro? |
| 259 | +What is the location you emit for a macro expansion? Consider some of the following cases - |
| 260 | + |
| 261 | +* You can emit location of the invocation of the macro. |
| 262 | +* You can emit the location of the definition of the macro. |
| 263 | +* You can emit locations of the content of the macro. |
| 264 | + |
| 265 | +RFC: [https://github.com/rust-lang/rfcs/pull/2117] |
| 266 | + |
| 267 | +The focus is to let macros decide what to do. This can be achieved by having some kind of attribute |
| 268 | +that lets the macro tell the compiler where the line marker should be. This affects where you |
| 269 | +can set breakpoints and what happens when you step through the code. |
| 270 | + |
| 271 | +## Future work |
| 272 | + |
| 273 | +#### Name mangling changes |
| 274 | + |
| 275 | +* New demangler in `libiberty` (gcc source tree). |
| 276 | +* New demangler in LLVM or LLDB. |
| 277 | + |
| 278 | +**TODO**: Check the location of the demangler source. |
| 279 | +[Question on Github](https://github.com/rust-lang/rustc-guide/pull/316#discussion_r283062536). |
| 280 | + |
| 281 | +#### Reuse Rust compiler for expressions |
| 282 | + |
| 283 | +This is an important idea because debuggers by and large do not try to implement type |
| 284 | +inference. You need to be much more explicit when you type into the debugger than in your |
| 285 | +actual source code. So, you cannot just copy and paste an expression from your source |
| 286 | +code into the debugger and expect the same answer, although this would be nice. Reusing the |
| 287 | +compiler to evaluate expressions could help here. |
| 288 | + |
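As a small illustration of why inference matters, the Rust sketch below compiles only because the compiler infers the target type of `parse` and `collect` from the function's return type. The function `parse_numbers` is hypothetical and is not taken from the talk. An expression parser without type inference would have nothing to fill in those types with, which is one reason source expressions cannot simply be pasted into a debugger.

```rust
/// Parses whitespace-separated integers; unparsable words become 0.
/// Neither `parse` nor `collect` carries a type annotation: both are
/// inferred from the declared return type `Vec<u32>`.
fn parse_numbers(input: &str) -> Vec<u32> {
    input
        .split_whitespace()
        .map(|word| word.parse().unwrap_or(0))
        .collect()
}

fn main() {
    let numbers = parse_numbers("1 2 3");
    // The result type of `sum` is taken from the annotation on `total`.
    let total: u32 = numbers.iter().sum();
    println!("{}", total);
}
```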
| 289 | +It is certainly doable but it is a large project. You certainly need a bridge to the |
| 290 | +debugger because the debugger alone has access to the memory. Both GDB (gcc) and LLDB (clang) |
| 291 | +have this feature. LLDB uses Clang to compile code to JIT and GDB can do the same with GCC. |
| 292 | + |
| 293 | +The expression evaluators of both debuggers implement both a superset and a subset of Rust. |
| 294 | +They implement only the expression language, but they also add extensions; for example, GDB has |
| 295 | +convenience variables. Therefore, if you take this route, you not only need |
| 296 | +to build this bridge but may also have to add a mode that lets the compiler understand some extensions. |
| 297 | + |
| 298 | +#### Windows debugging (PDB) is missing |
| 299 | + |
| 300 | +This is a complete unknown. |
| 301 | + |
| 302 | +[Tom Tromey discusses debugging support in rustc]: https://www.youtube.com/watch?v=elBxMRSNYr4 |
| 303 | +[Debugging the Compiler]: compiler-debugging.md |
| 304 | +[debugger or debugging tool]: https://en.wikipedia.org/wiki/Debugger |
| 305 | +[Bison]: https://www.gnu.org/software/bison/ |
| 306 | +[ptype]: https://ftp.gnu.org/old-gnu/Manuals/gdb/html_node/gdb_109.html |
| 307 | +[rust-lang/lldb wiki page]: https://github.com/rust-lang/lldb/wiki |
| 308 | +[DWARF]: http://dwarfstd.org |
| 309 | +[manual for GDB/Rust]: https://sourceware.org/gdb/onlinedocs/gdb/Rust.html |
| 310 | +[GDB Bugzilla]: https://sourceware.org/bugzilla/ |
| 311 | +[Recursive Descent parser]: https://en.wikipedia.org/wiki/Recursive_descent_parser |
| 312 | +[System Integrity Protection]: https://en.wikipedia.org/wiki/System_Integrity_Protection |
| 313 | +[https://github.com/rust-dev-tools/gdb]: https://github.com/rust-dev-tools/gdb |
| 314 | +[DWARF feature request]: http://dwarfstd.org/ShowIssue.php?issue=180517.2 |
| 315 | +[https://docs.python.org/3/c-api/stable.html]: https://docs.python.org/3/c-api/stable.html |
| 316 | +[https://github.com/rust-lang/rfcs/pull/2117]: https://github.com/rust-lang/rfcs/pull/2117 |
| 317 | +[https://github.com/rust-lang/rust/issues/33014]: https://github.com/rust-lang/rust/issues/33014 |
| 318 | +[https://github.com/rust-lang/rust/issues/34457]: https://github.com/rust-lang/rust/issues/34457 |
| 319 | +[Apple developer documentation for System Integrity Protection]: https://developer.apple.com/library/archive/releasenotes/MacOSX/WhatsNewInOSX/Articles/MacOSX10_11.html#//apple_ref/doc/uid/TP40016227-SW11 |
| 320 | +[https://github.com/rust-lang/lldb]: https://github.com/rust-lang/lldb |
| 321 | +[https://github.com/rust-lang/llvm-project]: https://github.com/rust-lang/llvm-project |