From 02fd7f63eebcc80e659df8633eaae7dec1adfbdf Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Thu, 25 Aug 2022 04:36:45 +0200
Subject: [PATCH 1/4] there are 3 backends now

---
 src/part-5-intro.md | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/src/part-5-intro.md b/src/part-5-intro.md
index 4b7c25797..1c11eacd5 100644
--- a/src/part-5-intro.md
+++ b/src/part-5-intro.md
@@ -13,7 +13,7 @@ will finally take the MIR and produce some executable machine code.

 > NOTE: This part of a compiler is often called the _backend_. The term is a bit
 > overloaded because in the compiler source, it usually refers to the "codegen
-> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend"
+> backend" (i.e. LLVM, Cranelift, or GCC). Usually, when you see the word "backend"
 > in this part, we are referring to the "codegen backend".

 So what do we need to do?
@@ -26,7 +26,7 @@ So what do we need to do?
    collecting all the concrete types is called _monomorphization collection_.
 1. Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR)
    for each concrete type we collected.
-2. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of
+2. Finally, we need to invoke the codegen backend, which runs a bunch of
    optimization passes, generates executable code, and links together an
    executable binary.

@@ -34,16 +34,15 @@ So what do we need to do?

 The code for codegen is actually a bit complex due to a few factors:

-- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much
+- Support for multiple codegen backends (LLVM, Cranelift, and GCC). We try to share as much
   backend code between them as possible, so a lot of it is generic over the
   codegen implementation. This means that there are often a lot of layers of
   abstraction.
 - Codegen happens asynchronously in another thread for performance.
-- The actual codegen is done by a third-party library (either LLVM or Cranelift).
+- The actual codegen is done by a third-party library (either of the 3 backends).

-Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
-(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
-crate contains code specific to LLVM codegen.
+Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code,
+while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM codegen.

 [ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
 [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html
@@ -51,4 +50,3 @@ crate contains code specific to LLVM codegen.
 At a very high level, the entry point is
 [`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
 process discussed in the rest of this chapter.
-
From 334d037464bef72aafba43b9de9ed085108ccf65 Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Thu, 25 Aug 2022 04:48:46 +0200
Subject: [PATCH 2/4] obey #1132

---
 src/part-5-intro.md | 57 ++++++++++++++++++++++++---------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/src/part-5-intro.md b/src/part-5-intro.md
index 1c11eacd5..faa12f484 100644
--- a/src/part-5-intro.md
+++ b/src/part-5-intro.md
@@ -1,43 +1,48 @@
 # From MIR to Binaries

-All of the preceding chapters of this guide have one thing in common: we never
-generated any executable machine code at all! With this chapter, all of that
-changes.
+All of the preceding chapters of this guide have one thing in common:
+we never generated any executable machine code at all!
+With this chapter, all of that changes.

-So far, we've shown how the compiler can take raw source code in text format
-and transform it into [MIR]. We have also shown how the compiler does various
-analyses on the code to detect things like type or lifetime errors. Now, we
-will finally take the MIR and produce some executable machine code.
+So far,
+we've shown how the compiler can take raw source code in text format
+and transform it into [MIR].
+We have also shown how the compiler does various
+analyses on the code to detect things like type or lifetime errors.
+Now, we will finally take the MIR and produce some executable machine code.

 [MIR]: ./mir/index.md

-> NOTE: This part of a compiler is often called the _backend_. The term is a bit
-> overloaded because in the compiler source, it usually refers to the "codegen
-> backend" (i.e. LLVM, Cranelift, or GCC). Usually, when you see the word "backend"
-> in this part, we are referring to the "codegen backend".
+> NOTE: This part of a compiler is often called the _backend_.
+> The term is a bit overloaded because in the compiler source,
+> it usually refers to the "codegen backend" (i.e. LLVM, Cranelift, or GCC).
+> Usually, when you see the word "backend" in this part,
+> we are referring to the "codegen backend".

 So what do we need to do?

-0. First, we need to collect the set of things to generate code for. In
-   particular, we need to find out which concrete types to substitute for
-   generic ones, since we need to generate code for the concrete types.
-   Generating code for the concrete types (i.e. emitting a copy of the code for
-   each concrete type) is called _monomorphization_, so the process of
-   collecting all the concrete types is called _monomorphization collection_.
+0. First, we need to collect the set of things to generate code for.
+   In particular,
+   we need to find out which concrete types to substitute for generic ones,
+   since we need to generate code for the concrete types.
+   Generating code for the concrete types
+   (i.e. emitting a copy of the code for each concrete type) is called _monomorphization_,
+   so the process of collecting all the concrete types is called _monomorphization collection_.
 1. Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR)
    for each concrete type we collected.
-2. Finally, we need to invoke the codegen backend, which runs a bunch of
-   optimization passes, generates executable code, and links together an
-   executable binary.
+2. Finally, we need to invoke the codegen backend,
+   which runs a bunch of optimization passes,
+   generates executable code,
+   and links together an executable binary.

 [codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html

 The code for codegen is actually a bit complex due to a few factors:

-- Support for multiple codegen backends (LLVM, Cranelift, and GCC). We try to share as much
-  backend code between them as possible, so a lot of it is generic over the
-  codegen implementation. This means that there are often a lot of layers of
-  abstraction.
+- Support for multiple codegen backends (LLVM, Cranelift, and GCC).
+  We try to share as much backend code between them as possible,
+  so a lot of it is generic over the codegen implementation.
+  This means that there are often a lot of layers of abstraction.
 - Codegen happens asynchronously in another thread for performance.
 - The actual codegen is done by a third-party library (either of the 3 backends).

@@ -48,5 +53,5 @@ while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM code
 [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html

 At a very high level, the entry point is
-[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
-process discussed in the rest of this chapter.
+[`rustc_codegen_ssa::base::codegen_crate`][codegen1].
+This function starts the process discussed in the rest of this chapter.
From 3f6e8f6b10449645e2ce3faf53a5a0e8b75bf70b Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Thu, 25 Aug 2022 04:57:24 +0200
Subject: [PATCH 3/4] there are, again, 3 backends now

---
 src/backend/codegen.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/backend/codegen.md b/src/backend/codegen.md
index 1a6c2fa76..47014b9e0 100644
--- a/src/backend/codegen.md
+++ b/src/backend/codegen.md
@@ -1,13 +1,16 @@
 # Code generation

-Code generation or "codegen" is the part of the compiler that actually
-generates an executable binary. Usually, rustc uses LLVM for code generation;
-there is also support for [Cranelift]. The key is that rustc doesn't implement
-codegen itself. It's worth noting, though, that in the Rust source code, many
-parts of the backend have `codegen` in their names (there are no hard
-boundaries).
+Code generation (or "codegen") is the part of the compiler
+that actually generates an executable binary.
+Usually, rustc uses LLVM for code generation,
+but there is also support for [Cranelift] and [GCC].
+The key is that rustc doesn't implement codegen itself.
+It's worth noting, though, that in the Rust source code,
+many parts of the backend have `codegen` in their names
+(there are no hard boundaries).

 [Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift
+[GCC]: https://github.com/rust-lang/rustc_codegen_gcc

 > NOTE: If you are looking for hints on how to debug code generation bugs,
 > please see [this section of the debugging chapter][debugging].
From 3b92fad85ac2e55fbd44a7acca3f796c28ba33e2 Mon Sep 17 00:00:00 2001
From: Tshepang Mbambo
Date: Thu, 25 Aug 2022 05:01:41 +0200
Subject: [PATCH 4/4] that shows hash of latest, which can be confusing

---
 src/backend/codegen.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/codegen.md b/src/backend/codegen.md
index 47014b9e0..5feea5202 100644
--- a/src/backend/codegen.md
+++ b/src/backend/codegen.md
@@ -9,7 +9,7 @@ It's worth noting, though, that in the Rust source code,
 many parts of the backend have `codegen` in their names
 (there are no hard boundaries).

-[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift
+[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/main/cranelift
 [GCC]: https://github.com/rust-lang/rustc_codegen_gcc

 > NOTE: If you are looking for hints on how to debug code generation bugs,