|
1 | 1 | # From MIR to Binaries
|
2 | 2 |
|
3 |
| -All of the preceding chapters of this guide have one thing in common: we never |
4 |
| -generated any executable machine code at all! With this chapter, all of that |
5 |
| -changes. |
| 3 | +All of the preceding chapters of this guide have one thing in common: |
| 4 | +we never generated any executable machine code at all! |
| 5 | +With this chapter, all of that changes. |
6 | 6 |
|
7 |
| -So far, we've shown how the compiler can take raw source code in text format |
8 |
| -and transform it into [MIR]. We have also shown how the compiler does various |
9 |
| -analyses on the code to detect things like type or lifetime errors. Now, we |
10 |
| -will finally take the MIR and produce some executable machine code. |
| 7 | +So far, |
| 8 | +we've shown how the compiler can take raw source code in text format |
| 9 | +and transform it into [MIR]. |
| 10 | +We have also shown how the compiler does various |
| 11 | +analyses on the code to detect things like type or lifetime errors. |
| 12 | +Now, we will finally take the MIR and produce some executable machine code. |
11 | 13 |
|
12 | 14 | [MIR]: ./mir/index.md
|
13 | 15 |
|
14 |
| -> NOTE: This part of a compiler is often called the _backend_. The term is a bit |
15 |
| -> overloaded because in the compiler source, it usually refers to the "codegen |
16 |
| -> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend" |
17 |
| -> in this part, we are referring to the "codegen backend". |
| 16 | +> NOTE: This part of a compiler is often called the _backend_. |
| 17 | +> The term is a bit overloaded because in the compiler source, |
| 18 | +> it usually refers to the "codegen backend" (i.e. LLVM, Cranelift, or GCC). |
| 19 | +> Usually, when you see the word "backend" in this part, |
| 20 | +> we are referring to the "codegen backend". |
18 | 21 |
|
19 | 22 | So what do we need to do?
|
20 | 23 |
|
21 |
| -0. First, we need to collect the set of things to generate code for. In |
22 |
| - particular, we need to find out which concrete types to substitute for |
23 |
| - generic ones, since we need to generate code for the concrete types. |
24 |
| - Generating code for the concrete types (i.e. emitting a copy of the code for |
25 |
| - each concrete type) is called _monomorphization_, so the process of |
26 |
| - collecting all the concrete types is called _monomorphization collection_. |
| 24 | +0. First, we need to collect the set of things to generate code for. |
| 25 | + In particular, |
| 26 | + we need to find out which concrete types to substitute for generic ones, |
| 27 | + since we need to generate code for the concrete types. |
| 28 | + Generating code for the concrete types |
| 29 | + (i.e. emitting a copy of the code for each concrete type) is called _monomorphization_, |
| 30 | + so the process of collecting all the concrete types is called _monomorphization collection_. |
27 | 31 | 1. Next, we need to actually lower the MIR to a codegen IR
|
28 | 32 | (usually LLVM IR) for each concrete type we collected.
|
29 |
| -2. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of |
30 |
| - optimization passes, generates executable code, and links together an |
31 |
| - executable binary. |
| 33 | +2. Finally, we need to invoke the codegen backend, |
| 34 | + which runs a bunch of optimization passes, |
| 35 | + generates executable code, |
| 36 | + and links together an executable binary. |
32 | 37 |
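To make step 0 concrete, here is a minimal sketch of what monomorphization means from the user's side. The function `largest` is a hypothetical example and is not taken from the compiler. Because it is called with two different concrete types, the collector would presumably record two instances, `largest::<i32>` and `largest::<f64>`, and a separate copy of the body would be generated for each.

```rust
// Illustrative only: `largest` is a hypothetical generic function,
// not something that exists in rustc.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    // Assumes `items` is non-empty; indexing panics otherwise.
    let mut max = items[0];
    for &item in items {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    // Two distinct instantiations are used here, so code is needed for
    // both `largest::<i32>` and `largest::<f64>`.
    let a = largest(&[1, 5, 3]);
    let b = largest(&[0.5, 2.5, 1.0]);
    println!("{a} {b}");
}
```

If `largest` were never called, there would be no concrete type to substitute, and no machine code would need to be emitted for it.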
|
33 | 38 | [codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html
|
34 | 39 |
|
35 | 40 | The code for codegen is actually a bit complex due to a few factors:
|
36 | 41 |
|
37 |
| -- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much |
38 |
| - backend code between them as possible, so a lot of it is generic over the |
39 |
| - codegen implementation. This means that there are often a lot of layers of |
40 |
| - abstraction. |
| 42 | +- Support for multiple codegen backends (LLVM, Cranelift, and GCC). |
| 43 | + We try to share as much backend code between them as possible, |
| 44 | + so a lot of it is generic over the codegen implementation. |
| 45 | + This means that there are often a lot of layers of abstraction. |
41 | 46 | - Codegen happens asynchronously in another thread for performance.
|
42 |
| -- The actual codegen is done by a third-party library (either LLVM or Cranelift). |
| 47 | +- The actual codegen is done by a third-party library (one of the three backends). |
43 | 48 |
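The first bullet, about sharing code by being generic over the codegen implementation, can be pictured with a toy sketch. Everything here is an assumption made for illustration: `ToyBackend`, `ToyLlvm`, `ToyCranelift`, and `codegen_all` are hypothetical names and do not correspond to the real traits in `rustc_codegen_ssa`. The point is only that the driver function is written once and works for any backend that implements the trait.

```rust
// Hypothetical trait standing in for "the codegen implementation".
// The real abstractions in rustc are considerably more layered.
trait ToyBackend {
    fn name(&self) -> &'static str;

    // Pretend to lower one MIR function into backend-specific IR text.
    fn lower(&self, mir_fn: &str) -> String {
        format!("{}: {}", self.name(), mir_fn)
    }
}

struct ToyLlvm;
struct ToyCranelift;

impl ToyBackend for ToyLlvm {
    fn name(&self) -> &'static str {
        "toy-llvm"
    }
}

impl ToyBackend for ToyCranelift {
    fn name(&self) -> &'static str {
        "toy-cranelift"
    }
}

// Backend-agnostic driver: written once, generic over the backend.
fn codegen_all<B: ToyBackend>(backend: &B, fns: &[&str]) -> Vec<String> {
    fns.iter().map(|f| backend.lower(f)).collect()
}

fn main() {
    for line in codegen_all(&ToyLlvm, &["main", "helper"]) {
        println!("{line}");
    }
    for line in codegen_all(&ToyCranelift, &["main"]) {
        println!("{line}");
    }
}
```

The cost of this design is the one the text mentions: readers have to follow calls through the trait before they reach backend-specific code.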
|
44 |
| -Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code |
45 |
| -(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm] |
46 |
| -crate contains code specific to LLVM codegen. |
| 49 | +Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code, |
| 50 | +while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM codegen. |
47 | 51 |
|
48 | 52 | [ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
|
49 | 53 | [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html
|
50 | 54 |
|
51 | 55 | At a very high level, the entry point is
|
52 |
| -[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the |
53 |
| -process discussed in the rest of this chapter. |
54 |
| - |
| 56 | +[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. |
| 57 | +This function starts the process discussed in the rest of this chapter. |