# High-level overview of the compiler source

## Crate structure

The main Rust repository consists of a `src` directory, under which
there live many crates. These crates contain the sources for the
standard library and the compiler. This document, of course, focuses
on the latter.

Rustc consists of a number of crates, including `syntax`,
`rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and
many more. The source for each crate can be found in a directory
like `src/libXXX`, where `XXX` is the crate name.

(NB. The names and divisions of these crates are not set in
stone and may change over time. For the time being, we tend towards
a finer-grained division to help with compilation time, though as
incremental compilation improves that may change.)

The dependency structure of these crates is roughly a diamond:

```
                  rustc_driver
                 /      |      \
              /         |         \
           /            |            \
        /               v               \
rustc_trans      rustc_borrowck  ...  rustc_metadata
        \               |               /
           \            |            /
              \         |         /
                 \      v      /
                      rustc
                        |
                        v
                     syntax
                     /     \
                 /             \
         syntax_pos           syntax_ext
```
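
Read as a dependency graph, the diagram determines an order in which the crates can be built. The following is a minimal Rust sketch, not part of the compiler: it encodes only the crates named in the diagram (the `...` crates are left out) with the edges exactly as drawn, so it models the drawing rather than the real `Cargo.toml` files. The function names `crate_deps` and `build_waves` are hypothetical. It groups crates into "waves" in which every crate depends only on crates from earlier waves.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The crates named in the diagram, each mapped to the crates it depends on
/// (edges exactly as drawn; the "..." crates are omitted).
fn crate_deps() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("rustc_driver", vec!["rustc_trans", "rustc_borrowck", "rustc_metadata"]),
        ("rustc_trans", vec!["rustc"]),
        ("rustc_borrowck", vec!["rustc"]),
        ("rustc_metadata", vec!["rustc"]),
        ("rustc", vec!["syntax"]),
        ("syntax", vec!["syntax_pos", "syntax_ext"]),
        ("syntax_pos", vec![]),
        ("syntax_ext", vec![]),
    ])
}

/// Groups crates into "waves": each crate depends only on crates from
/// earlier waves. Panics if the graph contains a cycle.
fn build_waves(graph: &BTreeMap<&'static str, Vec<&'static str>>) -> Vec<Vec<&'static str>> {
    let mut built: BTreeSet<&'static str> = BTreeSet::new();
    let mut waves = Vec::new();
    while built.len() < graph.len() {
        // Every not-yet-built crate whose dependencies are all built is ready.
        let wave: Vec<&'static str> = graph
            .iter()
            .filter(|(name, deps)| !built.contains(*name) && deps.iter().all(|d| built.contains(d)))
            .map(|(name, _)| *name)
            .collect();
        assert!(!wave.is_empty(), "dependency cycle detected");
        built.extend(wave.iter().copied());
        waves.push(wave);
    }
    waves
}

fn main() {
    for (i, wave) in build_waves(&crate_deps()).iter().enumerate() {
        println!("wave {}: {:?}", i + 1, wave);
    }
}
```

With these edges, `rustc_trans`, `rustc_borrowck`, and `rustc_metadata` land in the same wave because none of them depends on another; only `rustc_driver` has to wait for all three.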

The `rustc_driver` crate, at the top of this lattice, is effectively
the "main" function for the Rust compiler. It doesn't have much "real
code"; instead, it ties together all of the code defined in the other
crates and defines the overall flow of execution. (As we transition
more and more to the [query model](ty/maps/README.md), however, the
"flow" of compilation is becoming less centrally defined.)

At the other extreme, the `rustc` crate defines the common and
pervasive data structures that all the rest of the compiler uses
(e.g., how to represent types, traits, and the program itself). It
also contains some amount of the compiler itself, although that is
relatively limited.

Finally, all the crates in the bulge in the middle define the bulk of
the compiler. They all depend on `rustc`, so that they can make use
of the various types defined there, and they export public routines
that `rustc_driver` will invoke as needed (more and more, what these
crates export are "query definitions", but those are covered later
on).

Below `rustc` lie various crates that make up the parser and error
reporting mechanism. For historical reasons, these crates do not have
the `rustc_` prefix, but they are really just as much an internal part
of the compiler and not intended to be stable (though they do wind up
getting used by some crates in the wild, a practice we hope to
gradually phase out).

Each crate has a `README.md` file that describes, at a high level,
what it contains, and tries to give some kind of explanation (some
better than others).

## The main stages of compilation

The Rust compiler is in a bit of transition right now. It used to be a
purely "pass-based" compiler, where we ran a number of passes over the
entire program, and each did a particular check or transformation. We
are gradually replacing this pass-based code with an alternative setup
based on on-demand **queries**. In the query model, we work backwards,
executing a *query* that expresses our ultimate goal (e.g., "compile
this crate"). This query in turn may make other queries (e.g., "get me
a list of all modules in the crate"). Those queries make other queries
that ultimately bottom out in the base operations, like parsing the
input, running the type-checker, and so forth. This on-demand model
lets us do things like perform only the minimal amount of work needed
to type-check a single function. It also helps with incremental
compilation. (For details on defining queries, check out
`src/librustc/ty/maps/README.md`.)
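
To make the query model concrete, here is a toy sketch of demand-driven, memoized evaluation in Rust. It is *not* rustc's query system, which is defined quite differently; `QueryCtxt`, `modules`, `parse_module`, and `compile_crate` are hypothetical names, and "parsing" is reduced to counting whitespace-separated tokens so that the example stays self-contained.

```rust
use std::collections::HashMap;

/// A toy query context. Only `parse_module` caches its results; the other
/// queries are cheap and simply recomputed.
struct QueryCtxt {
    /// Hypothetical crate contents: (module name, source text).
    sources: Vec<(&'static str, &'static str)>,
    /// Cache for the `parse_module` query, keyed by module name.
    parsed: HashMap<&'static str, usize>,
    /// Records each query that actually did work (cache hits are not logged).
    log: Vec<String>,
}

impl QueryCtxt {
    fn new(sources: Vec<(&'static str, &'static str)>) -> Self {
        QueryCtxt { sources, parsed: HashMap::new(), log: Vec::new() }
    }

    /// Query: "get me a list of all modules in the crate".
    fn modules(&mut self) -> Vec<&'static str> {
        self.log.push("modules".to_string());
        self.sources.iter().map(|(name, _)| *name).collect()
    }

    /// Base operation: "parse this module", stood in for by counting tokens.
    fn parse_module(&mut self, name: &'static str) -> usize {
        if let Some(&count) = self.parsed.get(name) {
            return count; // cache hit: no work is redone
        }
        self.log.push(format!("parse_module({})", name));
        let src = self
            .sources
            .iter()
            .find(|(n, _)| *n == name)
            .map(|(_, s)| *s)
            .unwrap_or("");
        let count = src.split_whitespace().count();
        self.parsed.insert(name, count);
        count
    }

    /// The ultimate goal, "compile this crate", works backwards by
    /// demanding the queries it depends on.
    fn compile_crate(&mut self) -> usize {
        let mut total = 0;
        for module in self.modules() {
            total += self.parse_module(module);
        }
        total
    }
}

fn main() {
    let mut cx = QueryCtxt::new(vec![("a", "fn f ( ) { }"), ("b", "struct S ;")]);
    // Demand just one module first: only that module is parsed.
    cx.parse_module("b");
    // The top-level query reuses the cached result for "b".
    let total = cx.compile_crate();
    println!("total tokens: {}, queries run: {:?}", total, cx.log);
}
```

Asking for a single module first (the analogue of type-checking a single function) parses only that module; the later `compile_crate` query reuses the cached result instead of parsing it again.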

Regardless of the general setup, the basic operations that the
compiler must perform are the same. The only thing that changes is
whether these operations are invoked front-to-back, or on demand. In
order to compile a Rust crate, these are the general steps that we
take:

1. **Parsing input**
   - This processes the `.rs` files and produces the AST ("abstract syntax tree").
   - The AST is defined in `src/libsyntax/ast.rs`. It is intended to match the lexical
     syntax of the Rust language quite closely.
2. **Name resolution, macro expansion, and configuration**
   - Once parsing is complete, we process the AST recursively, resolving paths
     and expanding macros. The same process also handles `#[cfg]` attributes, and hence
     may strip things out of the AST as well.
3. **Lowering to HIR**
   - Once name resolution completes, we convert the AST into the HIR,
     or "high-level IR". The HIR is defined in `src/librustc/hir/`; that module also includes
     the lowering code.
   - The HIR is a lightly desugared variant of the AST. It is more processed than the
     AST and more suitable for the analyses that follow. It is **not** required to match
     the syntax of the Rust language.
     - As a simple example, in the **AST**, we preserve the parentheses
       that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse
       into distinct trees, even though they are equivalent. In the
       HIR, however, parenthesis nodes are removed, and those two
       expressions are represented in the same way.
4. **Type-checking and subsequent analyses**
   - An important step in processing the HIR is to perform type
     checking. This process assigns a type to every HIR expression,
     and is also responsible for resolving some "type-dependent" paths,
     such as field accesses (`x.f`: we can't know which field `f` is being
     accessed until we know the type of `x`) and associated type references
     (`T::Item`: we can't know what type `Item` is until we know what `T` is).
   - Type checking creates "side-tables" (`TypeckTables`) that include
     the types of expressions, the way to resolve methods, and so forth.
   - After type-checking, we can do other analyses, such as privacy checking.
5. **Lowering to MIR and post-processing**
   - Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which
     is a **very** desugared version of Rust, well suited to borrow checking and also
     to certain high-level optimizations.
6. **Translation to LLVM and LLVM optimizations**
   - From MIR, we can produce LLVM IR.
   - LLVM then runs its various optimizations, producing a number of `.o` files
     (one for each "codegen unit").
7. **Linking**
   - Finally, those `.o` files are linked together.
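
The parentheses example from the HIR-lowering step can be sketched with two toy enums. These types (`Ast`, `Hir`) and helpers (`add`, `paren`, `lower`) are hypothetical stand-ins, far simpler than the compiler's real AST and HIR; the point is only that lowering discards `Paren` nodes, because the shape of the tree already records the grouping.

```rust
/// Toy AST: keeps the parentheses the user wrote.
#[derive(Debug, PartialEq)]
enum Ast {
    Lit(i64),
    Add(Box<Ast>, Box<Ast>),
    Paren(Box<Ast>),
}

/// Toy HIR: has no node for parentheses.
#[derive(Debug, PartialEq)]
enum Hir {
    Lit(i64),
    Add(Box<Hir>, Box<Hir>),
}

fn add(l: Ast, r: Ast) -> Ast {
    Ast::Add(Box::new(l), Box::new(r))
}

fn paren(e: Ast) -> Ast {
    Ast::Paren(Box::new(e))
}

/// Lowering drops `Paren` nodes: the tree structure already encodes grouping.
fn lower(ast: &Ast) -> Hir {
    match ast {
        Ast::Lit(n) => Hir::Lit(*n),
        Ast::Add(l, r) => Hir::Add(Box::new(lower(l)), Box::new(lower(r))),
        Ast::Paren(inner) => lower(inner),
    }
}

fn main() {
    // `((1 + 2) + 3)` as written, parentheses included.
    let with_parens = paren(add(paren(add(Ast::Lit(1), Ast::Lit(2))), Ast::Lit(3)));
    // `1 + 2 + 3`, where `+` is left-associative.
    let without = add(add(Ast::Lit(1), Ast::Lit(2)), Ast::Lit(3));
    assert_ne!(with_parens, without); // distinct ASTs
    assert_eq!(lower(&with_parens), lower(&without)); // identical HIR
    println!("{:?}", lower(&without));
}
```
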
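
The type-dependent resolution of `x.f` described under type-checking can be sketched the same way. In this hypothetical toy checker (`Checker`, `Tables`, `Expr`, `ExprKind`, and `Ty` are illustrative names, not rustc's types), checking an expression records its type in a side table keyed by expression id, loosely in the spirit of `TypeckTables`, and a field access can only be resolved to a field index after the type of its base expression is known.

```rust
use std::collections::HashMap;

/// Hypothetical id that identifies each expression node.
type ExprId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    I32,
    Struct(&'static str),
}

enum ExprKind {
    Var(&'static str),
    /// `base.field`
    Field(Box<Expr>, &'static str),
}

struct Expr {
    id: ExprId,
    kind: ExprKind,
}

/// Toy side tables: results of type checking, keyed by expression id.
#[derive(Default)]
struct Tables {
    expr_types: HashMap<ExprId, Ty>,
    field_indices: HashMap<ExprId, usize>,
}

struct Checker {
    locals: HashMap<&'static str, Ty>,
    /// Struct name -> its fields, in declaration order.
    structs: HashMap<&'static str, Vec<(&'static str, Ty)>>,
    tables: Tables,
}

impl Checker {
    fn check(&mut self, e: &Expr) -> Result<Ty, String> {
        let ty = match &e.kind {
            ExprKind::Var(name) => self
                .locals
                .get(name)
                .cloned()
                .ok_or_else(|| format!("unknown variable `{}`", name))?,
            ExprKind::Field(base, field) => {
                // Which field `field` names depends on the base's type,
                // so the base must be type-checked first.
                let struct_name = match self.check(base)? {
                    Ty::Struct(s) => s,
                    other => return Err(format!("type {:?} has no fields", other)),
                };
                let fields = self
                    .structs
                    .get(struct_name)
                    .ok_or_else(|| format!("unknown struct `{}`", struct_name))?;
                let (index, (_, field_ty)) = fields
                    .iter()
                    .enumerate()
                    .find(|(_, (name, _))| name == field)
                    .ok_or_else(|| format!("no field `{}` on `{}`", field, struct_name))?;
                let field_ty = field_ty.clone();
                self.tables.field_indices.insert(e.id, index);
                field_ty
            }
        };
        self.tables.expr_types.insert(e.id, ty.clone());
        Ok(ty)
    }
}

fn main() {
    let mut checker = Checker {
        locals: HashMap::from([("x", Ty::Struct("Point"))]),
        structs: HashMap::from([("Point", vec![("a", Ty::I32), ("f", Ty::I32)])]),
        tables: Tables::default(),
    };
    // `x.f`: expression 1 is the field access, expression 0 is `x`.
    let x_dot_f = Expr {
        id: 1,
        kind: ExprKind::Field(Box::new(Expr { id: 0, kind: ExprKind::Var("x") }), "f"),
    };
    println!("{:?}", checker.check(&x_dot_f));
    println!("types: {:?}", checker.tables.expr_types);
    println!("field indices: {:?}", checker.tables.field_indices);
}
```
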

The first thing you may wonder if