@@ -17,94 +17,6 @@ So first, let's look at what the compiler does to your code. For now, we will
  avoid mentioning how the compiler implements these steps except as needed;
  we'll talk about that later.

- - The compile process begins when a user writes a Rust source program in text
- and invokes the `rustc` compiler on it. The work that the compiler needs to
- perform is defined by command-line options. For example, it is possible to
- enable nightly features (`-Z` flags), perform `check`-only builds, or emit
- LLVM-IR rather than executable machine code. The `rustc` executable call may
- be indirect through the use of `cargo`.
- - Command line argument parsing occurs in the [`rustc_driver`]. This crate
- defines the compile configuration that is requested by the user and passes it
- to the rest of the compilation process as a [`rustc_interface::Config`].
- - The raw Rust source text is analyzed by a low-level lexer located in
- [`rustc_lexer`]. At this stage, the source text is turned into a stream of
- atomic source code units known as _tokens_. The lexer supports the
- Unicode character encoding.
- - The token stream passes through a higher-level lexer located in
- [`rustc_parse`] to prepare for the next stage of the compile process. The
- [`StringReader`] struct is used at this stage to perform a set of validations
- and turn strings into interned symbols (_interning_ is discussed later).
- [String interning] is a way of storing only one immutable
- copy of each distinct string value.
-
- - The lexer has a small interface and doesn't depend directly on the
- diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
- data which are emitted in `rustc_parse::lexer::mod` as real diagnostics.
- - The lexer preserves full fidelity information for both IDEs and proc macros.
- - The parser [translates the token stream from the lexer into an Abstract Syntax
- Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
- analysis. The crate entry points for the parser are the
- [`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
- methods found in [`rustc_parse::parser::Parser`]. The external module parsing
- entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
- And the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
- - Parsing is performed with a set of `Parser` utility methods including `fn bump`,
- `fn check`, `fn eat`, `fn expect`, `fn look_ahead`.
- - Parsing is organized by the semantic construct that is being parsed. Separate
- `parse_*` methods can be found in [`rustc_parse` `parser`][rustc_parse_parser_dir]
- directory. The source file name follows the construct name. For example, the
- following files are found in the parser:
- - `expr.rs`
- - `pat.rs`
- - `ty.rs`
- - `stmt.rs`
- - This naming scheme is used across many compiler stages. You will find
- either a file or directory with the same name across the parsing, lowering,
- type checking, THIR lowering, and MIR building sources.
- - Macro expansion, AST validation, name resolution, and early linting takes place
- during this stage of the compile process.
- - The parser uses the standard `DiagnosticBuilder` API for error handling, but we
- try to recover, parsing a superset of Rust's grammar, while also emitting an error.
- - `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
- - We then take the AST and [convert it to High-Level Intermediate
- Representation (HIR)][hir]. This is a compiler-friendly representation of the
- AST. This involves a lot of desugaring of things like loops and `async fn`.
- - We use the HIR to do [type inference] (the process of automatic
- detection of the type of an expression), [trait solving] (the process
- of pairing up an impl with each reference to a trait), and [type
- checking] (the process of converting the types found in the HIR
- (`hir::Ty`), which represent the syntactic things that the user wrote,
- into the internal representation used by the compiler (`Ty<'tcx>`),
- and using that information to verify the type safety, correctness and
- coherence of the types used in the program).
- - The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir].
- - Along the way, we construct the THIR, which is an even more desugared HIR.
- THIR is used for pattern and exhaustiveness checking. It is also more
- convenient to convert into MIR than HIR is.
- - The MIR is used for [borrow checking].
- - We (want to) do [many optimizations on the MIR][mir-opt] because it is still
- generic and that improves the code we generate later, improving compilation
- speed too.
- - MIR is a higher level (and generic) representation, so it is easier to do
- some optimizations at MIR level than at LLVM-IR level. For example LLVM
- doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
- opt looks for.
- - Rust code is _monomorphized_, which means making copies of all the generic
- code with the type parameters replaced by concrete types. To do
- this, we need to collect a list of what concrete types to generate code for.
- This is called _monomorphization collection_.
- - We then begin what is vaguely called _code generation_ or _codegen_.
- - The [code generation stage (codegen)][codegen] is when higher level
- representations of source are turned into an executable binary. `rustc`
- uses LLVM for code generation. The first step is to convert the MIR
- to LLVM Intermediate Representation (LLVM IR). This is where the MIR
- is actually monomorphized, according to the list we created in the
- previous step.
- - The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
- It then emits machine code. It is basically assembly code with additional
- low-level types and annotations added. (e.g. an ELF object or wasm).
- - The different libraries/binaries are linked together to produce the final
- binary.
  ### Invocation

  Compilation begins when a user writes a Rust source program in text
@@ -231,9 +143,9 @@ binary.
  [`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
  [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
  [hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
- [type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html
- [trait solving]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
- [type checking]: https://rustc-dev-guide.rust-lang.org/type-checking.html
+ [*type inference*]: https://rustc-dev-guide.rust-lang.org/type-inference.html
+ [*trait solving*]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
+ [*type checking*]: https://rustc-dev-guide.rust-lang.org/type-checking.html
  [mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html
  [borrow checking]: https://rustc-dev-guide.rust-lang.org/borrow_check.html
  [mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html
@@ -245,6 +157,8 @@ binary.
  [`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
  [parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
  [rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
+ [`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
+ [`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

  ## How it does it

@@ -405,7 +319,7 @@ on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is th

  Also note that the `rustc_middle::ty` module defines the `TyCtxt` struct we mentioned before.

- [ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html
+ [ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

  ### Parallelism

@@ -439,6 +353,7 @@ For more details on bootstrapping, see
  [_bootstrapping_]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
  [rustc-bootstrap]: building/bootstrapping.md

+ <!--
  # Unresolved Questions

  - Does LLVM ever do optimizations in debug builds?
@@ -448,7 +363,8 @@ For more details on bootstrapping, see
  - What is the main source entry point for `X`?
  - Where do phases diverge for cross-compilation to machine code across
  different platforms?
-
+ -->
+
  # References

  - Command line parsing