Commit a0b0d56

remaining edits
1 parent 0189dcc commit a0b0d56

1 file changed (+9, -93 lines changed)

src/overview.md

@@ -17,94 +17,6 @@ So first, let's look at what the compiler does to your code. For now, we will
 avoid mentioning how the compiler implements these steps except as needed;
 we'll talk about that later.

-- The compile process begins when a user writes a Rust source program in text
-and invokes the `rustc` compiler on it. The work that the compiler needs to
-perform is defined by command-line options. For example, it is possible to
-enable nightly features (`-Z` flags), perform `check`-only builds, or emit
-LLVM-IR rather than executable machine code. The `rustc` executable call may
-be indirect through the use of `cargo`.
-- Command line argument parsing occurs in the [`rustc_driver`]. This crate
-defines the compile configuration that is requested by the user and passes it
-to the rest of the compilation process as a [`rustc_interface::Config`].
-- The raw Rust source text is analyzed by a low-level lexer located in
-[`rustc_lexer`]. At this stage, the source text is turned into a stream of
-atomic source code units known as _tokens_. The lexer supports the
-Unicode character encoding.
-- The token stream passes through a higher-level lexer located in
-[`rustc_parse`] to prepare for the next stage of the compile process. The
-[`StringReader`] struct is used at this stage to perform a set of validations
-and turn strings into interned symbols (_interning_ is discussed later).
-[String interning] is a way of storing only one immutable
-copy of each distinct string value.
-
-- The lexer has a small interface and doesn't depend directly on the
-diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
-data which are emitted in `rustc_parse::lexer::mod` as real diagnostics.
-- The lexer preserves full fidelity information for both IDEs and proc macros.
-- The parser [translates the token stream from the lexer into an Abstract Syntax
-Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
-analysis. The crate entry points for the parser are the
-[`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
-methods found in [`rustc_parse::parser::Parser`]. The external module parsing
-entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
-And the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
-- Parsing is performed with a set of `Parser` utility methods including `fn bump`,
-`fn check`, `fn eat`, `fn expect`, `fn look_ahead`.
-- Parsing is organized by the semantic construct that is being parsed. Separate
-`parse_*` methods can be found in [`rustc_parse` `parser`][rustc_parse_parser_dir]
-directory. The source file name follows the construct name. For example, the
-following files are found in the parser:
-- `expr.rs`
-- `pat.rs`
-- `ty.rs`
-- `stmt.rs`
-- This naming scheme is used across many compiler stages. You will find
-either a file or directory with the same name across the parsing, lowering,
-type checking, THIR lowering, and MIR building sources.
-- Macro expansion, AST validation, name resolution, and early linting takes place
-during this stage of the compile process.
-- The parser uses the standard `DiagnosticBuilder` API for error handling, but we
-try to recover, parsing a superset of Rust's grammar, while also emitting an error.
-- `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
-- We then take the AST and [convert it to High-Level Intermediate
-Representation (HIR)][hir]. This is a compiler-friendly representation of the
-AST. This involves a lot of desugaring of things like loops and `async fn`.
-- We use the HIR to do [type inference] (the process of automatic
-detection of the type of an expression), [trait solving] (the process
-of pairing up an impl with each reference to a trait), and [type
-checking] (the process of converting the types found in the HIR
-(`hir::Ty`), which represent the syntactic things that the user wrote,
-into the internal representation used by the compiler (`Ty<'tcx>`),
-and using that information to verify the type safety, correctness and
-coherence of the types used in the program).
-- The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir].
-- Along the way, we construct the THIR, which is an even more desugared HIR.
-THIR is used for pattern and exhaustiveness checking. It is also more
-convenient to convert into MIR than HIR is.
-- The MIR is used for [borrow checking].
-- We (want to) do [many optimizations on the MIR][mir-opt] because it is still
-generic and that improves the code we generate later, improving compilation
-speed too.
-- MIR is a higher level (and generic) representation, so it is easier to do
-some optimizations at MIR level than at LLVM-IR level. For example LLVM
-doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
-opt looks for.
-- Rust code is _monomorphized_, which means making copies of all the generic
-code with the type parameters replaced by concrete types. To do
-this, we need to collect a list of what concrete types to generate code for.
-This is called _monomorphization collection_.
-- We then begin what is vaguely called _code generation_ or _codegen_.
-- The [code generation stage (codegen)][codegen] is when higher level
-representations of source are turned into an executable binary. `rustc`
-uses LLVM for code generation. The first step is to convert the MIR
-to LLVM Intermediate Representation (LLVM IR). This is where the MIR
-is actually monomorphized, according to the list we created in the
-previous step.
-- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
-It then emits machine code. It is basically assembly code with additional
-low-level types and annotations added. (e.g. an ELF object or wasm).
-- The different libraries/binaries are linked together to produce the final
-binary.
 ### Invocation

 Compilation begins when a user writes a Rust source program in text
@@ -231,9 +143,9 @@ binary.
 [`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
 [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
 [hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
-[type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html
-[trait solving]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
-[type checking]: https://rustc-dev-guide.rust-lang.org/type-checking.html
+[*type inference*]: https://rustc-dev-guide.rust-lang.org/type-inference.html
+[*trait solving*]: https://rustc-dev-guide.rust-lang.org/traits/resolution.html
+[*type checking*]: https://rustc-dev-guide.rust-lang.org/type-checking.html
 [mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html
 [borrow checking]: https://rustc-dev-guide.rust-lang.org/borrow_check.html
 [mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html
@@ -245,6 +157,8 @@ binary.
 [`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
 [parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
 [rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
+[`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
+[`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

 ## How it does it

@@ -405,7 +319,7 @@ on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is th

 Also note that the `rustc_middle::ty` module defines the `TyCtxt` struct we mentioned before.

-[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html
+[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html

 ### Parallelism

@@ -439,6 +353,7 @@ For more details on bootstrapping, see
 [_bootstrapping_]: https://en.wikipedia.org/wiki/Bootstrapping_(compilers)
 [rustc-bootstrap]: building/bootstrapping.md

+<!--
 # Unresolved Questions

 - Does LLVM ever do optimizations in debug builds?
@@ -448,7 +363,8 @@ For more details on bootstrapping, see
 - What is the main source entry point for `X`?
 - Where do phases diverge for cross-compilation to machine code across
 different platforms?
-
+-->
+
 # References

 - Command line parsing
