src/overview.md
+66 -66
@@ -19,16 +19,16 @@ we'll talk about that later.
19
19
20
20
**TODO: someone else should confirm this vvv**
21
21
22
-
-User writes a program and invokes `rustc` on it (possibly through `cargo`).
23
-
-First, we parse command line flags, etc. This is done in [`librustc_driver`].
24
-
We now know what the exact work is we need to do (e.g. which nightly features
25
-
are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or
26
-
a full compilation).
27
-
- Then, we start to do compilation...
28
-
- We first [_lex_ the user program][lex]. This turns the program into a stream
29
-
of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)).
30
-
[`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with
31
-
`rustc` data structures.
22
+
- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`.
23
+
- Command line argument parsing occurs in [`librustc_driver`]. This crate defines the compile configuration that is requested by the user.
24
+
- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?)
25
+
- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols.
26
+
- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST).
27
+
- macro expansion (**TODO** chrissimpkins)
28
+
- ast validation (**TODO** chrissimpkins)
29
+
- nameres (**TODO** chrissimpkins)
30
+
- early linting (**TODO** chrissimpkins)
31
+
32
32
- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax
33
33
Tree (AST).
34
34
- We then take the AST and [convert it to High-Level Intermediate
@@ -45,27 +45,27 @@ we'll talk about that later.
45
45
- We (want to) do [many optimizations on the MIR][mir-opt] because it is still
46
46
generic and that improves the code we generate later, improving compilation
47
47
speed too. (**TODO: size optimizations too?**)
48
-
- MIR is a higher level (and generic) representation, so it is easier to do
49
-
some optimizations at MIR level than at LLVM-IR level. For example LLVM
50
-
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
51
-
opt looks for.
48
+
- MIR is a higher level (and generic) representation, so it is easier to do
49
+
some optimizations at MIR level than at LLVM-IR level. For example LLVM
50
+
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
51
+
opt looks for.
52
52
- Rust code is _monomorphized_, which means making copies of all the generic
53
53
code with the type parameters replaced by concrete types. To do
54
54
this, we need to collect a list of what concrete types to generate code for.
55
55
This is called _monomorphization collection_.
56
56
- We then begin what is vaguely called _code generation_ or _codegen_.
57
-
- The [code generation stage (codegen)][codegen] is when higher level
58
-
representations of source are turned into an executable binary. `rustc`
57
+
- The [code generation stage (codegen)][codegen] is when higher level
58
+
representations of source are turned into an executable binary. `rustc`
59
59
uses LLVM for code generation. The first step is the MIR is then
60
-
converted to LLVM Intermediate Representation (LLVM IR). This is where
61
-
the MIR is actually monomorphized, according to the list we created in
62
-
the previous step.
63
-
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
64
-
It then emits machine code. It is basically assembly code with additional
65
-
low-level types and annotations added. (e.g. an ELF object or wasm).
66
-
**TODO: reference for this section?**
67
-
- The different libraries/binaries are linked together to produce the final
68
-
binary. **TODO: reference for this section?**
60
+
converted to LLVM Intermediate Representation (LLVM IR). This is where
61
+
the MIR is actually monomorphized, according to the list we created in
62
+
the previous step.
63
+
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
64
+
It then emits machine code. It is basically assembly code with additional
65
+
low-level types and annotations added. (e.g. an ELF object or wasm).
66
+
**TODO: reference for this section?**
67
+
- The different libraries/binaries are linked together to produce the final
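
The lexing step described in this change (raw source text turned into a stream of tokens) can be illustrated with a toy tokenizer. This is only a minimal sketch, not the `librustc_lexer` implementation: the `Token` enum and the `tokenize` function are hypothetical names made up for this example, and real Rust lexing handles far more (string literals, comments, Unicode rules, multi-character operators).

```rust
/// A toy token type. These variants are illustrative only and do not
/// mirror the token kinds defined in `librustc_lexer`.
#[derive(Debug, PartialEq, Clone)]
enum Token {
    Ident(String),
    Number(String),
    Punct(char),
}

/// Turn raw source text into a flat stream of tokens, skipping whitespace.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none in this sketch.
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Identifier or keyword: consume letters, digits and underscores.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else if c.is_ascii_digit() {
            // Integer literal: consume a run of ASCII digits.
            let mut number = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    number.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(number));
        } else {
            // Anything else becomes a single-character punctuation token.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}

fn main() {
    for token in tokenize("let x = 42;") {
        println!("{:?}", token);
    }
}
```

Keeping the tokens as plain owned values makes the "stream of atomic units" idea easy to see. A production lexer would more likely track source spans and intern strings, which is the role the text assigns to the higher-level lexer in `librustc_parse`.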
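
The overview text describes monomorphization as making copies of generic code with the type parameters replaced by concrete types. The sketch below shows the idea at the source level. The functions `largest_i32` and `largest_f64` are hand-written stand-ins, named hypothetically for this example, for the copies the compiler would generate. The actual compiler does this on MIR during codegen and does not produce Rust source like this.

```rust
/// Generic source code as the user writes it.
/// Panics on an empty slice, which is acceptable for this illustration.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut max = items[0];
    for &item in items {
        if item > max {
            max = item;
        }
    }
    max
}

/// Roughly what a monomorphized copy for `T = i32` corresponds to.
fn largest_i32(items: &[i32]) -> i32 {
    let mut max = items[0];
    for &item in items {
        if item > max {
            max = item;
        }
    }
    max
}

/// Roughly what a monomorphized copy for `T = f64` corresponds to.
fn largest_f64(items: &[f64]) -> f64 {
    let mut max = items[0];
    for &item in items {
        if item > max {
            max = item;
        }
    }
    max
}

fn main() {
    let ints = [3, 7, 2];
    let floats = [1.5, 0.25];

    // The two call sites below use `largest` at `i32` and `f64`, so those are
    // the concrete types that monomorphization collection would record.
    assert_eq!(largest(&ints), largest_i32(&ints));
    assert_eq!(largest(&floats), largest_f64(&floats));
}
```

Only the types actually used at call sites need copies, which is why a collection step that gathers the list of concrete types comes before code generation.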