Commit c405ac3

[overview.md] Add command line argument parsing, lexer stages, and parser outline
1 parent 92bd7c6 commit c405ac3

1 file changed: +66 -66 lines changed

src/overview.md

@@ -19,16 +19,16 @@ we'll talk about that later.
1919

2020
**TODO: someone else should confirm this vvv**
2121

22-
- User writes a program and invokes `rustc` on it (possibly through `cargo`).
23-
- First, we parse command line flags, etc. This is done in [`librustc_driver`].
24-
We now know what the exact work is we need to do (e.g. which nightly features
25-
are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or
26-
a full compilation).
27-
- Then, we start to do compilation...
28-
- We first [_lex_ the user program][lex]. This turns the program into a stream
29-
of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)).
30-
[`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with
31-
`rustc` data structures.
22+
- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`.
23+
- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user.
24+
- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?)
25+
- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols.
26+
- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST).
27+
- macro expansion (**TODO** chrissimpkins)
28+
- ast validation (**TODO** chrissimpkins)
29+
- nameres (**TODO** chrissimpkins)
30+
- early linting (**TODO** chrissimpkins)
31+
3232
- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax
3333
Tree (AST).
3434
- We then take the AST and [convert it to High-Level Intermediate
@@ -45,27 +45,27 @@ we'll talk about that later.
4545
- We (want to) do [many optimizations on the MIR][mir-opt] because it is still
4646
generic and that improves the code we generate later, improving compilation
4747
speed too. (**TODO: size optimizations too?**)
48-
- MIR is a higher level (and generic) representation, so it is easier to do
49-
some optimizations at MIR level than at LLVM-IR level. For example LLVM
50-
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
51-
opt looks for.
48+
- MIR is a higher level (and generic) representation, so it is easier to do
49+
some optimizations at MIR level than at LLVM-IR level. For example LLVM
50+
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
51+
opt looks for.
5252
- Rust code is _monomorphized_, which means making copies of all the generic
5353
code with the type parameters replaced by concrete types. To do
5454
this, we need to collect a list of what concrete types to generate code for.
5555
This is called _monomorphization collection_.
5656
- We then begin what is vaguely called _code generation_ or _codegen_.
57-
- The [code generation stage (codegen)][codegen] is when higher level
58-
representations of source are turned into an executable binary. `rustc`
57+
- The [code generation stage (codegen)][codegen] is when higher level
58+
representations of source are turned into an executable binary. `rustc`
5959
uses LLVM for code generation. The first step is the MIR is then
60-
converted to LLVM Intermediate Representation (LLVM IR). This is where
61-
the MIR is actually monomorphized, according to the list we created in
62-
the previous step.
63-
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
64-
It then emits machine code. It is basically assembly code with additional
65-
low-level types and annotations added. (e.g. an ELF object or wasm).
66-
**TODO: reference for this section?**
67-
- The different libraries/binaries are linked together to produce the final
68-
binary. **TODO: reference for this section?**
60+
converted to LLVM Intermediate Representation (LLVM IR). This is where
61+
the MIR is actually monomorphized, according to the list we created in
62+
the previous step.
63+
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
64+
It then emits machine code. It is basically assembly code with additional
65+
low-level types and annotations added. (e.g. an ELF object or wasm).
66+
**TODO: reference for this section?**
67+
- The different libraries/binaries are linked together to produce the final
68+
binary. **TODO: reference for this section?**
6969

7070
[`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
7171
[`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html
@@ -90,12 +90,12 @@ satisfy/optimize for. For example,
9090

9191
- Compilation speed: how fast is it to compile a program. More/better
9292
compile-time analyses often means compilation is slower.
93-
- Also, we want to support incremental compilation, so we need to take that
94-
into account. How can we keep track of what work needs to be redone and
95-
what can be reused if the user modifies their program?
96-
- Also we can't store too much stuff in the incremental cache because
97-
it would take a long time to load from disk and it could take a lot
98-
of space on the user's system...
93+
- Also, we want to support incremental compilation, so we need to take that
94+
into account. How can we keep track of what work needs to be redone and
95+
what can be reused if the user modifies their program?
96+
- Also we can't store too much stuff in the incremental cache because
97+
it would take a long time to load from disk and it could take a lot
98+
of space on the user's system...
9999
- Compiler memory usage: while compiling a program, we don't want to use more
100100
memory than we need.
101101
- Program speed: how fast is your compiled program. More/better compile-time
@@ -277,46 +277,46 @@ but there are already some promising performance improvements.
277277
# References
278278

279279
- Command line parsing
280-
- Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html)
281-
- Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
282-
- Main entry point: **TODO**
280+
- Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html)
281+
- Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
282+
- Main entry point: **TODO**
283283
- Lexical Analysis: Lex the user program to a stream of tokens
284-
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
285-
- Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
286-
- Main entry point: **TODO**
284+
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
285+
- Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
286+
- Main entry point: **TODO**
287287
- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST)
288-
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
289-
- Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
290-
- Main entry point: **TODO**
291-
- AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
288+
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
289+
- Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
290+
- Main entry point: **TODO**
291+
- AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
292292
- The High Level Intermediate Representation (HIR)
293-
- Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html)
294-
- Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir)
295-
- Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map)
296-
- Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html)
297-
- How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree`
298-
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
299-
- Main entry point: **TODO**
293+
- Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html)
294+
- Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir)
295+
- Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map)
296+
- Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html)
297+
- How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree`
298+
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
299+
- Main entry point: **TODO**
300300
- Type Inference
301-
- Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html)
302-
- Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics)
303-
- Main entry point: **TODO**
301+
- Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html)
302+
- Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics)
303+
- Main entry point: **TODO**
304304
- The Mid Level Intermediate Representation (MIR)
305-
- Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html)
306-
- Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir)
307-
- Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir)
308-
- Main entry point: **TODO**
305+
- Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html)
306+
- Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir)
307+
- Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir)
308+
- Main entry point: **TODO**
309309
- The Borrow Checker
310-
- Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html)
311-
- Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html)
312-
- Main entry point: **TODO**
310+
- Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html)
311+
- Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html)
312+
- Main entry point: **TODO**
313313
- MIR Optimizations
314-
- Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html)
315-
- Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?**
316-
- Main entry point: **TODO**
314+
- Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html)
315+
- Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?**
316+
- Main entry point: **TODO**
317317
- Code Generation
318-
- Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html)
319-
- Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet**
320-
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
321-
- Main entry point MIR -> LLVM IR: **TODO**
322-
- Main entry point LLVM IR -> Machine Code **TODO**
318+
- Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html)
319+
- Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet**
320+
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
321+
- Main entry point MIR -> LLVM IR: **TODO**
322+
- Main entry point LLVM IR -> Machine Code **TODO**
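
The low-level lexing stage described in the new text of the first hunk (raw source text turned into a stream of atomic _tokens_) can be illustrated with a toy sketch. This is not the `librustc_lexer` implementation or its API: the `Kind`, `Token`, `prefix_len`, and `tokenize` names below are hypothetical, and the token classes are deliberately simplified (no comments, string literals, or multi-character operators). It only shows the general shape of the idea, under the assumption that each token records a kind and a byte length, leaving validation and symbol interning to a later stage.

```rust
/// Hypothetical token kinds for illustration only; far coarser than a real lexer's.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Ident,
    Number,
    Whitespace,
    Punct,
}

/// A token records only its kind and its length in bytes.
/// It does not own or interpret the source text it covers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub len: usize,
}

/// Length in bytes of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(s.len(), |(i, _)| i)
}

/// Split `src` into a flat stream of tokens, scanning left to right.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_alphabetic() || first == '_' {
            (Kind::Ident, prefix_len(rest, |c| c.is_alphanumeric() || c == '_'))
        } else if first.is_ascii_digit() {
            (Kind::Number, prefix_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_whitespace() {
            (Kind::Whitespace, prefix_len(rest, char::is_whitespace))
        } else {
            // Any other character becomes a one-character punctuation token.
            (Kind::Punct, first.len_utf8())
        };
        tokens.push(Token { kind, len });
        // `len` is always at least one whole character, so the loop advances
        // and the slice stays on a UTF-8 character boundary.
        rest = &rest[len..];
    }
    tokens
}
```

Calling `tokenize("let x = 42;")` yields eight tokens whose lengths sum to the length of the input, so a later stage can recover each token's text by slicing the original source.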
