From 44d3fc92a4004c4a1557f0be601514c01f871074 Mon Sep 17 00:00:00 2001 From: mark Date: Wed, 25 Mar 2020 23:26:17 -0500 Subject: [PATCH 01/38] add overview --- src/SUMMARY.md | 1 - src/overview.md | 326 +++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 325 insertions(+), 2 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index d6cbe0645..28abd8d65 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -33,7 +33,6 @@ - ["Cleanup Crew" ICE-breakers](ice-breaker/cleanup-crew.md) - [LLVM ICE-breakers](ice-breaker/llvm.md) - [Licenses](./licenses.md) - - [Part 2: High-level Compiler Architecture](./part-2-intro.md) - [Overview of the Compiler](./overview.md) - [The compiler source code](./compiler-src.md) diff --git a/src/overview.md b/src/overview.md index 140d2f352..546f0ad13 100644 --- a/src/overview.md +++ b/src/overview.md @@ -1,3 +1,327 @@ # Overview of the Compiler -Coming soon! Work is in progress on this chapter. See https://github.com/rust-lang/rustc-dev-guide/pull/633 for the source and the [project README](https://github.com/rust-lang/rustc-dev-guide) for local build instructions. +This chapter is about the overall process of compiling a program -- how +everything fits together. + +The rust compiler is special in two ways: it does things to your code that +other compilers don't do (e.g. borrow checking) and it has a lot of +unconventional implementation choices (e.g. queries). We will talk about these +in turn in this chapter, and in the rest of the guide, we will look at all the +individual pieces in more detail. + +## What the compiler does to your code + +So first, let's look at what the compiler does to your code. For now, we will +avoid mentioning how the compiler implements these steps except as needed; +we'll talk about that later. 
+ +**TODO: Would be great to have a diagram of this once we nail down the details...** + +**TODO: someone else should confirm this vvv** + +- User writes a program and invokes `rustc` on it (possibly through `cargo`). +- First, we parse command line flags, etc. This is done in [`librustc_driver`]. + We now know what the exact work is we need to do (e.g. which nightly features + are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or + a full compilation). +- Then, we start to do compilation... +- We first [_lex_ the user program][lex]. This turns the program into a stream + of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)). + [`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with + `rustc` data structures. +- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax + Tree (AST). +- We then take the AST and [convert it to High-Level Intermediate + Representation (HIR)][hir]. This is a compiler-friendly representation of the + AST. This involves a lot of desugaring of things like loops and `async fn`. +- We use the HIR to do [type inference]. This is the process of automatic + detection of the type of an expression. **TODO: how `ty` module fits in + here** +- **TODO: Maybe some other things are done here? I think initial type checking + happens here? And trait solving?** +- The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir]. +- The MIR is used for [borrow checking][borrow checker]. +- **TODO: const eval fits in somewhere here I think** +- We (want to) do [many optimizations on the MIR][mir-opt] because it is still + generic and that improves the code we generate later, improving compilation + speed too. (**TODO: size optimizations too?**) + - MIR is a higher level (and generic) representation, so it is easier to do + some optimizations at MIR level than at LLVM-IR level. For example LLVM + doesn't seem to be able to optimize the pattern the [`simplify_try`] mir + opt looks for. 
+- Rust code is _monomorphized_, which means making copies of all the generic + code with the type parameters replaced by concrete types. In order to do + this, we need to collect a list of what concrete types to generate code for. + This is called _monomorphization collection_. +- We then begin what is vaguely called _code generation_ or _codegen_. + - The [code generation stage (codegen)][codegen] is when higher level + representations of source are turned into an executable binary. `rustc` + uses LLVM for code generation. The first step is the MIR is then + converted to LLVM Intermediate Representation (LLVM IR). This is where + the MIR is actually monomorphized, according to the list we created in + the previous step. + - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. + It then emits machine code. It is basically assembly code with additional + low-level types and annotations added. (e.g. an ELF object or wasm). + **TODO: reference for this section?** + - The different libraries/binaries are linked together to produce the final + binary. 
**TODO: reference for this section?** + +[`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html +[`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html +[lex]: https://rust-lang.github.io/rustc-guide/the-parser.html +[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html +[`librustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html +[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html +[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html +[type inference]: https://rust-lang.github.io/rustc-guide/type-inference.html +[mir]: https://rust-lang.github.io/rustc-guide/mir/index.html +[borrow checker]: https://rust-lang.github.io/rustc-guide/borrow_check.html +[mir-opt]: https://rust-lang.github.io/rustc-guide/mir/optimizations.html +[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282 +[codegen]: https://rust-lang.github.io/rustc-guide/codegen.html + +## How it does it + +Ok, so now that we have a high-level view of what the compiler does to your +code, let's take a high-level view of _how_ it does all that stuff. There are a +lot of constraints and conflicting goals that the compiler needs to +satisfy/optimize for. For example, + +- Compilation speed: how fast is it to compile a program. More/better + compile-time analyses often means compilation is slower. + - Also, we want to support incremental compilation, so we need to take that + into account. How can we keep track of what work needs to be redone and + what can be reused if the user modifies their program? + - Also we can't store too much stuff in the incremental cache because + it would take a long time to load from disk and it could take a lot + of space on the user's system... +- Compiler memory usage: while compiling a program, we don't want to use more + memory than we need. 
+- Program speed: how fast is your compiled program. More/better compile-time + analyses often means the compiler can do better optimizations. +- Program size: how large is the compiled binary? Similar to the previous + point. +- Compiler compilation speed: how long does it take to compile the compiler? + This impacts contributors and compiler maintenance. +- Compiler implementation complexity: building a compiler is one of the hardest + things a person/group can do, and rust is not a very simple language, so how + do we make the compiler's code base manageable? +- Compiler correctness: the binaries produced by the compiler should do what + the input programs says they do, and should continue to do so despite the + tremendous amount of change constantly going on. +- Compiler integration: a number of other tools need to use the compiler in + various ways (e.g. cargo, clippy, miri, RLS) that must be supported. +- Compiler stability: the compiler should not crash or fail ungracefully on the + stable channel. +- Rust stability: the compiler must respect rust's stability guarantees by not + breaking programs that previously compiled despite the many changes that are + always going on to its implementation. +- Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some + strengths we leverage and some limitations/weaknesses we need to work around. +- And others that I'm probably forgetting. + +So, as you read through the rest of the guide, keep these things in mind. They +will often inform decisions that we make. + +### Constant change + +One thing to keep in mind is that `rustc` is a real production-quality product. +As such, it has its fair share of codebase churn and technical debt. A lot of +the designs discussed throughout this guide are idealized designs that are not +fully realized yet. And things keep changing so that it is hard to keep this +guide completely up to date on everything! 
+ +The compiler definitely has rough edges, but because of its design it is able +to keep up with the requirements above. + +### Intermediate representations + +As with most compilers, `rustc` uses some intermediate representations (IRs) to +facilitate computations. In general, working directly with the source code is +extremely inconvenient. Source code is designed to be human-friendly while at +the same time being unambiguous, but it's less convenient for doing something +like, say, type checking. + +Instead most compilers, including `rustc`, build some sort of IR out of the +source code which is easier to analyze. `rustc` has a few IRs, each optimized +for different things: + +- Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream + of tokens produced by the lexer directly from the source code. It represents + pretty much exactly what the user wrote. It helps to do some syntactic sanity + checking (e.g. checking that a type is expected where the user wrote one). +- High-level IR (HIR): This is a sort of very desugared AST. It's still close + to what the user wrote syntactically, but it includes some implicit things + such as some elided lifetimes, etc. This IR is amenable to type checking. +- HAIR: This is an intermediate between HIR and MIR. This only exists to make + it easier to lower HIR to MIR. +- Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG + is a type of diagram that shows the basic blocks of a program and how control + flow can go between them. Likewise, MIR also has a bunch of basic blocks with + simple typed statements inside them (e.g. assignment, simple computations, + dropping values, etc). MIR is used for borrow checking and a bunch of other + important dataflow based checks, such as checking for uninitialized values. + It is also used for a bunch of optimizations and for constant evaluation (via + MIRI). 
Because MIR is still generic, we can do a lot of analyses here more + efficiently than after monomorphization. +- LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR + is basically a sort of typed assembly language with lots of annotations. It's + a standard format that is used by all compilers that use LLVM (e.g. the clang + C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other + compilers to emit and also rich enough for LLVM to run a bunch of + optimizations on it. + +### Queries + +The first big implementation choice is the _query_ system. The rust compiler +uses a query system which is unlike most textbook compilers, which are +organized as a series of passes over the code that execute sequentially. The +compiler does this to make incremental compilation possible -- that is, if the +user makes a change to their program and recompiles, we want to do as little +redundant work as possible to produce the new binary. + +In rustc, all the major steps above are organized as a bunch of queries that +call each other. For example, there is a query to ask for the type of something +and another to ask for the optimized MIR of a function, and so on. These +queries can call each other and are all tracked through the query system, and +the results of the queries are cached on disk so that we can tell which +queries' results changed from the last compilation and only redo those. This is +how incremental compilation works. + +In principle, for the query-fied steps, we do each of the above for each item +individually. For example, we will take the HIR for a function and use queries +to ask for the LLVM IR for that HIR. This drives the generation of optimized +MIR, which drives the borrow checker, which drives the generation of MIR, and +so on. + +... except that this is very over-simplified. 
In fact, some queries are not +cached on disk, and some parts of the compiler have to run for all code anyway +for correctness even if the code is dead code (e.g. the borrow checker). For +example, [currently the `mir_borrowck` query is first executed on all functions +of a crate.][passes] Then the codegen backend invokes the +`collect_and_partition_mono_items` query, which first recursively requests the +`optimized_mir` for all reachable functions, which in turn runs `mir_borrowck` +for that function and then creates codegen units. This kind of split will need +to remain to ensure that unreachable functions still have their errors emitted. + +[passes]: https://github.com/rust-lang/rust/blob/45ebd5808afd3df7ba842797c0fcd4447ddf30fb/src/librustc_interface/passes.rs#L824 + +Moreover, the compiler wasn't originally built to use a query system; the query +system has been retrofitted into the compiler, so parts of it are not +query-fied yet. Also, LLVM isn't our code, so obviously that isn't querified +either. The plan is to eventually query-fy all of the steps listed in the +previous section, but as of this writing, only the steps between HIR and +LLVM-IR are query-fied. That is, lexing and parsing are done all at once for +the whole program. + +One other thing to mention here is the all-important "typing context", +[`TyCtxt`], which is a giant struct that is at the center of all things. All +queries are defined as methods on the [`TyCtxt`] type, and the in-memory query +cache is stored there too. In the code, there is usually a variable called +`tcx` which is a handle on the typing context. You will also see lifetimes with +the name `'tcx`, which means that something is tied to the lifetime of the +`TyCtxt` (usually it is stored or _interned_ there). + +[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyCtxt.html + +### `ty::Ty` + +Types are really important in Rust, and they form the core of a lot of compiler +analyses. 
The main type (in the compiler) that represents types (in the user's +program) is [`rustc::ty::Ty`][ty]. This is so important that we have a whole chapter +on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way +`rustc` represents types! + +Oh, and also the `rustc::ty` module defines the `TyCtxt` struct we mentioned before. + +[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html + +### Parallelism + +Compiler performance is a problem that we would very much like to improve on +(and are always working on). One aspect of that is attempting to parallelize +`rustc` itself. + +Currently, there is only one part of rustc that is already parallel: codegen. +During monomorphization, the compiler will split up all the code to be +generated into smaller chunks called _codegen units_. These are then generated +by independent instances of LLVM. Since they are independent, we can run them +in parallel. At the end, the linker is run to combine all the codegen units +together into one binary. + +However, the rest of the compiler is still not yet parallel. There have been +lots of efforts spent on this, but it is generally a hard problem. The current +approach is (**TODO: verify**) to turn `RefCell`s into `Mutex`s -- that is, we +switch to thread-safe internal mutability. However, there are ongoing +challenges with lock contention, maintaining query-system invariants under +concurrency, and the complexity of the code base. One can try out the current +work by enabling parallel compilation in `config.toml`. It's still early days, +but there are already some promising performance improvements. + +### Bootstrapping + +**TODO (or do we want such a section)?** + +## A flow chart or walkthrough diagram + +**TODO** + +# Unresolved Questions + +**TODO: find answers to these** + +- Does LLVM ever do optimizations in debug builds? +- How do I explore phases of the compile process in my own sources (lexer, + parser, HIR, etc)? 
- e.g., `cargo rustc -- -Zunpretty=hir-tree` allows you to + view HIR representation +- What is the main source entry point for `X`? +- Where do phases diverge for cross-compilation to machine code across + different platforms? + +# References + +- Command line parsing + - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) + - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) + - Main entry point: **TODO** +- Lexical Analysis: Lex the user program to a stream of tokens + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) + - Main entry point: **TODO** +- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) + - Main entry point: **TODO** + - AST definition: [`syntax`](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/index.html) +- The High Level Intermediate Representation (HIR) + - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) + - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) + - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) + - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) + - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` + - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) + - Main entry point: **TODO** +- Type Inference + - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) + - Guide: [The ty Module: Representing 
Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) + - Main entry point: **TODO** +- The Mid Level Intermediate Representation (MIR) + - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) + - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) + - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) + - Main entry point: **TODO** +- The Borrow Checker + - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) + - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) + - Main entry point: **TODO** +- MIR Optimizations + - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) + - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** + - Main entry point: **TODO** +- Code Generation + - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) + - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** + - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** + - Main entry point MIR -> LLVM IR: **TODO** + - Main entry point LLVM IR -> Machine Code **TODO** From fca095dde19e2f1d27e5a85845dca72f1c48b1bc Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 28 Mar 2020 06:47:45 -0500 Subject: [PATCH 02/38] correct a few links --- src/overview.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/src/overview.md b/src/overview.md index 546f0ad13..c8b44d1c0 100644 --- a/src/overview.md +++ b/src/overview.md @@ -72,8 +72,8 @@ we'll talk about that later. 
[lex]: https://rust-lang.github.io/rustc-guide/the-parser.html [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html [`librustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html -[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html -[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html +[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html +[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html [type inference]: https://rust-lang.github.io/rustc-guide/type-inference.html [mir]: https://rust-lang.github.io/rustc-guide/mir/index.html [borrow checker]: https://rust-lang.github.io/rustc-guide/borrow_check.html @@ -263,10 +263,6 @@ but there are already some promising performance improvements. 
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - Main entry point: **TODO** - - AST definition: [`syntax`](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/index.html) + - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - The High Level Intermediate Representation (HIR) - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) From 198c8d9bc30548424e5b123980f71f3dffb6ad34 Mon Sep 17 00:00:00 2001 From: mark Date: Thu, 2 Apr 2020 20:12:31 -0500 Subject: [PATCH 03/38] Apply Centril suggestions Co-Authored-By: Centril --- src/overview.md | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/src/overview.md b/src/overview.md index c8b44d1c0..82aa63705 100644 --- a/src/overview.md +++ b/src/overview.md @@ -50,7 +50,7 @@ we'll talk about that later. doesn't seem to be able to optimize the pattern the [`simplify_try`] mir opt looks for. - Rust code is _monomorphized_, which means making copies of all the generic - code with the type parameters replaced by concrete types. In order to do + code with the type parameters replaced by concrete types. To do this, we need to collect a list of what concrete types to generate code for. This is called _monomorphization collection_. - We then begin what is vaguely called _code generation_ or _codegen_. @@ -105,7 +105,7 @@ satisfy/optimize for. For example, - Compiler compilation speed: how long does it take to compile the compiler? This impacts contributors and compiler maintenance. 
- Compiler implementation complexity: building a compiler is one of the hardest - things a person/group can do, and rust is not a very simple language, so how + things a person/group can do, and Rust is not a very simple language, so how do we make the compiler's code base manageable? - Compiler correctness: the binaries produced by the compiler should do what the input programs says they do, and should continue to do so despite the @@ -119,14 +119,13 @@ satisfy/optimize for. For example, always going on to its implementation. - Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some strengths we leverage and some limitations/weaknesses we need to work around. -- And others that I'm probably forgetting. So, as you read through the rest of the guide, keep these things in mind. They will often inform decisions that we make. ### Constant change -One thing to keep in mind is that `rustc` is a real production-quality product. +Keep in mind that `rustc` is a real production-quality product. As such, it has its fair share of codebase churn and technical debt. A lot of the designs discussed throughout this guide are idealized designs that are not fully realized yet. And things keep changing so that it is hard to keep this @@ -139,19 +138,19 @@ to keep up with the requirements above. As with most compilers, `rustc` uses some intermediate representations (IRs) to facilitate computations. In general, working directly with the source code is -extremely inconvenient. Source code is designed to be human-friendly while at +extremely inconvenient and error-prone. Source code is designed to be human-friendly while at the same time being unambiguous, but it's less convenient for doing something like, say, type checking. Instead most compilers, including `rustc`, build some sort of IR out of the source code which is easier to analyze. 
`rustc` has a few IRs, each optimized -for different things: +for different purposes: - Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream of tokens produced by the lexer directly from the source code. It represents pretty much exactly what the user wrote. It helps to do some syntactic sanity checking (e.g. checking that a type is expected where the user wrote one). -- High-level IR (HIR): This is a sort of very desugared AST. It's still close +- High-level IR (HIR): This is a sort of desugared AST. It's still close to what the user wrote syntactically, but it includes some implicit things such as some elided lifetimes, etc. This IR is amenable to type checking. - HAIR: This is an intermediate between HIR and MIR. This only exists to make @@ -166,7 +165,7 @@ for different things: MIRI). Because MIR is still generic, we can do a lot of analyses here more efficiently than after monomorphization. - LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR - is basically a sort of typed assembly language with lots of annotations. It's + is a sort of typed assembly language with lots of annotations. It's a standard format that is used by all compilers that use LLVM (e.g. the clang C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other compilers to emit and also rich enough for LLVM to run a bunch of @@ -181,9 +180,9 @@ compiler does this to make incremental compilation possible -- that is, if the user makes a change to their program and recompiles, we want to do as little redundant work as possible to produce the new binary. -In rustc, all the major steps above are organized as a bunch of queries that +In `rustc`, all the major steps above are organized as a bunch of queries that call each other. For example, there is a query to ask for the type of something -and another to ask for the optimized MIR of a function, and so on. These +and another to ask for the optimized MIR of a function. 
These queries can call each other and are all tracked through the query system, and the results of the queries are cached on disk so that we can tell which queries' results changed from the last compilation and only redo those. This is @@ -209,7 +208,7 @@ to remain to ensure that unreachable functions still have their errors emitted. Moreover, the compiler wasn't originally built to use a query system; the query system has been retrofitted into the compiler, so parts of it are not -query-fied yet. Also, LLVM isn't our code, so obviously that isn't querified +query-fied yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to eventually query-fy all of the steps listed in the previous section, but as of this writing, only the steps between HIR and LLVM-IR are query-fied. That is, lexing and parsing are done all at once for @@ -239,8 +238,8 @@ Oh, and also the `rustc::ty` module defines the `TyCtxt` struct we mentioned bef ### Parallelism -Compiler performance is a problem that we would very much like to improve on -(and are always working on). One aspect of that is attempting to parallelize +Compiler performance is a problem that we would like to improve on +(and are always working on). One aspect of that is parallelizing `rustc` itself. Currently, there is only one part of rustc that is already parallel: codegen. From 95cf60d5bb1399d11b88955b35e7796740c32ab1 Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Fri, 3 Apr 2020 01:41:04 -0400 Subject: [PATCH 04/38] [overview.md] Add command line argument parsing, lexer stages, and parser outline --- src/overview.md | 132 ++++++++++++++++++++++++------------------------ 1 file changed, 66 insertions(+), 66 deletions(-) diff --git a/src/overview.md b/src/overview.md index 82aa63705..ee8761149 100644 --- a/src/overview.md +++ b/src/overview.md @@ -19,16 +19,16 @@ we'll talk about that later. 
**TODO: someone else should confirm this vvv** -- User writes a program and invokes `rustc` on it (possibly through `cargo`). -- First, we parse command line flags, etc. This is done in [`librustc_driver`]. - We now know what the exact work is we need to do (e.g. which nightly features - are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or - a full compilation). -- Then, we start to do compilation... -- We first [_lex_ the user program][lex]. This turns the program into a stream - of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)). - [`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with - `rustc` data structures. +- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`. +- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user. +- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?) +- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols. +- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). 
+ - macro expansion (**TODO** chrissimpkins) + - ast validation (**TODO** chrissimpkins) + - nameres (**TODO** chrissimpkins) + - early linting (**TODO** chrissimpkins) + - We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). - We then take the AST and [convert it to High-Level Intermediate @@ -45,27 +45,27 @@ we'll talk about that later. - We (want to) do [many optimizations on the MIR][mir-opt] because it is still generic and that improves the code we generate later, improving compilation speed too. (**TODO: size optimizations too?**) - - MIR is a higher level (and generic) representation, so it is easier to do - some optimizations at MIR level than at LLVM-IR level. For example LLVM - doesn't seem to be able to optimize the pattern the [`simplify_try`] mir - opt looks for. + - MIR is a higher level (and generic) representation, so it is easier to do + some optimizations at MIR level than at LLVM-IR level. For example LLVM + doesn't seem to be able to optimize the pattern the [`simplify_try`] mir + opt looks for. - Rust code is _monomorphized_, which means making copies of all the generic code with the type parameters replaced by concrete types. To do this, we need to collect a list of what concrete types to generate code for. This is called _monomorphization collection_. - We then begin what is vaguely called _code generation_ or _codegen_. - - The [code generation stage (codegen)][codegen] is when higher level - representations of source are turned into an executable binary. `rustc` + - The [code generation stage (codegen)][codegen] is when higher level + representations of source are turned into an executable binary. `rustc` uses LLVM for code generation. The first step is the MIR is then - converted to LLVM Intermediate Representation (LLVM IR). This is where - the MIR is actually monomorphized, according to the list we created in - the previous step. 
- - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. - It then emits machine code. It is basically assembly code with additional - low-level types and annotations added. (e.g. an ELF object or wasm). - **TODO: reference for this section?** - - The different libraries/binaries are linked together to produce the final - binary. **TODO: reference for this section?** + converted to LLVM Intermediate Representation (LLVM IR). This is where + the MIR is actually monomorphized, according to the list we created in + the previous step. + - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. + It then emits machine code. It is basically assembly code with additional + low-level types and annotations added. (e.g. an ELF object or wasm). + **TODO: reference for this section?** + - The different libraries/binaries are linked together to produce the final + binary. **TODO: reference for this section?** [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html [`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html @@ -90,12 +90,12 @@ satisfy/optimize for. For example, - Compilation speed: how fast is it to compile a program. More/better compile-time analyses often means compilation is slower. - - Also, we want to support incremental compilation, so we need to take that - into account. How can we keep track of what work needs to be redone and - what can be reused if the user modifies their program? - - Also we can't store too much stuff in the incremental cache because - it would take a long time to load from disk and it could take a lot - of space on the user's system... + - Also, we want to support incremental compilation, so we need to take that + into account. How can we keep track of what work needs to be redone and + what can be reused if the user modifies their program? 
+ - Also we can't store too much stuff in the incremental cache because + it would take a long time to load from disk and it could take a lot + of space on the user's system... - Compiler memory usage: while compiling a program, we don't want to use more memory than we need. - Program speed: how fast is your compiled program. More/better compile-time @@ -277,46 +277,46 @@ but there are already some promising performance improvements. # References - Command line parsing - - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) - - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) - - Main entry point: **TODO** + - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) + - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) + - Main entry point: **TODO** - Lexical Analysis: Lex the user program to a stream of tokens - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) - - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) - - Main entry point: **TODO** + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) + - Main entry point: **TODO** - Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) - - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - - Main entry point: **TODO** - - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Parser 
definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) + - Main entry point: **TODO** + - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - The High Level Intermediate Representation (HIR) - - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) - - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) - - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) - - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) - - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` - - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) - - Main entry point: **TODO** + - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) + - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) + - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) + - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) + - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` + - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) + - Main entry point: **TODO** - Type Inference - - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) - - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) - - Main entry point: **TODO** + - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) + - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) + - Main entry point: **TODO** - The Mid Level Intermediate Representation (MIR) - - Guide: [The 
MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) - - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) - - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) - - Main entry point: **TODO** + - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) + - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) + - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) + - Main entry point: **TODO** - The Borrow Checker - - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) - - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) - - Main entry point: **TODO** + - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) + - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) + - Main entry point: **TODO** - MIR Optimizations - - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) - - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** - - Main entry point: **TODO** + - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) + - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** + - Main entry point: **TODO** - Code Generation - - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) - - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not 
available yet** - - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - - Main entry point MIR -> LLVM IR: **TODO** - - Main entry point LLVM IR -> Machine Code **TODO** + - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) + - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** + - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** + - Main entry point MIR -> LLVM IR: **TODO** + - Main entry point LLVM IR -> Machine Code **TODO** From 268cb1f3ee9acc202b677e6d22f23a270320856e Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Fri, 3 Apr 2020 09:54:12 -0400 Subject: [PATCH 05/38] Update src/overview.md Co-Authored-By: LeSeulArtichaut --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index ee8761149..efb96a68f 100644 --- a/src/overview.md +++ b/src/overview.md @@ -19,7 +19,7 @@ we'll talk about that later. **TODO: someone else should confirm this vvv** -- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`. +- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined by command-line options. For example, it is possible to enable nightly features (`-Z` flags), perform `check`-only builds, or emit LLVM-IR rather than executable machine code. The `rustc` executable call may be indirect through the use of `cargo`. 
- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user. - The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?) - The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols. From 9afb848c7b2b4802dd21330f1377579f5a381b65 Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Fri, 3 Apr 2020 09:59:09 -0400 Subject: [PATCH 06/38] Update src/overview.md Co-Authored-By: LeSeulArtichaut --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index efb96a68f..f0b2887de 100644 --- a/src/overview.md +++ b/src/overview.md @@ -20,7 +20,7 @@ we'll talk about that later. **TODO: someone else should confirm this vvv** - The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined by command-line options. For example, it is possible to enable nightly features (`-Z` flags), perform `check`-only builds, or emit LLVM-IR rather than executable machine code. The `rustc` executable call may be indirect through the use of `cargo`. -- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user. +- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user and passes it to the rest of the compilation process as a [`rustc_interface::Config`]. 
- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?) - The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols. - (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). From 4dfb4fc176d5590b2298fd1a327f4f4cbdfc4eb2 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:30:45 -0500 Subject: [PATCH 07/38] fix old rustc-dev-guide links --- src/overview.md | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/overview.md b/src/overview.md index f0b2887de..148d7ab34 100644 --- a/src/overview.md +++ b/src/overview.md @@ -68,18 +68,18 @@ we'll talk about that later. binary. 
**TODO: reference for this section?** [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html -[`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html -[lex]: https://rust-lang.github.io/rustc-guide/the-parser.html +[`librustc_driver`]: https://rustc-dev-guide.rust-lang.org/rustc-driver.html +[lex]: https://rustc-dev-guide.rust-lang.org/the-parser.html [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html [`librustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parser/index.html [hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html -[type inference]: https://rust-lang.github.io/rustc-guide/type-inference.html -[mir]: https://rust-lang.github.io/rustc-guide/mir/index.html -[borrow checker]: https://rust-lang.github.io/rustc-guide/borrow_check.html -[mir-opt]: https://rust-lang.github.io/rustc-guide/mir/optimizations.html +[type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html +[mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html +[borrow checker]: https://rustc-dev-guide.rust-lang.org/borrow_check.html +[mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html [`simplify_try`]: https://github.com/rust-lang/rust/pull/66282 -[codegen]: https://rust-lang.github.io/rustc-guide/codegen.html +[codegen]: https://rustc-dev-guide.rust-lang.org/codegen.html ## How it does it @@ -277,46 +277,46 @@ but there are already some promising performance improvements. 
# References - Command line parsing - - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) + - Guide: [The Rustc Driver and Interface](https://rustc-dev-guide.rust-lang.org/rustc-driver.html) - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) - Main entry point: **TODO** - Lexical Analysis: Lex the user program to a stream of tokens - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) - Main entry point: **TODO** - Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - Main entry point: **TODO** - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - The High Level Intermediate Representation (HIR) - - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) - - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) - - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) - - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) + - Guide: [The HIR](https://rustc-dev-guide.rust-lang.org/hir.html) + - Guide: [Identifiers in the HIR](https://rustc-dev-guide.rust-lang.org/hir.html#identifiers-in-the-hir) + - Guide: [The HIR Map](https://rustc-dev-guide.rust-lang.org/hir.html#the-hir-map) + - Guide: [Lowering AST to HIR](https://rustc-dev-guide.rust-lang.org/lowering.html) 
- How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) - Main entry point: **TODO** - Type Inference - - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) - - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) + - Guide: [Type Inference](https://rustc-dev-guide.rust-lang.org/type-inference.html) + - Guide: [The ty Module: Representing Types](https://rustc-dev-guide.rust-lang.org/ty.html) (semantics) - Main entry point: **TODO** - The Mid Level Intermediate Representation (MIR) - - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) + - Guide: [The MIR (Mid level IR)](https://rustc-dev-guide.rust-lang.org/mir/index.html) - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) - Main entry point: **TODO** - The Borrow Checker - - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) + - Guide: [MIR Borrow Check](https://rustc-dev-guide.rust-lang.org/borrow_check.html) - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) - Main entry point: **TODO** - MIR Optimizations - - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) + - Guide: [MIR Optimizations](https://rustc-dev-guide.rust-lang.org/mir/optimizations.html) - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** - Main entry point: **TODO** - Code Generation - - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) - - Guide: [Generating 
LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** + - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) + - Guide: [Generating LLVM IR](https://rustc-dev-guide.rust-lang.org/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - Main entry point MIR -> LLVM IR: **TODO** - Main entry point LLVM IR -> Machine Code **TODO** From 2819e069e6c71f5fb4bbad1d70dafe18b049533d Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:46:14 -0500 Subject: [PATCH 08/38] Add some entry points Co-Authored-By: LeSeulArtichaut --- src/overview.md | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/src/overview.md b/src/overview.md index 148d7ab34..9c6b71b76 100644 --- a/src/overview.md +++ b/src/overview.md @@ -279,11 +279,11 @@ but there are already some promising performance improvements. - Command line parsing - Guide: [The Rustc Driver and Interface](https://rustc-dev-guide.rust-lang.org/rustc-driver.html) - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) - - Main entry point: **TODO** + - Main entry point: [`rustc_session::config::build_session_options`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/config/fn.build_session_options.html) - Lexical Analysis: Lex the user program to a stream of tokens - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) - - Main entry point: **TODO** + - Main entry point: [`rustc_lexer::tokenize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/fn.tokenize.html) - Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - 
Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) @@ -309,14 +309,13 @@ but there are already some promising performance improvements. - The Borrow Checker - Guide: [MIR Borrow Check](https://rustc-dev-guide.rust-lang.org/borrow_check.html) - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) - - Main entry point: **TODO** + - Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html) - MIR Optimizations - Guide: [MIR Optimizations](https://rustc-dev-guide.rust-lang.org/mir/optimizations.html) - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** - - Main entry point: **TODO** + - Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/fn.optimized_mir.html) - Code Generation - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) - Guide: [Generating LLVM IR](https://rustc-dev-guide.rust-lang.org/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - - Main entry point MIR -> LLVM IR: **TODO** - - Main entry point LLVM IR -> Machine Code **TODO** + - Main entry point MIR -> Machine Code: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) From 720db6b0f99860258c3e048582c1f952cabf2b4b Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:50:27 -0500 Subject: [PATCH 09/38] mention the hair --- src/overview.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/overview.md b/src/overview.md index 9c6b71b76..0648ac7e1 100644 --- a/src/overview.md +++ b/src/overview.md @@ -40,6 +40,9 @@ we'll talk about that later. 
- **TODO: Maybe some other things are done here? I think initial type checking happens here? And trait solving?** - The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir]. + - Along the way, we construct the HAIR, which is an even more desugared HIR. + HAIR is used for pattern and exhaustiveness checking. It is also more + convenient to convert into MIR than HIR is. - The MIR is used for [borrow checking]. - **TODO: const eval fits in somewhere here I think** - We (want to) do [many optimizations on the MIR][mir-opt] because it is still From e72118436ba0a555c70b42efc9fda9649b79ff90 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:52:55 -0500 Subject: [PATCH 10/38] mention token stream as an IR --- src/overview.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 0648ac7e1..cebff545d 100644 --- a/src/overview.md +++ b/src/overview.md @@ -149,8 +149,11 @@ Instead most compilers, including `rustc`, build some sort of IR out of the source code which is easier to analyze. `rustc` has a few IRs, each optimized for different purposes: +- Token stream: the lexer produces a stream of tokens directly from the source + code. This stream of tokens is easier for the parser to deal with than raw + text. - Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream - of tokens produced by the lexer directly from the source code. It represents + of tokens produced by the lexer. It represents pretty much exactly what the user wrote. It helps to do some syntactic sanity checking (e.g. checking that a type is expected where the user wrote one). - High-level IR (HIR): This is a sort of desugared AST. 
It's still close From d47d79a85281016817f1b527831b39ab7767a017 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:57:14 -0500 Subject: [PATCH 11/38] correct the note about HAIR --- src/overview.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/overview.md b/src/overview.md index cebff545d..8f6d00d7e 100644 --- a/src/overview.md +++ b/src/overview.md @@ -29,8 +29,6 @@ we'll talk about that later. - nameres (**TODO** chrissimpkins) - early linting (**TODO** chrissimpkins) -- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax - Tree (AST). - We then take the AST and [convert it to High-Level Intermediate Representation (HIR)][hir]. This is a compiler-friendly representation of the AST. This involves a lot of desugaring of things like loops and `async fn`. @@ -159,8 +157,10 @@ for different purposes: - High-level IR (HIR): This is a sort of desugared AST. It's still close to what the user wrote syntactically, but it includes some implicit things such as some elided lifetimes, etc. This IR is amenable to type checking. -- HAIR: This is an intermediate between HIR and MIR. This only exists to make - it easier to lower HIR to MIR. +- HAIR: This is an intermediate between HIR and MIR. It is like the HIR but it + is fully typed and a bit more desugared (e.g. method calls and implicit + dereferences are made fully explicit). Moreover, it is easier to lower to MIR + than HIR. - Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG is a type of diagram that shows the basic blocks of a program and how control flow can go between them. 
Likewise, MIR also has a bunch of basic blocks with From 1a4d5eb9bff239ee98f594b939a9eef3a6049e31 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 12:59:51 -0500 Subject: [PATCH 12/38] Improve description of MIR Co-Authored-By: Centril --- src/overview.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/overview.md b/src/overview.md index 8f6d00d7e..37a082070 100644 --- a/src/overview.md +++ b/src/overview.md @@ -165,9 +165,10 @@ for different purposes: is a type of diagram that shows the basic blocks of a program and how control flow can go between them. Likewise, MIR also has a bunch of basic blocks with simple typed statements inside them (e.g. assignment, simple computations, - dropping values, etc). MIR is used for borrow checking and a bunch of other - important dataflow based checks, such as checking for uninitialized values. - It is also used for a bunch of optimizations and for constant evaluation (via + etc) and control flow edges to other basic blocks (e.g., calls, dropping + values). MIR is used for borrow checking and other + important dataflow-based checks, such as checking for uninitialized values. + It is also used for a series of optimizations and for constant evaluation (via MIRI). Because MIR is still generic, we can do a lot of analyses here more efficiently than after monomorphization. - LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR From 8371ddc62e0e931ed44671f202316eaac6960a20 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:02:04 -0500 Subject: [PATCH 13/38] break long sentence --- src/overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/overview.md b/src/overview.md index 37a082070..3bd60904f 100644 --- a/src/overview.md +++ b/src/overview.md @@ -190,8 +190,8 @@ redundant work as possible to produce the new binary. In `rustc`, all the major steps above are organized as a bunch of queries that call each other. 
For example, there is a query to ask for the type of something and another to ask for the optimized MIR of a function. These -queries can call each other and are all tracked through the query system, and -the results of the queries are cached on disk so that we can tell which +queries can call each other and are all tracked through the query system. +The results of the queries are cached on disk so that we can tell which queries' results changed from the last compilation and only redo those. This is how incremental compilation works. From 55409939e57e0437536a696d720177738887fb9e Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:04:36 -0500 Subject: [PATCH 14/38] add a note on tcx name --- src/overview.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 3bd60904f..d92142e38 100644 --- a/src/overview.md +++ b/src/overview.md @@ -222,7 +222,10 @@ LLVM-IR are query-fied. That is, lexing and parsing are done all at once for the whole program. One other thing to mention here is the all-important "typing context", -[`TyCtxt`], which is a giant struct that is at the center of all things. All +[`TyCtxt`], which is a giant struct that is at the center of all things. +(Note that the name is mostly historic. This is _not_ a "typing context" in the +sense of `Γ` or `Δ` from type theory. The name is retained because that's what +the name of the struct is in the source code.) All queries are defined as methods on the [`TyCtxt`] type, and the in-memory query cache is stored there too. In the code, there is usually a variable called `tcx` which is a handle on the typing context. 
You will also see lifetimes with From ddecf6c925821b737cce3fc328e5cdd6b205c85d Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:17:04 -0500 Subject: [PATCH 15/38] write a bit about bootstrapping --- src/overview.md | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index d92142e38..7b8cae839 100644 --- a/src/overview.md +++ b/src/overview.md @@ -270,7 +270,23 @@ but there are already some promising performance improvements. ### Bootstrapping -**TODO (or do we want such a section)?** +`rustc` itself is written in Rust. So how do we compile the compiler? We use an +older compiler to compile the newer compiler. This is called _bootstrapping_. + +Bootstrapping has a lot of interesting implications. For example, it means that one +of the major users of Rust is Rust, so we are constantly testing our own +software ("eating our own dogfood"). Also, it means building the compiler can +take a long time because one must first build the compiler and then use that to +build the new compiler (sometimes you can get away without the full 2-stage +build, but for release artifacts you need the 2-stage build). + +Bootstrapping also has implications for when features are usable in the +compiler itself. The build system uses the current beta compiler to build the +stage-1 bootstrapping compiler. This means that the compiler source code can't +use some features until they reach beta (because otherwise the beta compiler +doesn't support them). On the other hand, for compiler intrinsics and internal +features, we may be able to use them immediately because the stage-1 +bootstrapping compiler will support them. 
# Unresolved Questions From 17656d2fc4f04c6800a2c2c607e41cf635ad60d0 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:19:21 -0500 Subject: [PATCH 16/38] add a few todos --- src/overview.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/overview.md b/src/overview.md index 7b8cae839..350ff5523 100644 --- a/src/overview.md +++ b/src/overview.md @@ -315,6 +315,10 @@ bootstrapping compiler will support them. - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - Main entry point: **TODO** - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) + - Expansion: **TODO** + - Name Resolution: **TODO** + - Feature gating: **TODO** + - Early linting: **TODO** - The High Level Intermediate Representation (HIR) - Guide: [The HIR](https://rustc-dev-guide.rust-lang.org/hir.html) - Guide: [Identifiers in the HIR](https://rustc-dev-guide.rust-lang.org/hir.html#identifiers-in-the-hir) @@ -323,6 +327,7 @@ bootstrapping compiler will support them. - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) - Main entry point: **TODO** + - Late linting: **TODO** - Type Inference - Guide: [Type Inference](https://rustc-dev-guide.rust-lang.org/type-inference.html) - Guide: [The ty Module: Representing Types](https://rustc-dev-guide.rust-lang.org/ty.html) (semantics) From 67d4d9bc2e29c72ce100ab5f6e50d859c8dfe021 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:20:48 -0500 Subject: [PATCH 17/38] line lengths --- src/overview.md | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/src/overview.md b/src/overview.md index 350ff5523..26f7a2ad1 100644 --- a/src/overview.md +++ b/src/overview.md @@ -19,11 +19,25 @@ we'll talk about that later. 
**TODO: someone else should confirm this vvv** -- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined by command-line options. For example, it is possible to enable nightly features (`-Z` flags), perform `check`-only builds, or emit LLVM-IR rather than executable machine code. The `rustc` executable call may be indirect through the use of `cargo`. -- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user and passes it to the rest of the compilation process as a [`rustc_interface::Config`]. -- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?) -- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols. -- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). +- The compile process begins when a user writes a Rust source program in text + and invokes the `rustc` compiler on it. The work that the compiler needs to + perform is defined by command-line options. For example, it is possible to + enable nightly features (`-Z` flags), perform `check`-only builds, or emit + LLVM-IR rather than executable machine code. The `rustc` executable call may + be indirect through the use of `cargo`. +- Command line argument parsing occurs in the [`librustc_driver`]. 
This crate + defines the compile configuration that is requested by the user and passes it + to the rest of the compilation process as a [`rustc_interface::Config`]. +- The raw Rust source text is analyzed by a low-level lexer located in + [`librustc_lexer`]. At this stage, the source text is turned into a stream of + atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe + discuss Unicode handling during this stage?) +- The token stream passes through a higher-level lexer located in + [`librustc_parse`] to prepare for the next stage of the compile process. The + [`StringReader`] struct is used at this stage to perform a set of validations + and turn strings into interned symbols. +- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream + of tokens][parser] to build an Abstract Syntax Tree (AST). - macro expansion (**TODO** chrissimpkins) - ast validation (**TODO** chrissimpkins) - nameres (**TODO** chrissimpkins) From 392ffd233cb5b88425cfc352a20851a5dee091bb Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:23:23 -0500 Subject: [PATCH 18/38] fix links --- src/overview.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 26f7a2ad1..4d5ff03b4 100644 --- a/src/overview.md +++ b/src/overview.md @@ -84,6 +84,7 @@ we'll talk about that later. [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html [`librustc_driver`]: https://rustc-dev-guide.rust-lang.org/rustc-driver.html +[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html [lex]: https://rustc-dev-guide.rust-lang.org/the-parser.html [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html [`librustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html @@ -91,7 +92,7 @@ we'll talk about that later. 
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html [type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html [mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html -[borrow checker]: https://rustc-dev-guide.rust-lang.org/borrow_check.html +[borrow checking]: https://rustc-dev-guide.rust-lang.org/borrow_check.html [mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html [`simplify_try`]: https://github.com/rust-lang/rust/pull/66282 [codegen]: https://rustc-dev-guide.rust-lang.org/codegen.html From e35b046b7171c8a3f825d13d6645873931472170 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:27:24 -0500 Subject: [PATCH 19/38] remove a todo --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 4d5ff03b4..e2dc1e12b 100644 --- a/src/overview.md +++ b/src/overview.md @@ -358,7 +358,7 @@ bootstrapping compiler will support them. - Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html) - MIR Optimizations - Guide: [MIR Optimizations](https://rustc-dev-guide.rust-lang.org/mir/optimizations.html) - - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** + - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) - Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/fn.optimized_mir.html) - Code Generation - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) From 9c9cf7fe55f5056324fc1faa9155d9412b10ada6 Mon Sep 17 00:00:00 2001 From: mark Date: Sat, 4 Apr 2020 13:30:40 -0500 Subject: [PATCH 20/38] add an entry point --- src/overview.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/overview.md 
b/src/overview.md index e2dc1e12b..9b67343af 100644 --- a/src/overview.md +++ b/src/overview.md @@ -364,4 +364,5 @@ bootstrapping compiler will support them. - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) - Guide: [Generating LLVM IR](https://rustc-dev-guide.rust-lang.org/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - - Main entry point MIR -> Machine Code: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) + - Main entry point MIR -> LLVM IR: [`MonoItem::define`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/mono_item/enum.MonoItem.html#method.define) + - Main entry point LLVM IR -> Machine Code: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) From 8462cd8ebba4a790fd7b3dc661185a4909741d07 Mon Sep 17 00:00:00 2001 From: Who? Me?! Date: Tue, 14 Apr 2020 15:15:32 -0500 Subject: [PATCH 21/38] Improve wording Co-Authored-By: Santiago Pastorino --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 9b67343af..7660b2c53 100644 --- a/src/overview.md +++ b/src/overview.md @@ -175,7 +175,7 @@ for different purposes: - HAIR: This is an intermediate between HIR and MIR. It is like the HIR but it is fully typed and a bit more desugared (e.g. method calls and implicit dereferences are made fully explicit). Moreover, it is easier to lower to MIR - than HIR. + from HAIR than from HIR. - Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG is a type of diagram that shows the basic blocks of a program and how control flow can go between them. 
Likewise, MIR also has a bunch of basic blocks with From a90e4721522f7a2726b8708e81541e16974b2273 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:03:08 -0500 Subject: [PATCH 22/38] fix lexer entry point --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 7660b2c53..09285c5ba 100644 --- a/src/overview.md +++ b/src/overview.md @@ -324,7 +324,7 @@ bootstrapping compiler will support them. - Lexical Analysis: Lex the user program to a stream of tokens - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) - - Main entry point: [`rustc_lexer::tokenize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/fn.tokenize.html) + - Main entry point: [`rustc_lexer::first_token`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/fn.first_token.html) - Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) From 0db4271a81340f8df5c6441f7c94bd037cc0382b Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Sun, 5 Apr 2020 22:20:44 -0400 Subject: [PATCH 23/38] [overview.md] add parser entry point links --- src/overview.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 09285c5ba..432add21c 100644 --- a/src/overview.md +++ b/src/overview.md @@ -328,7 +328,10 @@ bootstrapping compiler will support them. 
- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST) - Guide: [Lexing and Parsing](https://rustc-dev-guide.rust-lang.org/the-parser.html) - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - - Main entry point: **TODO** + - Main entry points: + - [Entry point for first file in crate](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/passes/fn.parse.html) + - [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html) + - [Entry point for macro fragments](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/fn.parse_nt.html) - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - Expansion: **TODO** - Name Resolution: **TODO** From 46e549b619dfab5565dd7b1e4f7cd3293c11f249 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:12:44 -0500 Subject: [PATCH 24/38] add a para on interning and arenas --- src/overview.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/src/overview.md b/src/overview.md index 432add21c..7085bb2a7 100644 --- a/src/overview.md +++ b/src/overview.md @@ -35,7 +35,7 @@ we'll talk about that later. - The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations - and turn strings into interned symbols. + and turn strings into interned symbols (_interning_ is discussed later). - (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). - macro expansion (**TODO** chrissimpkins) @@ -193,6 +193,14 @@ for different purposes: compilers to emit and also rich enough for LLVM to run a bunch of optimizations on it. 
+One other thing to note is that many values in the compiler are _interned_. +This is a performance and memory optimization in which we allocate the values +in a special allocator called an _arena_. Then, we pass around references to +the values allocated in the arena. This allows us to make sure that identical +values (e.g. types in your program) are only allocated once and can be compared +cheaply by comparing pointers. Many of the intermediate representations are +interned. + ### Queries The first big implementation choice is the _query_ system. The rust compiler @@ -245,7 +253,7 @@ queries are defined as methods on the [`TyCtxt`] type, and the in-memory query cache is stored there too. In the code, there is usually a variable called `tcx` which is a handle on the typing context. You will also see lifetimes with the name `'tcx`, which means that something is tied to the lifetime of the -`TyCtxt` (usually it is stored or _interned_ there). +`TyCtxt` (usually it is stored or interned there). [`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyCtxt.html From 8760fbc34d5142186efdc0f1523dadb80fe9b562 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:18:55 -0500 Subject: [PATCH 25/38] add entry points for type check and type inference Co-Authored-By: LeSeulArtichaut --- src/overview.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 7085bb2a7..a8a631b96 100644 --- a/src/overview.md +++ b/src/overview.md @@ -357,7 +357,9 @@ bootstrapping compiler will support them. 
- Type Inference - Guide: [Type Inference](https://rustc-dev-guide.rust-lang.org/type-inference.html) - Guide: [The ty Module: Representing Types](https://rustc-dev-guide.rust-lang.org/ty.html) (semantics) - - Main entry point: **TODO** + - Main entry point (type inference): [`InferCtxtBuilder::enter`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_infer/infer/struct.InferCtxtBuilder.html#method.enter) + - Main entry point (type checking bodies): [the `typeck_tables_of` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.typeck_tables_of) + - These two functions can't be decoupled. - The Mid Level Intermediate Representation (MIR) - Guide: [The MIR (Mid level IR)](https://rustc-dev-guide.rust-lang.org/mir/index.html) - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) From bbc61290950065b7df9c0bc7e6dd80cbcad6f2b5 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:30:39 -0500 Subject: [PATCH 26/38] some cleanup --- src/overview.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/src/overview.md b/src/overview.md index a8a631b96..2565f902b 100644 --- a/src/overview.md +++ b/src/overview.md @@ -17,8 +17,6 @@ we'll talk about that later. **TODO: Would be great to have a diagram of this once we nail down the details...** -**TODO: someone else should confirm this vvv** - - The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined by command-line options. For example, it is possible to @@ -47,8 +45,7 @@ we'll talk about that later. Representation (HIR)][hir]. This is a compiler-friendly representation of the AST. This involves a lot of desugaring of things like loops and `async fn`. - We use the HIR to do [type inference]. This is the process of automatic - detection of the type of an expression. 
**TODO: how `ty` module fits in - here** + detection of the type of an expression. - **TODO: Maybe some other things are done here? I think initial type checking happens here? And trait solving?** - The HIR is then [lowered to Mid-Level Intermediate Representation (MIR)][mir]. @@ -364,7 +361,6 @@ bootstrapping compiler will support them. - Guide: [The MIR (Mid level IR)](https://rustc-dev-guide.rust-lang.org/mir/index.html) - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) - - Main entry point: **TODO** - The Borrow Checker - Guide: [MIR Borrow Check](https://rustc-dev-guide.rust-lang.org/borrow_check.html) - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) From ab95d001ff2823597dc4babdddc1565303633a61 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:30:48 -0500 Subject: [PATCH 27/38] correct entry points for codegen --- src/overview.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/overview.md b/src/overview.md index 2565f902b..751a01ec0 100644 --- a/src/overview.md +++ b/src/overview.md @@ -371,7 +371,7 @@ bootstrapping compiler will support them. 
- Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/fn.optimized_mir.html) - Code Generation - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) - - Guide: [Generating LLVM IR](https://rustc-dev-guide.rust-lang.org/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - - Main entry point MIR -> LLVM IR: [`MonoItem::define`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/mono_item/enum.MonoItem.html#method.define) - - Main entry point LLVM IR -> Machine Code: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) + - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) + - This monomorphizes and produces LLVM IR for one codegen unit. It then starts a background thread to run LLVM, which must be joined later. + - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) From aa1184de48d09a1a356c2e761866f784ebc8f33f Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Sun, 5 Apr 2020 22:50:28 -0400 Subject: [PATCH 28/38] [overview.md] add documentation of lexer support for Unicode encoding --- src/overview.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/overview.md b/src/overview.md index 751a01ec0..d641718c8 100644 --- a/src/overview.md +++ b/src/overview.md @@ -28,8 +28,8 @@ we'll talk about that later. to the rest of the compilation process as a [`rustc_interface::Config`]. - The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. 
At this stage, the source text is turned into a stream of - atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe - discuss Unicode handling during this stage?) + atomic source code units known as _tokens_. The lexer supports the Unicode + character encoding. - The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations From 4cf7d02ab1bb41ece9266c87a32d3d9c5f814372 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:34:23 -0500 Subject: [PATCH 29/38] update mono entry points --- src/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index d641718c8..8bfa20118 100644 --- a/src/overview.md +++ b/src/overview.md @@ -374,4 +374,4 @@ bootstrapping compiler will support them. - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) - This monomorphizes and produces LLVM IR for one codegen unit. It then starts a background thread to run LLVM, which must be joined later. 
- - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) + - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance `](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html) From d8a3c894701d83df657d1fe3678abf3c2b772041 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:37:53 -0500 Subject: [PATCH 30/38] minor wording change on bootstrapping --- src/overview.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/src/overview.md b/src/overview.md index 8bfa20118..744ef3b1f 100644 --- a/src/overview.md +++ b/src/overview.md @@ -293,12 +293,13 @@ but there are already some promising performance improvements. `rustc` itself is written in Rust. So how do we compile the compiler? We use an older compiler to compile the newer compiler. This is called _bootstrapping_. -Bootstrapping has a lot of interesting implications. For example, it means that one -of the major users of Rust is Rust, so we are constantly testing our own +Bootstrapping has a lot of interesting implications. For example, it means that +one of the major users of Rust is Rust, so we are constantly testing our own software ("eating our own dogfood"). Also, it means building the compiler can -take a long time because one must first build the compiler and then use that to -build the new compiler (sometimes you can get away without the full 2-stage -build, but for release artifacts you need the 2-stage build). 
+take a long time because one must first build the new compiler with an older +compiler and then use that to build the new compiler with itself (sometimes you +can get away without the full 2-stage build, but for release artifacts you need +the 2-stage build). Bootstrapping also has implications for when features are usable in the compiler itself. The build system uses the current beta compiler to build the From 378034475bc1be0629203ed6a416a7d85287e22e Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:41:50 -0500 Subject: [PATCH 31/38] add intrinsics to glossary --- src/appendix/glossary.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/appendix/glossary.md b/src/appendix/glossary.md index 11ddb494f..5dc8f3801 100644 --- a/src/appendix/glossary.md +++ b/src/appendix/glossary.md @@ -36,6 +36,7 @@ ICH
| Short for incremental compilation ha infcx
| The inference context (see `librustc_middle/infer`) inference variable
| When doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type. intern
| Interning refers to storing certain frequently-used constant data, such as strings, and then referring to the data by an identifier (e.g. a `Symbol`) rather than the data itself, to reduce memory usage and number of allocations. See [this chapter](../memory.md) for more info. +intrinsic
| Intrinsics are special functions that are implemented in the compiler itself but exposed (often unstably) to users. They do magical and dangerous things. (See [`std::intrinsics`](https://doc.rust-lang.org/std/intrinsics/index.html)) IR
| Short for Intermediate Representation, a general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. IRLO
| `IRLO` or `irlo` is sometimes used as an abbreviation for [internals.rust-lang.org](https://internals.rust-lang.org). item
| A kind of "definition" in the language, such as a static, const, use statement, module, struct, etc. Concretely, this corresponds to the `Item` type. From 74f5f8865d88dea2bb625310652151ed2b648614 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 16:48:04 -0500 Subject: [PATCH 32/38] fix links --- src/overview.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/overview.md b/src/overview.md index 744ef3b1f..69e71723e 100644 --- a/src/overview.md +++ b/src/overview.md @@ -85,14 +85,14 @@ we'll talk about that later. [lex]: https://rustc-dev-guide.rust-lang.org/the-parser.html [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html [`librustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html -[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parser/index.html +[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html [hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html [type inference]: https://rustc-dev-guide.rust-lang.org/type-inference.html [mir]: https://rustc-dev-guide.rust-lang.org/mir/index.html [borrow checking]: https://rustc-dev-guide.rust-lang.org/borrow_check.html [mir-opt]: https://rustc-dev-guide.rust-lang.org/mir/optimizations.html [`simplify_try`]: https://github.com/rust-lang/rust/pull/66282 -[codegen]: https://rustc-dev-guide.rust-lang.org/codegen.html +[codegen]: https://rustc-dev-guide.rust-lang.org/backend/codegen.html ## How it does it @@ -252,7 +252,7 @@ cache is stored there too. In the code, there is usually a variable called the name `'tcx`, which means that something is tied to the lifetime of the `TyCtxt` (usually it is stored or interned there). 
-[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyCtxt.html +[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html ### `ty::Ty` @@ -264,7 +264,7 @@ on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is th Oh, and also the `rustc::ty` module defines the `TyCtxt` struct we mentioned before. -[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html +[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html ### Parallelism @@ -360,8 +360,8 @@ bootstrapping compiler will support them. - These two functions can't be decoupled. - The Mid Level Intermediate Representation (MIR) - Guide: [The MIR (Mid level IR)](https://rustc-dev-guide.rust-lang.org/mir/index.html) - - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) - - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) + - Definition: [`librustc_middle/mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html) + - Definition of source that manipulates the MIR: [`librustc_mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/index.html) - The Borrow Checker - Guide: [MIR Borrow Check](https://rustc-dev-guide.rust-lang.org/borrow_check.html) - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) @@ -371,7 +371,7 @@ bootstrapping compiler will support them. 
- Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) - Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/fn.optimized_mir.html) - Code Generation - - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/codegen.html) + - Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/backend/codegen.html) - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) - This monomorphizes and produces LLVM IR for one codegen unit. It then starts a background thread to run LLVM, which must be joined later. From 2b587b08beb119415502e79c4fae6adabfdc047e Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Tue, 7 Apr 2020 00:27:39 -0400 Subject: [PATCH 33/38] [overview.md] add initial parser documentation --- src/overview.md | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/src/overview.md b/src/overview.md index 69e71723e..94ca51048 100644 --- a/src/overview.md +++ b/src/overview.md @@ -34,8 +34,33 @@ we'll talk about that later. [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols (_interning_ is discussed later). -- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream - of tokens][parser] to build an Abstract Syntax Tree (AST). +- The lexer has a small interface and doesn't depend directly on the + diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain + data which are emitted in `librustc_parse::lexer::mod` as real diagnostics. +- The lexer preserves full fidelity information for both IDEs and proc macros.
+- The parser [translates the token stream from the lexer into an Abstract Syntax + Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax + analysis. The crate entry points for the parser are the `Parser.parse_crate_mod()` and + `Parser.parse_mod()` methods found in `librustc_parse::parser::item`. The external + module parsing entry point is `librustc_expand::module::parse_external_mod`. And + the macro parser entry point is `rustc_expand::mbe::macro_parser::parse_nt`. +- Parsing is performed with a set of `Parser` utility methods including `fn bump`, + `fn check`, `fn eat`, `fn expect`, `fn look_ahead`. +- Parsing is organized by the semantic construct that is being parsed. Separate + `parse_*` methods can be found in `librustc_parse` `parser` directory. File + naming follows the construct name. For example, the following files are found + in the parser: + - `expr.rs` + - `pat.rs` + - `ty.rs` + - `stmt.rs` +- This naming scheme is used across the parser, lowering, type checking, + HAIR lowering, & MIR building stages of the compile process and you will + find either a file or directory with the same name for most of these constructs + at each of these stages of compilation. +- For error handling, the parser uses the standard `DiagnosticBuilder` API, but we + try to recover, parsing a superset of Rust's grammar, while also emitting an error. +- The `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST node returned from the parser. 
- macro expansion (**TODO** chrissimpkins) - ast validation (**TODO** chrissimpkins) - nameres (**TODO** chrissimpkins) From d023ac7c164963fff32b739976726d9bf8e6242c Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Tue, 7 Apr 2020 22:19:57 -0400 Subject: [PATCH 34/38] [overview.md] add lexer updates, parser updates includes feedback from matklad (lexer) and centril (parser) --- src/overview.md | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/src/overview.md b/src/overview.md index 94ca51048..3391952c7 100644 --- a/src/overview.md +++ b/src/overview.md @@ -28,8 +28,8 @@ we'll talk about that later. to the rest of the compilation process as a [`rustc_interface::Config`]. - The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of - atomic source code units known as _tokens_. The lexer supports the Unicode - character encoding. + atomic source code units known as _tokens_. The lexer supports the + Unicode character encoding. - The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations @@ -47,25 +47,21 @@ we'll talk about that later. - Parsing is performed with a set of `Parser` utility methods including `fn bump`, `fn check`, `fn eat`, `fn expect`, `fn look_ahead`. - Parsing is organized by the semantic construct that is being parsed. Separate - `parse_*` methods can be found in `librustc_parse` `parser` directory. File - naming follows the construct name. For example, the following files are found + `parse_*` methods can be found in `librustc_parse` `parser` directory. The source + file name follows the construct name. 
For example, the following files are found in the parser: - `expr.rs` - `pat.rs` - `ty.rs` - `stmt.rs` -- This naming scheme is used across the parser, lowering, type checking, - HAIR lowering, & MIR building stages of the compile process and you will - find either a file or directory with the same name for most of these constructs - at each of these stages of compilation. -- For error handling, the parser uses the standard `DiagnosticBuilder` API, but we +- This naming scheme is used across many compiler stages. You will find + either a file or directory with the same name across the parsing, lowering, + type checking, HAIR lowering, and MIR building sources. +- Macro expansion, AST validation, name resolution, and early linting take place + during this stage of the compile process. +- The parser uses the standard `DiagnosticBuilder` API for error handling, but we try to recover, parsing a superset of Rust's grammar, while also emitting an error. -- The `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST node returned from the parser. - - macro expansion (**TODO** chrissimpkins) - - ast validation (**TODO** chrissimpkins) - - nameres (**TODO** chrissimpkins) - - early linting (**TODO** chrissimpkins) - +- `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser. - We then take the AST and [convert it to High-Level Intermediate Representation (HIR)][hir]. This is a compiler-friendly representation of the AST. This involves a lot of desugaring of things like loops and `async fn`. From ce3ff5842552f8141a69d6d75856a367f86214a6 Mon Sep 17 00:00:00 2001 From: mark Date: Tue, 14 Apr 2020 17:40:10 -0500 Subject: [PATCH 35/38] line length --- src/overview.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/overview.md b/src/overview.md index 3391952c7..bdebd86e4 100644 --- a/src/overview.md +++ b/src/overview.md @@ -395,5 +395,6 @@ bootstrapping compiler will support them. 
- Guide: [Code Generation](https://rustc-dev-guide.rust-lang.org/backend/codegen.html) - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) - - This monomorphizes and produces LLVM IR for one codegen unit. It then starts a background thread to run LLVM, which must be joined later. + - This monomorphizes and produces LLVM IR for one codegen unit. It then + starts a background thread to run LLVM, which must be joined later. - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance `](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html) From e5f427d78eee0e14c850043048bed46b0f521039 Mon Sep 17 00:00:00 2001 From: Who? Me?! Date: Fri, 17 Apr 2020 21:14:11 -0500 Subject: [PATCH 36/38] Remove various todos With some items added to https://github.com/rust-lang/rustc-dev-guide/issues/674 Co-Authored-By: Chris Simpkins --- src/overview.md | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) diff --git a/src/overview.md b/src/overview.md index bdebd86e4..abe35188f 100644 --- a/src/overview.md +++ b/src/overview.md @@ -15,7 +15,6 @@ So first, let's look at what the compiler does to your code. For now, we will avoid mentioning how the compiler implements these steps except as needed; we'll talk about that later. -**TODO: Would be great to have a diagram of this once we nail down the details...** - The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to @@ -74,10 +73,9 @@ we'll talk about that later. HAIR is used for pattern and exhaustiveness checking. 
It is also more convenient to convert into MIR than HIR is. - The MIR is used for [borrow checking]. -- **TODO: const eval fits in somewhere here I think** - We (want to) do [many optimizations on the MIR][mir-opt] because it is still generic and that improves the code we generate later, improving compilation - speed too. (**TODO: size optimizations too?**) + speed too. - MIR is a higher level (and generic) representation, so it is easier to do some optimizations at MIR level than at LLVM-IR level. For example LLVM doesn't seem to be able to optimize the pattern the [`simplify_try`] mir @@ -96,9 +94,8 @@ we'll talk about that later. - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. It then emits machine code. It is basically assembly code with additional low-level types and annotations added. (e.g. an ELF object or wasm). - **TODO: reference for this section?** - The different libraries/binaries are linked together to produce the final - binary. **TODO: reference for this section?** + binary. [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html [`librustc_driver`]: https://rustc-dev-guide.rust-lang.org/rustc-driver.html @@ -138,13 +135,13 @@ satisfy/optimize for. For example, point. - Compiler compilation speed: how long does it take to compile the compiler? This impacts contributors and compiler maintenance. -- Compiler implementation complexity: building a compiler is one of the hardest +- Implementation complexity: building a compiler is one of the hardest things a person/group can do, and Rust is not a very simple language, so how do we make the compiler's code base manageable? - Compiler correctness: the binaries produced by the compiler should do what the input programs says they do, and should continue to do so despite the tremendous amount of change constantly going on. 
-- Compiler integration: a number of other tools need to use the compiler in +- Integration: a number of other tools need to use the compiler in various ways (e.g. cargo, clippy, miri, RLS) that must be supported. - Compiler stability: the compiler should not crash or fail ungracefully on the stable channel. @@ -283,7 +280,7 @@ program) is [`rustc::ty::Ty`][ty]. This is so important that we have a whole cha on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way `rustc` represents types! -Oh, and also the `rustc::ty` module defines the `TyCtxt` struct we mentioned before. +Also note that the `rustc::ty` module defines the `TyCtxt` struct we mentioned before. [ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.Ty.html @@ -332,7 +329,6 @@ bootstrapping compiler will support them. # Unresolved Questions -**TODO: find answers to these** - Does LLVM ever do optimizations in debug builds? - How do I explore phases of the compile process in my own sources (lexer, From 83105be397d924dc80a4a88eb50caa768907c4df Mon Sep 17 00:00:00 2001 From: mark Date: Fri, 17 Apr 2020 21:19:25 -0500 Subject: [PATCH 37/38] add link to intrinsic --- src/overview.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/overview.md b/src/overview.md index abe35188f..e08f012f3 100644 --- a/src/overview.md +++ b/src/overview.md @@ -323,9 +323,11 @@ Bootstrapping also has implications for when features are usable in the compiler itself. The build system uses the current beta compiler to build the stage-1 bootstrapping compiler. This means that the compiler source code can't use some features until they reach beta (because otherwise the beta compiler -doesn't support them). On the other hand, for compiler intrinsics and internal -features, we may be able to use them immediately because the stage-1 -bootstrapping compiler will support them. +doesn't support them). 
On the other hand, for [compiler intrinsics][intrinsics] +and internal features, we may be able to use them immediately because the +stage-1 bootstrapping compiler will support them. + +[intrinsics]: ./appendix/glossary.md#intrinsic # Unresolved Questions From 2d53282f3efeca399695186bda312152346638c0 Mon Sep 17 00:00:00 2001 From: Yuki Okushi Date: Sat, 18 Apr 2020 11:54:09 +0900 Subject: [PATCH 38/38] Apply suggestions from code review Co-Authored-By: Chris Simpkins --- src/overview.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/overview.md b/src/overview.md index e08f012f3..f2b360013 100644 --- a/src/overview.md +++ b/src/overview.md @@ -36,7 +36,7 @@ we'll talk about that later. - The lexer has a small interface and doesn't depend directly on the diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain data which are emitted in `librustc_parse::lexer::mod` as real diagnostics. -- The lexer preseves full fidelity information for both IDEs and proc macros. +- The lexer preserves full fidelity information for both IDEs and proc macros. - The parser [translates the token stream from the lexer into an Abstract Syntax Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax analysis. The crate entry points for the parser are the `Parser.parse_crate_mod()` and @@ -145,7 +145,7 @@ satisfy/optimize for. For example, various ways (e.g. cargo, clippy, miri, RLS) that must be supported. - Compiler stability: the compiler should not crash or fail ungracefully on the stable channel. -- Rust stability: the compiler must respect rust's stability guarantees by not +- Rust stability: the compiler must respect Rust's stability guarantees by not breaking programs that previously compiled despite the many changes that are always going on to its implementation. - Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some @@ -299,7 +299,7 @@ together into one binary. 
However, the rest of the compiler is still not yet parallel. There have been lots of efforts spent on this, but it is generally a hard problem. The current -approach is (**TODO: verify**) to turn `RefCell`s into `Mutex`s -- that is, we +approach is to turn `RefCell`s into `Mutex`s -- that is, we switch to thread-safe internal mutability. However, there are ongoing challenges with lock contention, maintaining query-system invariants under concurrency, and the complexity of the code base. One can try out the current