@@ -50,7 +50,7 @@ we'll talk about that later.
  doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
  opt looks for.
  - Rust code is _monomorphized_, which means making copies of all the generic
- code with the type parameters replaced by concrete types. In order to do
+ code with the type parameters replaced by concrete types. To do
  this, we need to collect a list of what concrete types to generate code for.
  This is called _monomorphization collection_.
  - We then begin what is vaguely called _code generation_ or _codegen_.
@@ -105,7 +105,7 @@ satisfy/optimize for. For example,
  - Compiler compilation speed: how long does it take to compile the compiler?
  This impacts contributors and compiler maintenance.
  - Compiler implementation complexity: building a compiler is one of the hardest
- things a person/group can do, and rust is not a very simple language, so how
+ things a person/group can do, and Rust is not a very simple language, so how
  do we make the compiler's code base manageable?
  - Compiler correctness: the binaries produced by the compiler should do what
  the input programs says they do, and should continue to do so despite the
@@ -119,14 +119,13 @@ satisfy/optimize for. For example,
  always going on to its implementation.
  - Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some
  strengths we leverage and some limitations/weaknesses we need to work around.
- - And others that I'm probably forgetting.

  So, as you read through the rest of the guide, keep these things in mind. They
  will often inform decisions that we make.

  ### Constant change

- One thing to keep in mind is that `rustc` is a real production-quality product.
+ Keep in mind that `rustc` is a real production-quality product.
  As such, it has its fair share of codebase churn and technical debt. A lot of
  the designs discussed throughout this guide are idealized designs that are not
  fully realized yet. And things keep changing so that it is hard to keep this
@@ -139,19 +138,19 @@ to keep up with the requirements above.

  As with most compilers, `rustc` uses some intermediate representations (IRs) to
  facilitate computations. In general, working directly with the source code is
- extremely inconvenient. Source code is designed to be human-friendly while at
+ extremely inconvenient and error-prone. Source code is designed to be human-friendly while at
  the same time being unambiguous, but it's less convenient for doing something
  like, say, type checking.

  Instead most compilers, including `rustc`, build some sort of IR out of the
  source code which is easier to analyze. `rustc` has a few IRs, each optimized
- for different things:
+ for different purposes:

  - Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream
  of tokens produced by the lexer directly from the source code. It represents
  pretty much exactly what the user wrote. It helps to do some syntactic sanity
  checking (e.g. checking that a type is expected where the user wrote one).
- - High-level IR (HIR): This is a sort of very desugared AST. It's still close
+ - High-level IR (HIR): This is a sort of desugared AST. It's still close
  to what the user wrote syntactically, but it includes some implicit things
  such as some elided lifetimes, etc. This IR is amenable to type checking.
  - HAIR: This is an intermediate between HIR and MIR. This only exists to make
@@ -160,13 +159,13 @@ for different things:
  is a type of diagram that shows the basic blocks of a program and how control
  flow can go between them. Likewise, MIR also has a bunch of basic blocks with
  simple typed statements inside them (e.g. assignment, simple computations,
- dropping values, etc). MIR is used for borrow checking and a bunch of other
+ dropping values, etc). MIR is used for borrow checking and other
  important dataflow based checks, such as checking for uninitialized values.
  It is also used for a bunch of optimizations and for constant evaluation (via
  MIRI). Because MIR is still generic, we can do a lot of analyses here more
  efficiently than after monomorphization.
  - LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR
- is basically a sort of typed assembly language with lots of annotations. It's
+ is a sort of typed assembly language with lots of annotations. It's
  a standard format that is used by all compilers that use LLVM (e.g. the clang
  C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other
  compilers to emit and also rich enough for LLVM to run a bunch of
@@ -181,9 +180,9 @@ compiler does this to make incremental compilation possible -- that is, if the
  user makes a change to their program and recompiles, we want to do as little
  redundant work as possible to produce the new binary.

- In rustc, all the major steps above are organized as a bunch of queries that
+ In `rustc`, all the major steps above are organized as a bunch of queries that
  call each other. For example, there is a query to ask for the type of something
- and another to ask for the optimized MIR of a function, and so on. These
+ and another to ask for the optimized MIR of a function. These
  queries can call each other and are all tracked through the query system, and
  the results of the queries are cached on disk so that we can tell which
  queries' results changed from the last compilation and only redo those. This is
@@ -209,7 +208,7 @@ to remain to ensure that unreachable functions still have their errors emitted.

  Moreover, the compiler wasn't originally built to use a query system; the query
  system has been retrofitted into the compiler, so parts of it are not
- query-fied yet. Also, LLVM isn't our code, so obviously that isn't querified
+ query-fied yet. Also, LLVM isn't our code, so that isn't querified
  either. The plan is to eventually query-fy all of the steps listed in the
  previous section, but as of this writing, only the steps between HIR and
  LLVM-IR are query-fied. That is, lexing and parsing are done all at once for
@@ -239,8 +238,8 @@ Oh, and also the `rustc::ty` module defines the `TyCtxt` struct we mentioned bef

  ### Parallelism

- Compiler performance is a problem that we would very much like to improve on
- (and are always working on). One aspect of that is attempting to parallelize
+ Compiler performance is a problem that we would like to improve on
+ (and are always working on). One aspect of that is parallelizing
  `rustc` itself.

  Currently, there is only one part of rustc that is already parallel: codegen.