Commit 95bab3e

committed
rework the MIR intro section, breaking out passes and visitors
1 parent 5803d3d commit 95bab3e

9 files changed

+1121
-79
lines changed

src/SUMMARY.md

+4
@@ -24,11 +24,15 @@
2424
- [Type checking](./type-checking.md)
2525
- [The MIR (Mid-level IR)](./mir.md)
2626
- [MIR construction](./mir-construction.md)
27+
- [MIR visitor](./mir-visitor.md)
28+
- [MIR passes: getting the MIR for a function](./mir-passes.md)
2729
- [MIR borrowck](./mir-borrowck.md)
30+
- [MIR-based region checking (NLL)](./mir-regionck.md)
2831
- [MIR optimizations](./mir-optimizations.md)
2932
- [Constant evaluation](./const-eval.md)
3033
- [miri const evaluator](./miri.md)
3134
- [Parameter Environments](./param_env.md)
3235
- [Generating LLVM IR](./trans.md)
36+
- [Background material](./background.md)
3337
- [Glossary](./glossary.md)
3438
- [Code Index](./code-index.md)

src/background.md

+122
@@ -0,0 +1,122 @@
1+
# Background topics
2+
3+
This section covers a number of common compiler terms that arise in
4+
this guide. We try to give the general definition while providing some
5+
Rust-specific context.
6+
7+
<a name=cfg>
8+
9+
## What is a control-flow graph?
10+
11+
A control-flow graph is a common term from compilers. If you've ever
12+
used a flow-chart, then the concept of a control-flow graph will be
13+
pretty familiar to you. It's a representation of your program that
14+
exposes the underlying control flow in a very clear way.
15+
16+
A control-flow graph is structured as a set of **basic blocks**
17+
connected by edges. The key idea of a basic block is that it is a set
18+
of statements that execute "together" -- that is, whenever you branch
19+
to a basic block, you start at the first statement and then execute
20+
all the remainder. Only at the end of the block is there the possibility of
21+
branching to more than one place (in MIR, we call that final statement
22+
the **terminator**):
23+
24+
```
25+
bb0: {
26+
statement0;
27+
statement1;
28+
statement2;
29+
...
30+
terminator;
31+
}
32+
```
33+
34+
Many expressions that you are used to in Rust compile down to multiple
35+
basic blocks. For example, consider an if statement:
36+
37+
```rust
38+
a = 1;
39+
if some_variable {
40+
b = 1;
41+
} else {
42+
c = 1;
43+
}
44+
d = 1;
45+
```
46+
47+
This would compile into four basic blocks:
48+
49+
```
50+
BB0: {
51+
a = 1;
52+
if some_variable { goto BB1 } else { goto BB2 }
53+
}
54+
55+
BB1: {
56+
b = 1;
57+
goto BB3;
58+
}
59+
60+
BB2: {
61+
c = 1;
62+
goto BB3;
63+
}
64+
65+
BB3: {
66+
d = 1;
67+
...;
68+
}
69+
```
70+
71+
When using a control-flow graph, a loop simply appears as a cycle in
72+
the graph, and the `break` keyword translates into a path out of that
73+
cycle.
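
To make the if example above concrete, here is a minimal sketch of how such a graph could be represented as data. The `BasicBlock` and `Terminator` types and the `successors` and `if_example` functions are illustrative names invented for this sketch; they are not rustc's actual MIR types, which carry far more information.

```rust
/// How control leaves a basic block. Toy model only; the branch
/// condition itself is omitted to keep the sketch small.
enum Terminator {
    /// Unconditional jump to one block.
    Goto(usize),
    /// Two-way branch: the first target is taken when the condition holds.
    If { then_bb: usize, else_bb: usize },
    /// Leave the function.
    Return,
}

struct BasicBlock {
    /// Statements run in order, with no branching in between.
    statements: Vec<&'static str>,
    /// The only place where control can go to more than one block.
    terminator: Terminator,
}

/// The outgoing edges of a block are determined by its terminator alone.
fn successors(block: &BasicBlock) -> Vec<usize> {
    match block.terminator {
        Terminator::Goto(target) => vec![target],
        Terminator::If { then_bb, else_bb } => vec![then_bb, else_bb],
        Terminator::Return => vec![],
    }
}

/// The four blocks from the if example, indexed BB0 through BB3.
fn if_example() -> Vec<BasicBlock> {
    vec![
        BasicBlock {
            statements: vec!["a = 1"],
            terminator: Terminator::If { then_bb: 1, else_bb: 2 },
        },
        BasicBlock { statements: vec!["b = 1"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["c = 1"], terminator: Terminator::Goto(3) },
        BasicBlock { statements: vec!["d = 1"], terminator: Terminator::Return },
    ]
}

fn main() {
    for (index, block) in if_example().iter().enumerate() {
        println!("BB{}: {:?} -> {:?}", index, block.statements, successors(block));
    }
}
```

Storing targets as indices into a vector, rather than as pointers, is a common choice for graphs in Rust because it avoids ownership cycles.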
74+
75+
<a name=dataflow>
76+
77+
## What is a dataflow analysis?
78+
79+
*to be written*
80+
81+
<a name=quantified>
82+
83+
## What is "universally quantified"? What about "existentially quantified"?
84+
85+
*to be written*
86+
87+
<a name=variance>
88+
89+
## What is co- and contra-variance?
90+
91+
*to be written*
92+
93+
<a name=free-vs-bound>
94+
95+
## What is a "free region" or a "free variable"? What about "bound region"?
96+
97+
Let's describe the concepts of free vs bound in terms of program
98+
variables, since that's the thing we're most familiar with.
99+
100+
- Consider this expression: `a + b`. In this expression, `a` and `b`
101+
refer to local variables that are defined *outside* of the
102+
expression. We say that those variables **appear free** in the
103+
expression. To see why this term makes sense, consider the next
104+
example.
105+
- In contrast, consider this expression, which creates a closure: `|a,
106+
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
107+
that the closure will be given when it is called. We say that the
108+
`a` and `b` there are **bound** to the closure, and that the closure
109+
signature `|a, b|` is a **binder** for the names `a` and `b`
110+
(because any references to `a` or `b` within refer to the variables
111+
that it introduces).
112+
113+
So there you have it: a variable "appears free" in some
114+
expression/statement/whatever if it refers to something defined
115+
outside of that expression/statement/whatever. Equivalently, we can
116+
then refer to the "free variables" of an expression -- which is just
117+
the set of variables that "appear free".
118+
119+
So what does this have to do with regions? Well, we can apply the
120+
analogous concept to types and regions. For example, in the type `&'a
121+
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
122+
does not.
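
The definition of "appears free" can be written down as a small recursive function. The sketch below uses a made-up expression type (`Expr`) with just variables, addition, and closures; it is only meant to illustrate how a binder removes names from the free-variable set, and it has nothing to do with how rustc represents expressions or regions.

```rust
use std::collections::BTreeSet;

/// A tiny expression language, just enough to show binders.
enum Expr {
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
    /// `|params| body`: the parameter list is the binder.
    Closure(Vec<&'static str>, Box<Expr>),
}

/// Collect the names that appear free in `expr`.
fn free_vars(expr: &Expr) -> BTreeSet<&'static str> {
    match expr {
        // A bare variable refers to something defined outside of itself.
        Expr::Var(name) => {
            let mut vars = BTreeSet::new();
            vars.insert(*name);
            vars
        }
        // No binder here: the free variables of both sides are combined.
        Expr::Add(lhs, rhs) => {
            let mut vars = free_vars(lhs);
            vars.extend(free_vars(rhs));
            vars
        }
        // The binder captures its parameters, so they are no longer free.
        Expr::Closure(params, body) => {
            let mut vars = free_vars(body);
            for param in params {
                vars.remove(param);
            }
            vars
        }
    }
}

/// Builds the expression `a + b`.
fn a_plus_b() -> Expr {
    Expr::Add(Box::new(Expr::Var("a")), Box::new(Expr::Var("b")))
}

fn main() {
    // `a + b`: both names appear free.
    println!("{:?}", free_vars(&a_plus_b()));
    // `|a, b| a + b`: both names are bound, so nothing appears free.
    let closure = Expr::Closure(vec!["a", "b"], Box::new(a_plus_b()));
    println!("{:?}", free_vars(&closure));
}
```

The same shape carries over to types: `for<'a>` plays the role of the closure's parameter list, and `'a` plays the role of the variable.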

src/glossary.md

+2
@@ -18,13 +18,15 @@ HirId | identifies a particular node in the HIR by combining
1818
HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers.
1919
ICE | internal compiler error. When the compiler crashes.
2020
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
21+
inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents the value you are trying to find. Think of `X` in algebra.
2122
infcx | the inference context (see `librustc/infer`)
2223
IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it.
2324
local crate | the crate currently being compiled.
2425
LTO | Link-Time Optimizations. A set of optimizations offered by LLVM that occur just before the final binary is linked. These include optimizations like removing functions that are never used in the final program, for example. _ThinLTO_ is a variant of LTO that aims to be a bit more scalable and efficient, but possibly sacrifices some optimizations. You may also read issues in the Rust repo about "FatLTO", which is the loving nickname given to non-Thin LTO. LLVM documentation: [here][lto] and [here][thinlto]
2526
[LLVM] | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that outputs LLVM IR and use LLVM to compile to all the platforms LLVM supports.
2627
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
2728
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
29+
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
2830
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
2931
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
3032
provider | the function that executes a query ([see more](query.html))

src/mir-background.md

+122
@@ -0,0 +1,122 @@
1+
# MIR Background topics
2+
3+
This section covers a number of common compiler terms that arise when
4+
talking about MIR and optimizations. We try to give the general
5+
definition while providing some Rust-specific context.
6+
7+
<a name=cfg>
8+
9+
## What is a control-flow graph?
10+
11+
A control-flow graph is a common term from compilers. If you've ever
12+
used a flow-chart, then the concept of a control-flow graph will be
13+
pretty familiar to you. It's a representation of your program that
14+
exposes the underlying control flow in a very clear way.
15+
16+
A control-flow graph is structured as a set of **basic blocks**
17+
connected by edges. The key idea of a basic block is that it is a set
18+
of statements that execute "together" -- that is, whenever you branch
19+
to a basic block, you start at the first statement and then execute
20+
all the remainder. Only at the end of the block is there the possibility of
21+
branching to more than one place (in MIR, we call that final statement
22+
the **terminator**):
23+
24+
```
25+
bb0: {
26+
statement0;
27+
statement1;
28+
statement2;
29+
...
30+
terminator;
31+
}
32+
```
33+
34+
Many expressions that you are used to in Rust compile down to multiple
35+
basic blocks. For example, consider an if statement:
36+
37+
```rust
38+
a = 1;
39+
if some_variable {
40+
b = 1;
41+
} else {
42+
c = 1;
43+
}
44+
d = 1;
45+
```
46+
47+
This would compile into four basic blocks:
48+
49+
```
50+
BB0: {
51+
a = 1;
52+
if some_variable { goto BB1 } else { goto BB2 }
53+
}
54+
55+
BB1: {
56+
b = 1;
57+
goto BB3;
58+
}
59+
60+
BB2: {
61+
c = 1;
62+
goto BB3;
63+
}
64+
65+
BB3: {
66+
d = 1;
67+
...;
68+
}
69+
```
70+
71+
When using a control-flow graph, a loop simply appears as a cycle in
72+
the graph, and the `break` keyword translates into a path out of that
73+
cycle.
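
As a rough illustration of the "loop is a cycle" idea, the sketch below models a control-flow graph as plain successor lists and uses a depth-first search to check whether a cycle is reachable. The block layout chosen for `loop { if done { break; } work(); }` is an assumption made for this example, not the exact MIR that rustc would produce, and `loop_example` and `has_cycle` are hypothetical names.

```rust
/// Successor lists for a hypothetical CFG of
/// `loop { if done { break; } work(); }`:
///   bb0: entry, jumps to the loop header bb1
///   bb1: loop header, branches to bb2 (the `break` path) or bb3 (the body)
///   bb2: code after the loop, no successors
///   bb3: loop body, jumps back to bb1 (the back edge that closes the cycle)
fn loop_example() -> Vec<Vec<usize>> {
    vec![vec![1], vec![2, 3], vec![], vec![1]]
}

/// Reports whether any cycle is reachable from `start`, using a
/// depth-first search that tracks which blocks are on the current path.
fn has_cycle(cfg: &[Vec<usize>], start: usize) -> bool {
    // State per block: 0 = unvisited, 1 = on the current path, 2 = finished.
    fn visit(cfg: &[Vec<usize>], node: usize, state: &mut Vec<u8>) -> bool {
        state[node] = 1;
        for &next in &cfg[node] {
            // An edge to a block on the current path is a back edge.
            if state[next] == 1 {
                return true;
            }
            if state[next] == 0 && visit(cfg, next, state) {
                return true;
            }
        }
        state[node] = 2;
        false
    }

    let mut state = vec![0u8; cfg.len()];
    visit(cfg, start, &mut state)
}

fn main() {
    println!("loop has a cycle: {}", has_cycle(&loop_example(), 0));
}
```

In this layout the edge from bb1 to bb2 is the path out of the cycle that `break` creates.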
74+
75+
<a name=dataflow>
76+
77+
## What is a dataflow analysis?
78+
79+
*to be written*
80+
81+
<a name=quantified>
82+
83+
## What is "universally quantified"? What about "existentially quantified"?
84+
85+
*to be written*
86+
87+
<a name=variance>
88+
89+
## What is co- and contra-variance?
90+
91+
*to be written*
92+
93+
<a name=free-vs-bound>
94+
95+
## What is a "free region" or a "free variable"? What about "bound region"?
96+
97+
Let's describe the concepts of free vs bound in terms of program
98+
variables, since that's the thing we're most familiar with.
99+
100+
- Consider this expression: `a + b`. In this expression, `a` and `b`
101+
refer to local variables that are defined *outside* of the
102+
expression. We say that those variables **appear free** in the
103+
expression. To see why this term makes sense, consider the next
104+
example.
105+
- In contrast, consider this expression, which creates a closure: `|a,
106+
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
107+
that the closure will be given when it is called. We say that the
108+
`a` and `b` there are **bound** to the closure, and that the closure
109+
signature `|a, b|` is a **binder** for the names `a` and `b`
110+
(because any references to `a` or `b` within refer to the variables
111+
that it introduces).
112+
113+
So there you have it: a variable "appears free" in some
114+
expression/statement/whatever if it refers to something defined
115+
outside of that expression/statement/whatever. Equivalently, we can
116+
then refer to the "free variables" of an expression -- which is just
117+
the set of variables that "appear free".
118+
119+
So what does this have to do with regions? Well, we can apply the
120+
analogous concept to types and regions. For example, in the type `&'a
121+
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
122+
does not.

src/mir-borrowck.md

+56-1
@@ -1 +1,56 @@
1-
# MIR borrowck
1+
# MIR borrow check
2+
3+
The borrow check is Rust's "secret sauce" -- it is tasked with
4+
enforcing a number of properties:
5+
6+
- That all variables are initialized before they are used.
7+
- That you can't move the same value twice.
8+
- That you can't move a value while it is borrowed.
9+
- That you can't access a place while it is mutably borrowed (except through the reference).
10+
- That you can't mutate a place while it is shared borrowed.
11+
- etc
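
The properties listed above can be seen in a small program. The sketch below is ordinary Rust that the compiler accepts; each commented-out line is an example of code that the borrow check would be expected to reject, with the reason given in the comment. The function name `accepted` is just for illustration.

```rust
/// Code that satisfies each property; the commented-out lines violate one.
fn accepted() -> (i32, usize, i32) {
    // Initialization: `x` must be assigned before it is read.
    let x: i32;
    // let y = x;          // rejected: `x` is not initialized yet
    x = 1;

    // Moves: the same value cannot be moved twice.
    let s = String::from("hello");
    let t = s;
    // let u = s;          // rejected: `s` was already moved into `t`

    // A value cannot be moved while it is borrowed.
    let r = &t;
    // let v = t;          // rejected: `t` is borrowed by `r`, used below
    let len = r.len();

    // A place cannot be accessed while mutably borrowed,
    // except through the reference itself.
    let mut n = 0;
    let m = &mut n;
    // let copy = n;       // rejected: `n` is mutably borrowed by `m`, used below
    *m += 1;

    // A place cannot be mutated while it is shared borrowed.
    let shared = &n;
    // n = 5;              // rejected: `n` is borrowed by `shared`, used below
    let seen = *shared;

    (x, len, seen)
}

fn main() {
    println!("{:?}", accepted());
}
```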
12+
13+
At the time of this writing, the code is in a state of transition. The
14+
"main" borrow checker still works by processing [the HIR](hir.html),
15+
but that is being phased out in favor of the MIR-based borrow checker.
16+
Doing borrow checking on MIR has two key advantages:
17+
18+
- The MIR is *far* less complex than the HIR; the radical desugaring
19+
helps prevent bugs in the borrow checker. (If you're curious, you
20+
can see
21+
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
22+
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
23+
which are regions derived from the control-flow graph.
24+
25+
[47366]: https://github.com/rust-lang/rust/issues/47366
26+
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
27+
28+
### Major phases of the borrow checker
29+
30+
The borrow checker source is found in
31+
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
32+
the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate
33+
in several modes, but this text will describe only the mode when NLL is enabled
34+
(what you get with `#![feature(nll)]`).
35+
36+
[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check
37+
38+
The overall flow of the borrow checker is as follows:
39+
40+
- We first create a **local copy** C of the MIR. We will be modifying
41+
this copy in place so that the types and other structures in it include
42+
references to the new regions that we are computing.
43+
- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
44+
Among other things, this function will replace all of the regions in
45+
the MIR with fresh [inference variables](glossary.html).
46+
- (More details can be found in [the regionck section](./mir-regionck.html).)
47+
- Next, we perform a number of [dataflow analyses](./background.html#dataflow)
48+
that compute what data is moved and when. The results of these analyses
49+
are needed to do both borrow checking and region inference.
50+
- Using the move data, we can then compute the values of all the regions in the MIR.
51+
- (More details can be found in [the NLL section](./mir-regionck.html).)
52+
- Finally, the borrow checker itself runs, taking as input (a) the
53+
results of move analysis and (b) the regions computed by the region
54+
checker. This allows us to figure out which loans are still in scope
55+
at any particular point.
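
To give a feel for the region-replacement step in the flow above, here is a deliberately tiny model. It is a sketch under stated assumptions, not the real `nll::replace_regions_in_mir`: the `Region` and `Ty` types and the `replace_regions` function are invented for this example, and the toy simply gives every reference type its own fresh inference variable.

```rust
/// Toy stand-in for a region; not rustc's region type.
#[derive(Debug, PartialEq)]
enum Region {
    /// A region written in the source, such as `'a`.
    Named(&'static str),
    /// A fresh inference variable, identified by its index.
    Infer(usize),
}

/// Toy stand-in for a type: either `u32` or a reference `&'r T`.
#[derive(Debug, PartialEq)]
enum Ty {
    U32,
    Ref(Region, Box<Ty>),
}

/// Build a copy of `ty` in which every region is replaced by a fresh
/// inference variable. `next` is the index of the next unused variable.
fn replace_regions(ty: &Ty, next: &mut usize) -> Ty {
    match ty {
        Ty::U32 => Ty::U32,
        Ty::Ref(_, inner) => {
            // Allocate the variable for the outer reference first,
            // then renumber whatever is nested inside it.
            let var = Region::Infer(*next);
            *next += 1;
            Ty::Ref(var, Box::new(replace_regions(inner, next)))
        }
    }
}

fn main() {
    // Models the type `&'a &'b u32`.
    let ty = Ty::Ref(
        Region::Named("a"),
        Box::new(Ty::Ref(Region::Named("b"), Box::new(Ty::U32))),
    );
    let mut next = 0;
    println!("{:?}", replace_regions(&ty, &mut next));
}
```

Once every region is a variable, later steps are free to pick a value for each one, which is what the region inference described above does.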
56+
