Commit 62cd099

Add some documentation for const eval and related topics
1 parent b9bc44d commit 62cd099

6 files changed

+216
-1
lines changed

Diff for: src/SUMMARY.md

+3
Original file line numberDiff line numberDiff line change
@@ -22,5 +22,8 @@
2222
- [MIR construction](./mir-construction.md)
2323
- [MIR borrowck](./mir-borrowck.md)
2424
- [MIR optimizations](./mir-optimizations.md)
25+
- [Constant evaluation](./const-eval.md)
26+
- [miri const evaluator](./miri.md)
27+
- [Parameter Environments](./param_env.md)
2528
- [Generating LLVM IR](./trans.md)
2629
- [Glossary](./glossary.md)

Diff for: src/const-eval.md

+37
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
# Constant Evaluation
2+
3+
Constant evaluation is the process of computing values at compile time. For a
4+
specific item (constant/static/array length) this happens after the MIR for the
5+
item is borrow-checked and optimized. In many cases trying to const evaluate an
6+
item will trigger the computation of its MIR for the first time.
7+
8+
Prominent examples are
9+
10+
* The initializer of a `static`
11+
* Array length
12+
* needs to be known to reserve stack or heap space
13+
* Enum variant discriminants
14+
* needs to be known to prevent two variants from having the same discriminant
15+
* Patterns
16+
* need to be known to check for overlapping patterns
17+
18+
Additionally constant evaluation can be used to reduce the workload or binary
19+
size at runtime by precomputing complex operations at compile time and only
20+
storing the result.
21+
22+
Constant evaluation can be done by calling the `const_eval` query of `TyCtxt`.
23+
24+
The `const_eval` query takes the [`ParamEnv`](./param_env.html) of the environment in
25+
which the constant is evaluated (e.g. the function within which the constant is
26+
used) and a `GlobalId`. The `GlobalId` is made up either of an
27+
`Instance` referring to a constant or static or of an
28+
`Instance` of a function and an index into the function's `Promoted` table.
29+
30+
Constant evaluation returns a `Result` with either the error, or the simplest
31+
representation of the constant. "Simplest" means that if it is representable as an
32+
integer or fat pointer, it will directly yield the value (via `Value::ByVal` or
33+
`Value::ByValPair`), instead of referring to the [`miri`](./miri.html) virtual
34+
memory allocation (via `Value::ByRef`). This means that the `const_eval`
35+
function cannot be used to create miri-pointers to the evaluated constant or
36+
static. If you need that, you need to directly work with the functions in
37+
[src/librustc_mir/interpret/const_eval.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/const_eval.rs).
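
The two shapes of a `GlobalId` described above can be sketched with simplified stand-in types. This is only an illustration: `Instance` and `Promoted` here are hypothetical placeholders, and the constructors `item` and `promoted_in` are invented for this sketch; the real rustc types carry far more information.

```rust
// Simplified stand-ins, NOT the real rustc definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Instance(u32); // hypothetical: identifies a constant, static or function

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Promoted(usize); // index into a function's table of promoted values

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct GlobalId {
    instance: Instance,
    promoted: Option<Promoted>,
}

impl GlobalId {
    /// A constant or static itself: no promoted index is needed.
    fn item(instance: Instance) -> Self {
        GlobalId { instance, promoted: None }
    }

    /// A promoted value inside the body of `function`.
    fn promoted_in(function: Instance, index: usize) -> Self {
        GlobalId { instance: function, promoted: Some(Promoted(index)) }
    }
}

fn main() {
    let constant = GlobalId::item(Instance(7));
    let promoted = GlobalId::promoted_in(Instance(3), 0);
    println!("{:?}", constant);
    println!("{:?}", promoted);
}
```

Because the stand-in derives `Eq` and `Hash`, it could serve as a cache key, which is the role a query key plays.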

Diff for: src/glossary.md

+3
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,9 @@ generics | the set of generic type parameters defined on a type
1818
ICE | internal compiler error. When the compiler crashes.
1919
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
2020
infcx | the inference context (see `librustc/infer`)
21+
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
22+
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
23+
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
2124
local crate | the crate currently being compiled.
2225
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
2326
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.

Diff for: src/miri.md

+142
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
# Miri
2+
3+
Miri (**MIR** **I**nterpreter) is a virtual machine for executing MIR without
4+
compiling to machine code. It is usually invoked via `tcx.const_eval`.
5+
6+
If you start out with a constant
7+
8+
```rust
9+
const FOO: usize = 1 << 12;
10+
```
11+
12+
rustc doesn't actually invoke anything until the constant is either used or
13+
placed into metadata.
14+
15+
Once you have a use-site like
16+
17+
```rust
18+
type Foo = [u8; FOO - 42];
19+
```
20+
21+
the compiler needs to figure out the length of the array before being able to
22+
create items that use the type (locals, constants, function arguments, ...).
23+
24+
To obtain the (in this case empty) parameter environment, one can call
25+
`let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is
26+
27+
```rust
28+
let gid = GlobalId {
29+
promoted: None,
30+
instance: Instance::mono(length_def_id),
31+
};
32+
```
33+
34+
Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
35+
the MIR of the array length expression. The MIR will look something like this:
36+
37+
```mir
38+
const Foo::{{initializer}}: usize = {
39+
let mut _0: usize; // return pointer
40+
let mut _1: (usize, bool);
41+
42+
bb0: {
43+
_1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
44+
assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
45+
}
46+
47+
bb1: {
48+
_0 = (_1.0: usize);
49+
return;
50+
}
51+
}
52+
```
53+
54+
Before the evaluation, a virtual memory location (in this case essentially a
55+
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
56+
57+
At the start of the evaluation, `_0` and `_1` are
58+
`Value::ByVal(PrimVal::Undef)`. When the initialization of `_1` is invoked, the
59+
value of the `FOO` constant is required, and triggers another call to
60+
`tcx.const_eval`, which will not be shown here. If the evaluation of `FOO` is
60+
successful, 42 will be subtracted from its value `4096` and the result stored in
62+
`_1` as `Value::ByValPair(PrimVal::Bytes(4054), PrimVal::Bytes(0))`. The first
63+
part of the pair is the computed value, the second part is a bool that's true if
64+
an overflow happened.
65+
66+
The next statement asserts that said boolean is `0`. In case the assertion
67+
fails, its error message is used for reporting a compile-time error.
68+
69+
Since it does not fail, `Value::ByVal(PrimVal::Bytes(4054))` is stored in the
70+
virtual memory that was allocated before the evaluation. `_0` always refers to that
71+
location directly.
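
The checked subtraction and the assertion from the MIR above behave much like the following plain Rust. This is a sketch of the semantics only, not of how miri implements them; the function name `array_len` is made up for the example.

```rust
/// Mirrors the two MIR statements: `CheckedSub` yields a
/// (value, overflowed) pair, and the `assert` inspects the flag.
fn array_len(foo: usize) -> Result<usize, &'static str> {
    let (value, overflowed) = foo.overflowing_sub(42);
    if overflowed {
        // In the compiler this message becomes a compile-time error.
        return Err("attempt to subtract with overflow");
    }
    Ok(value)
}

fn main() {
    const FOO: usize = 1 << 12;
    println!("{:?}", array_len(FOO));
    println!("{:?}", array_len(0));
}
```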
72+
73+
After the evaluation is done, the virtual memory allocation is interned into the
74+
`TyCtxt`. Future evaluations of the same constants will not actually invoke
75+
miri, but just extract the value from the interned allocation.
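
The effect of interning can be pictured as a cache keyed by the constant's identity. The sketch below is an assumption-laden model: the `Interner` type, its `u32` keys and its `evaluations` counter are invented for illustration and do not correspond to rustc's query system.

```rust
use std::collections::HashMap;

/// Hypothetical model of "evaluate once, then reuse the interned result".
struct Interner {
    cache: HashMap<u32, u64>,
    evaluations: usize, // how often the interpreter actually ran
}

impl Interner {
    fn new() -> Self {
        Interner { cache: HashMap::new(), evaluations: 0 }
    }

    /// Returns the cached value for `id`, running `compute` only on a miss.
    fn const_eval(&mut self, id: u32, compute: impl FnOnce() -> u64) -> u64 {
        if let Some(&value) = self.cache.get(&id) {
            return value;
        }
        self.evaluations += 1;
        let value = compute();
        self.cache.insert(id, value);
        value
    }
}

fn main() {
    let mut interner = Interner::new();
    let first = interner.const_eval(1, || 4096 - 42);
    // The closure is not run again for the same id.
    let second = interner.const_eval(1, || 0);
    println!("{} {} ({} evaluation)", first, second, interner.evaluations);
}
```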
76+
77+
The `tcx.const_eval` function has one additional feature: it will not return a
78+
`ByRef(interned_allocation_id)`, but a `ByVal(computed_value)` if possible. This
79+
makes using the result much more convenient, as no further queries need to be
80+
executed in order to get at something as simple as a `usize`.
81+
82+
## Data structures
83+
84+
Miri's core data structures can be found in
85+
[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret).
86+
This is mainly the error enum and the `Value` and `PrimVal` types. A `Value` can
87+
be either `ByVal` (a single `PrimVal`), `ByValPair` (two `PrimVal`s, usually fat
88+
pointers or two element tuples) or `ByRef`, which is used for anything else and
89+
refers to a virtual allocation. These allocations can be accessed via the
90+
methods on `tcx.interpret_interner`.
91+
92+
If you are expecting a numeric result, you can use `unwrap_u64` (panics on
93+
anything that can't be represented as a `u64`) or `to_raw_bits` which results
94+
in an `Option<u128>` yielding the `ByVal` if possible.
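
A rough model of the three `Value` shapes may help. The enums below are simplified stand-ins written for this guide, not the definitions in `librustc/mir/interpret`, and the `to_raw_bits` method here is a hypothetical helper that only imitates the behaviour described above.

```rust
// Simplified model; the real types have more variants and fields.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PrimVal {
    Bytes(u128),
    Undef,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    ByVal(PrimVal),
    ByValPair(PrimVal, PrimVal),
    ByRef(u64), // stand-in for an id of a virtual allocation
}

impl Value {
    /// Yields the raw bits only for a single, defined primitive value.
    fn to_raw_bits(self) -> Option<u128> {
        match self {
            Value::ByVal(PrimVal::Bytes(bits)) => Some(bits),
            _ => None,
        }
    }
}

fn main() {
    let len = Value::ByVal(PrimVal::Bytes(4054));
    let checked = Value::ByValPair(PrimVal::Bytes(4054), PrimVal::Bytes(0));
    let uninit = Value::ByVal(PrimVal::Undef);
    let big = Value::ByRef(0);
    for v in [len, checked, uninit, big] {
        println!("{:?} -> {:?}", v, v.to_raw_bits());
    }
}
```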
95+
96+
## Allocations
97+
98+
A miri allocation is either a byte sequence of the memory or an `Instance` in
99+
the case of function pointers. Byte sequences can additionally contain
100+
relocations that mark a group of bytes as a pointer to another allocation. The
101+
actual bytes at the relocation refer to the offset inside the other allocation.
102+
103+
These allocations exist so that references and raw pointers have something to
104+
point to. There is no global linear heap in which things are allocated, but each
105+
allocation (be it for a local variable, a static or a (future) heap allocation)
106+
gets its own little memory with exactly the required size. So if you have a
107+
pointer to an allocation for a local variable `a`, there is no possible (no
108+
matter how unsafe) operation that you can do that would ever change said pointer
109+
to a pointer to another local variable `b`.
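
The following sketch models that idea: a pointer is an allocation id plus an offset, and pointer arithmetic only ever touches the offset. All names (`AllocId`, `Pointer`, `write_ptr`, `read_ptr`) and the 8-byte pointer width are assumptions made for this example, not miri's actual API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Pointer {
    alloc: AllocId,
    offset: u64,
}

struct Allocation {
    bytes: Vec<u8>,
    /// Byte offset in `bytes` -> allocation the pointer stored there targets.
    relocations: BTreeMap<u64, AllocId>,
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation { bytes: vec![0; size], relocations: BTreeMap::new() }
    }

    /// Stores `ptr` at `at`: the bytes hold the offset, the relocation
    /// records which allocation the offset is relative to.
    fn write_ptr(&mut self, at: usize, ptr: Pointer) {
        self.bytes[at..at + 8].copy_from_slice(&ptr.offset.to_le_bytes());
        self.relocations.insert(at as u64, ptr.alloc);
    }

    /// Reads a pointer back; plain bytes without a relocation are no pointer.
    fn read_ptr(&self, at: usize) -> Option<Pointer> {
        let alloc = *self.relocations.get(&(at as u64))?;
        let mut raw = [0u8; 8];
        raw.copy_from_slice(self.bytes.get(at..at + 8)?);
        Some(Pointer { alloc, offset: u64::from_le_bytes(raw) })
    }
}

impl Pointer {
    /// Offsetting never changes `alloc`; it is checked against the
    /// size of the pointer's own allocation.
    fn offset(self, by: u64, target: &Allocation) -> Option<Pointer> {
        let offset = self.offset.checked_add(by)?;
        if offset <= target.bytes.len() as u64 {
            Some(Pointer { offset, ..self })
        } else {
            None
        }
    }
}

fn main() {
    let a = Allocation::new(4); // memory of local `a`, id 0
    let mut holder = Allocation::new(8); // a local holding `&a`
    let p = Pointer { alloc: AllocId(0), offset: 0 }.offset(2, &a).unwrap();
    holder.write_ptr(0, p);
    println!("{:?}", holder.read_ptr(0));
    println!("{:?}", p.offset(100, &a)); // out of bounds, never "reaches" `b`
}
```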
110+
111+
## Interpretation
112+
113+
Although the main entry point to constant evaluation is the `tcx.const_eval`
114+
query, there are additional functions in
115+
[librustc_mir/interpret/const_eval.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/const_eval.rs)
116+
that allow accessing the fields of a `Value` (`ByRef` or otherwise). You should
117+
never have to access an `Allocation` directly except for translating it to the
118+
compilation target (at the moment just LLVM).
119+
120+
Miri starts by creating a virtual stack frame for the current constant that is
121+
being evaluated. There's essentially no difference between a constant and a
122+
function with no arguments, except that constants do not allow local (named)
123+
variables at the time of writing this guide.
124+
125+
A stack frame is defined by the `Frame` type in
126+
[librustc_mir/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/eval_context.rs)
127+
and contains all the local
128+
variables' memory (`None` at the start of evaluation). Each frame refers to the
129+
evaluation of either the root constant or subsequent calls to `const fn`. The
130+
evaluation of another constant simply calls `tcx.const_eval`, which produces an
131+
entirely new and independent stack frame.
132+
133+
The frames are just a `Vec<Frame>`; there's no way to actually refer to a
134+
`Frame`'s memory even if horrible shenanigans are done via unsafe code. The only
135+
memory that can be referred to is that of `Allocation`s.
136+
137+
Miri now calls the `step` method (in
138+
[librustc_mir/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/step.rs)
139+
) until it either returns an error or has no further statements to execute. Each
140+
statement will now initialize or modify the locals or the virtual memory
141+
referred to by a local. This might require evaluating other constants or
142+
statics, which just recursively invokes `tcx.const_eval`.
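
The stepping loop can be illustrated with a toy interpreter. Everything below is a made-up miniature (three statement kinds, `u64` locals); the real `Frame` and `step` in `librustc_mir/interpret` are considerably more involved.

```rust
#[derive(Debug, PartialEq)]
enum EvalError {
    Overflow(&'static str),
    UninitializedLocal(usize),
}

#[derive(Clone, Copy)]
enum Statement {
    /// locals[dest] = lhs - rhs; locals[flag] = 1 if that overflowed
    CheckedSub { dest: usize, flag: usize, lhs: u64, rhs: u64 },
    /// fail with `msg` unless locals[flag] == 0
    AssertNoOverflow { flag: usize, msg: &'static str },
    /// locals[dest] = locals[src]
    Assign { dest: usize, src: usize },
}

struct Frame {
    locals: Vec<Option<u64>>, // `None` until a statement initializes them
    statements: Vec<Statement>,
    next: usize,
}

impl Frame {
    fn read(&self, local: usize) -> Result<u64, EvalError> {
        self.locals[local].ok_or(EvalError::UninitializedLocal(local))
    }

    /// Executes one statement; `Ok(false)` means nothing is left to run.
    fn step(&mut self) -> Result<bool, EvalError> {
        let stmt = match self.statements.get(self.next) {
            Some(stmt) => *stmt,
            None => return Ok(false),
        };
        match stmt {
            Statement::CheckedSub { dest, flag, lhs, rhs } => {
                let (value, overflowed) = lhs.overflowing_sub(rhs);
                self.locals[dest] = Some(value);
                self.locals[flag] = Some(overflowed as u64);
            }
            Statement::AssertNoOverflow { flag, msg } => {
                if self.read(flag)? != 0 {
                    return Err(EvalError::Overflow(msg));
                }
            }
            Statement::Assign { dest, src } => {
                let value = self.read(src)?;
                self.locals[dest] = Some(value);
            }
        }
        self.next += 1;
        Ok(true)
    }
}

/// Builds a frame resembling the array-length MIR: `_0 = lhs - rhs`.
fn program(lhs: u64, rhs: u64) -> Frame {
    Frame {
        locals: vec![None; 3],
        statements: vec![
            Statement::CheckedSub { dest: 1, flag: 2, lhs, rhs },
            Statement::AssertNoOverflow { flag: 2, msg: "attempt to subtract with overflow" },
            Statement::Assign { dest: 0, src: 1 },
        ],
        next: 0,
    }
}

/// Steps until an error occurs or no statements remain, then reads `_0`.
fn run(mut frame: Frame) -> Result<Option<u64>, EvalError> {
    while frame.step()? {}
    Ok(frame.locals[0])
}

fn main() {
    println!("{:?}", run(program(4096, 42)));
    println!("{:?}", run(program(0, 42)));
}
```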

Diff for: src/param_env.md

+30
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# Parameter Environment
2+
3+
When working with associated and/or generic items (types, constants,
4+
functions/methods) it is often relevant to have more information about the
5+
`Self` or generic parameters. Trait bounds and similar information are encoded in
6+
the `ParamEnv`. Often this is not enough information to obtain things like the
7+
type's `Layout`, but you can do all kinds of other checks on it (e.g. whether a
8+
type implements `Copy`) or you can evaluate an associated constant whose value
9+
does not depend on anything from the parameter environment.
10+
11+
For example, if you have a function
12+
13+
```rust
14+
fn foo<T: Copy>(t: T) {
15+
}
16+
```
17+
18+
the parameter environment for that function is `[T: Copy]`. This means any
19+
evaluation within this function will, when accessing the type `T`, know about
20+
its `Copy` bound via the parameter environment.
21+
22+
Although you can obtain a valid `ParamEnv` for any item via
23+
`tcx.param_env(def_id)`, this `ParamEnv` can be too generic for your use case.
24+
Using the `ParamEnv` from the surrounding context can allow you to evaluate more
25+
things.
26+
27+
Another great thing about `ParamEnv` is that you can use it to bundle the thing
28+
depending on generic parameters (e.g. a `Ty`) by calling `param_env.and(ty)`.
29+
This will produce a `ParamEnvAnd<Ty>`, making clear that you should probably not
30+
be using the inner value without taking care to also use the `ParamEnv`.
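
The bundling idea can be sketched as follows. The types are simplified stand-ins defined only for this example (bounds as string pairs, an invented `implements` check); they are not rustc's `ParamEnv` or `ParamEnvAnd`.

```rust
/// Hypothetical stand-in: a list of `(parameter, trait)` bounds.
#[derive(Debug, Clone, PartialEq)]
struct ParamEnv {
    bounds: Vec<(&'static str, &'static str)>,
}

/// A value that only makes sense together with its environment.
#[derive(Debug, Clone, PartialEq)]
struct ParamEnvAnd<T> {
    param_env: ParamEnv,
    value: T,
}

impl ParamEnv {
    /// Bundles `value` with this environment.
    fn and<T>(self, value: T) -> ParamEnvAnd<T> {
        ParamEnvAnd { param_env: self, value }
    }

    /// Checks whether the environment records the bound `param: trait_name`.
    fn implements(&self, param: &str, trait_name: &str) -> bool {
        self.bounds.iter().any(|&(p, t)| p == param && t == trait_name)
    }
}

impl<T> ParamEnvAnd<T> {
    /// Taking the bundle apart hands back both halves at once.
    fn into_parts(self) -> (ParamEnv, T) {
        (self.param_env, self.value)
    }
}

fn main() {
    // Environment of `fn foo<T: Copy>(t: T)`.
    let env = ParamEnv { bounds: vec![("T", "Copy")] };
    let bundled = env.and("T");
    let (env, ty) = bundled.into_parts();
    println!("{} is Copy here: {}", ty, env.implements(ty, "Copy"));
}
```

Keeping the value behind a wrapper means a caller has to unpack it deliberately, at which point the environment is in hand as well.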

Diff for: src/trait-resolution.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -433,7 +433,7 @@ before, and hence the cache lookup would succeed, yielding
433433
One subtle interaction is that the results of trait lookup will vary
434434
depending on what where clauses are in scope. Therefore, we actually
435435
have *two* caches, a local and a global cache. The local cache is
436-
attached to the `ParamEnv` and the global cache attached to the
436+
attached to the [`ParamEnv`](./param_env.html) and the global cache attached to the
437437
`tcx`. We use the local cache whenever the result might depend on the
438438
where clauses that are in scope. The determination of which cache to
439439
use is done by the method `pick_candidate_cache` in `select.rs`. At
