Add some documentation for const eval and related topics #45

Merged: 1 commit, Feb 23, 2018

3 changes: 3 additions & 0 deletions src/SUMMARY.md
@@ -22,5 +22,8 @@
- [MIR construction](./mir-construction.md)
- [MIR borrowck](./mir-borrowck.md)
- [MIR optimizations](./mir-optimizations.md)
- [Constant evaluation](./const-eval.md)
Member

Could you also add ./miri and ./param_env to this file?

- [miri const evaluator](./miri.md)
- [Parameter Environments](./param_env.md)
- [Generating LLVM IR](./trans.md)
- [Glossary](./glossary.md)
37 changes: 37 additions & 0 deletions src/const-eval.md
@@ -0,0 +1,37 @@
# Constant Evaluation
Member

It would be good to mention where in the compiler "pipeline" const eval happens. IIUC, it is after MIR is constructed and borrowck?

Member

Note that the "pipeline" model is no longer entirely accurate. We care more about dependencies - and yes, miri requires MIR to be constructed to run on, but e.g. the type-checking, MIR construction and MIR evaluation of a constant might happen before any of those ever happen on a different constant/function/etc.

We have to do all that to understand even the type `[T; 8]`; type-checking and working MIR is therefore reentrant.

Member

> Note that the "pipeline" model is no longer entirely accurate.

Yep, that's why I put it in "quotes" 😛 What I was getting at more was what dependencies const eval has, as you noted.


Constant evaluation is the process of computing values at compile time. For a
specific item (constant/static/array length) this happens after the MIR for the
item is borrow-checked and optimized. In many cases trying to const evaluate an
item will trigger the computation of its MIR for the first time.

Prominent examples are:

* The initializer of a `static`
* Array length
  * needs to be known to reserve stack or heap space
* Enum variant discriminants
  * needs to be known to prevent two variants from having the same discriminant
* Patterns
  * need to be known to check for overlapping patterns
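
As a rough illustration of the cases listed above, each of the ordinary Rust items below forces an expression through constant evaluation when the crate is compiled. The names `FOO`, `BAR`, `Buf`, `Kind` and `classify` are made up for this example.

```rust
// `FOO` is only evaluated once something actually needs its value.
const FOO: usize = 1 << 12;

// The initializer of a `static` is evaluated at compile time.
static BAR: usize = FOO * 2;

// The array length must be known to lay out the type.
type Buf = [u8; FOO - 42];

// Discriminants are evaluated so that duplicates can be rejected.
#[allow(dead_code)]
#[repr(u8)]
enum Kind {
    A = 1,
    B = 1 + 1,
}

// A constant used as a pattern is evaluated so the match can be checked.
fn classify(n: usize) -> &'static str {
    match n {
        FOO => "foo",
        _ => "other",
    }
}
```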

Additionally, constant evaluation can be used to reduce the workload or binary
size at runtime by precomputing complex operations at compile time and only
storing the result.
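
For instance, a `const fn` can do the work while the crate is being compiled, so only the result ends up in the binary. This sketch assumes a recent stable toolchain (loops in `const fn` and const generics are newer than this guide); `fib` and `squares` are illustrative names, not compiler APIs.

```rust
/// Computes the n-th Fibonacci number; as a `const fn` it can be run by the
/// compiler's const evaluator as well as at runtime.
const fn fib(n: u32) -> u64 {
    let (mut a, mut b) = (0u64, 1u64);
    let mut i = 0;
    while i < n {
        let next = a + b;
        a = b;
        b = next;
        i += 1;
    }
    a
}

/// Builds a lookup table of squares entirely at compile time.
const fn squares<const N: usize>() -> [u32; N] {
    let mut table = [0u32; N];
    let mut i = 0;
    while i < N {
        table[i] = (i * i) as u32;
        i += 1;
    }
    table
}

// Only these finished values are stored, not the loops that produced them.
const FIB_50: u64 = fib(50);
const SQUARES: [u32; 8] = squares::<8>();
```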

Constant evaluation can be done by calling the `const_eval` query of `TyCtxt`.

The `const_eval` query takes the [`ParamEnv`](./param_env.html) of the environment in
which the constant is evaluated (e.g. the function within which the constant is
used) and a `GlobalId`. The `GlobalId` is made up of an `Instance` referring to
either a constant or static, or a function together with an index into that
function's `Promoted` table.
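
To make the shape of a `GlobalId` concrete, here is a heavily simplified, hypothetical model. It is not rustc's definition: the real `Instance` also carries substitutions, and `DefId(u32)`, `item` and `promoted_of` are stand-ins invented for this sketch.

```rust
// Hypothetical stand-ins; the real types live inside the compiler.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct DefId(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Instance {
    pub def: DefId,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct GlobalId {
    /// The constant, static, or function being referred to.
    pub instance: Instance,
    /// `Some(i)` selects entry `i` of the function's `Promoted` table.
    pub promoted: Option<usize>,
}

impl GlobalId {
    /// Identifies a constant or static item.
    pub fn item(def: DefId) -> Self {
        GlobalId { instance: Instance { def }, promoted: None }
    }

    /// Identifies promoted constant number `index` inside function `func`.
    pub fn promoted_of(func: DefId, index: usize) -> Self {
        GlobalId { instance: Instance { def: func }, promoted: Some(index) }
    }
}
```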

Constant evaluation returns a `Result` with either the error, or the simplest
representation of the constant. "Simplest" means that if the constant is
representable as an integer or fat pointer, the value is yielded directly (via
`Value::ByVal` or `Value::ByValPair`) instead of as a reference to the
[`miri`](./miri.html) virtual memory allocation (via `Value::ByRef`). This means
that the `const_eval` function cannot be used to create miri-pointers to the
evaluated constant or static. If you need that, work directly with the
functions in
[src/librustc_mir/interpret/const_eval.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/const_eval.rs).
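
The choice between a direct value and a reference to an allocation can be pictured with the toy model below. `Shape` and `simplest` are hypothetical helpers invented here, and `PrimVal`/`Value` only mirror the variant names mentioned above; the real decision depends on the constant's type layout.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PrimVal {
    Bytes(u128),
    Undef,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    ByVal(PrimVal),
    ByValPair(PrimVal, PrimVal),
    ByRef(AllocId),
}

/// Hypothetical summary of what a finished constant looks like.
pub enum Shape {
    Scalar(u128),
    Pair(u128, u128),
    Other,
}

/// Scalars and pairs are returned directly; everything else refers to the
/// (interned) allocation holding the result.
pub fn simplest(shape: Shape, alloc: AllocId) -> Value {
    match shape {
        Shape::Scalar(v) => Value::ByVal(PrimVal::Bytes(v)),
        Shape::Pair(a, b) => Value::ByValPair(PrimVal::Bytes(a), PrimVal::Bytes(b)),
        Shape::Other => Value::ByRef(alloc),
    }
}
```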
3 changes: 3 additions & 0 deletions src/glossary.md
@@ -18,6 +18,9 @@ generics | the set of generic type parameters defined on a type
ICE | internal compiler error. When the compiler crashes.
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
infcx | the inference context (see `librustc/infer`)
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
local crate | the crate currently being compiled.
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
142 changes: 142 additions & 0 deletions src/miri.md
@@ -0,0 +1,142 @@
# Miri
Member

This is a great start!

One thing I would find helpful is an example. Something like,

const MY_CONST: usize = 1 << 12;
let x = [0; MY_CONST]

would, I think, be sufficient. Maybe a walk through of evaluating the MIR to get the length of x would be good?

What do you think?

Member

Nit: type Foo = [u8; MY_CONST]; may add less confusion, as nesting const inside a fn body can be misleading (even if it has no semantic effects).

Contributor

Heck yes to an example! =) Maybe use a const fn to explain?


Miri (**MIR** **I**nterpreter) is a virtual machine for executing MIR without
compiling to machine code. It is usually invoked via `tcx.const_eval`.

Member

It might be good to define what the model of the virtual machine is. For example, is it a stack-machine? Or does it have a flat byte-addressable memory? This would provide helpful context when discussing allocations and virtual memory below.

Member

Those two are not exclusive, as many stack machines are also RAM machines (and if you're a RAM machine, adding a stack is easy).

However, miri isn't either of those, as each virtual allocation forms its own "address space", and the stack frames have fixed shapes (determined by the MIR of the fn body/const initializer being evaluated), instead of having more general data stack manipulation primitives.

Member

So the stack is implemented more as a linked list of `Frame`s in some sense (where each `Frame` is created based on what you are miri-ing)?

Member

Well, it's a stack of `Frame`s within one evaluation. I don't believe nested `Frame`s are ever created except for `const fn` calls; if you need the value of a different const, it would be evaluated separately, with an entirely different stack.

Member

Got it :)

It would be good to add this info to the chapter.

If you start out with a constant

```rust
const FOO: usize = 1 << 12;
```

rustc doesn't actually invoke miri on it until the constant is either used or
placed into metadata.

Once you have a use-site like

```rust
type Foo = [u8; FOO - 42];
```

The compiler needs to figure out the length of the array before being able to
create items that use the type (locals, constants, function arguments, ...).

To obtain the (in this case empty) parameter environment, one can call
`let param_env = tcx.param_env(length_def_id);`, where `length_def_id` is the
`DefId` of the array length expression. The `GlobalId` needed is

```rust
let gid = GlobalId {
promoted: None,
instance: Instance::mono(length_def_id),
};
```

Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
the MIR of the array length expression. The MIR will look something like this:

```mir
const Foo::{{initializer}}: usize = {
let mut _0: usize; // return pointer
let mut _1: (usize, bool);

bb0: {
_1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize);
assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1;
}

bb1: {
_0 = (_1.0: usize);
return;
}
}
```
Member

Hmm... is the exact MIR likely to change over time?

Contributor Author

this is already simplified, and unlikely to change much or in relevant ways.


Before the evaluation, a virtual memory location (in this case essentially a
`vec![0u8; 4]` or `vec![0u8; 8]`, depending on whether `usize` is 32 or 64 bits
wide on the target) is created for storing the evaluation result.

At the start of the evaluation, `_0` and `_1` are
`Value::ByVal(PrimVal::Undef)`. When the initialization of `_1` is invoked, the
value of the `FOO` constant is required, and triggers another call to
`tcx.const_eval`, which will not be shown here. If the evaluation of `FOO` is
successful, 42 will be subtracted from its value `4096` and the result stored in
`_1` as `Value::ByValPair(PrimVal::Bytes(4054), PrimVal::Bytes(0))`. The first
part of the pair is the computed value, the second part is a bool that's true if
an overflow happened.

The next statement asserts that this boolean is `0` (false). If the assertion
fails, its error message is used to report a compile-time error.

Since it does not fail, `Value::ByVal(PrimVal::Bytes(4054))` is stored in the
virtual memory that was allocated before the evaluation. `_0` always refers to
that location directly.
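
The walkthrough above can be mimicked in plain Rust. This sketch uses `usize::overflowing_sub` as a stand-in for `CheckedSub`, since both produce a `(value, overflowed)` pair; it ignores virtual memory entirely, and `eval_length_initializer` is a hypothetical name.

```rust
/// Mirrors `Foo::{{initializer}}`: `bb0` does a checked subtraction and
/// asserts on the overflow flag, `bb1` moves the result into `_0`.
pub fn eval_length_initializer(foo: usize) -> Result<usize, &'static str> {
    // bb0: _1 = CheckedSub(FOO, 42), yielding (value, overflowed).
    let _1: (usize, bool) = foo.overflowing_sub(42);
    // assert(!(_1.1)): a set overflow flag becomes a compile-time error.
    if _1.1 {
        return Err("attempt to subtract with overflow");
    }
    // bb1: _0 = (_1.0); return
    let _0 = _1.0;
    Ok(_0)
}
```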

After the evaluation is done, the virtual memory allocation is interned into the
`TyCtxt`. Future evaluations of the same constant will not actually invoke
miri, but just extract the value from the interned allocation.
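
The caching behaviour can be sketched as a memo table. This is only an analogy under simplifying assumptions: `ConstCache` is hypothetical, it keys on a name instead of the `param_env.and(gid)` key passed to `tcx.const_eval`, and it stores plain integers rather than interned allocations.

```rust
use std::collections::HashMap;

/// Hypothetical memo table standing in for interned const-eval results.
pub struct ConstCache {
    results: HashMap<&'static str, u64>,
    /// Counts how often the (expensive) evaluator actually ran.
    pub evaluations: u32,
}

impl ConstCache {
    pub fn new() -> Self {
        ConstCache { results: HashMap::new(), evaluations: 0 }
    }

    /// Returns the cached value, running `eval` only on the first request.
    pub fn const_eval(&mut self, name: &'static str, eval: impl FnOnce() -> u64) -> u64 {
        if let Some(&v) = self.results.get(name) {
            return v;
        }
        self.evaluations += 1;
        let v = eval();
        self.results.insert(name, v);
        v
    }
}
```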

The `tcx.const_eval` function has one additional feature: it will not return a
`ByRef(interned_allocation_id)`, but a `ByVal(computed_value)` if possible. This
makes using the result much more convenient, as no further queries need to be
executed in order to get at something as simple as a `usize`.

## Datastructures

Miri's core datastructures can be found in
[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret).
This is mainly the error enum and the `Value` and `PrimVal` types. A `Value` can
be either `ByVal` (a single `PrimVal`), `ByValPair` (two `PrimVal`s, usually fat
pointers or two-element tuples) or `ByRef`, which is used for anything else and
refers to a virtual allocation. These allocations can be accessed via the
methods on `tcx.interpret_interner`.

If you are expecting a numeric result, you can use `unwrap_u64` (which panics on
anything that can't be represented as a `u64`) or `to_raw_bits`, which returns
an `Option<u128>` holding the raw bits of a `ByVal` if possible.
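
A minimal, hypothetical model of these types and the two accessors might look as follows. The variant and method names come from the text; the method bodies are guesses at the described behaviour, not rustc's code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PrimVal {
    Bytes(u128),
    Undef,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Value {
    ByVal(PrimVal),
    ByValPair(PrimVal, PrimVal),
    ByRef(AllocId),
}

impl Value {
    /// Raw bits of a `ByVal` holding bytes; `None` for anything else.
    pub fn to_raw_bits(self) -> Option<u128> {
        match self {
            Value::ByVal(PrimVal::Bytes(b)) => Some(b),
            _ => None,
        }
    }

    /// Panics unless the value is a `ByVal` whose bits fit into a `u64`.
    pub fn unwrap_u64(self) -> u64 {
        let bits = self.to_raw_bits().expect("not a by-value scalar");
        assert!(bits <= u128::from(u64::MAX), "does not fit into a u64");
        bits as u64
    }
}
```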

## Allocations

A miri allocation is either a byte sequence of the memory or an `Instance` in
the case of function pointers. Byte sequences can additionally contain
relocations that mark a group of bytes as a pointer to another allocation. The
actual bytes at the relocation refer to the offset inside the other allocation.
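
Relocations can be pictured with the toy `Allocation` below, which assumes 8-byte little-endian pointers and uses a `BTreeMap` from byte offset to target allocation. The field names and `read_ptr` are illustrative, not rustc's API.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u64);

/// Hypothetical, simplified allocation: raw bytes plus relocations.
pub struct Allocation {
    pub bytes: Vec<u8>,
    /// Byte offset where a pointer starts -> allocation it points into.
    pub relocations: BTreeMap<usize, AllocId>,
}

impl Allocation {
    /// Reads a pointer stored at `at`: the relocation names the target
    /// allocation, while the bytes only hold the offset inside it.
    pub fn read_ptr(&self, at: usize) -> Option<(AllocId, u64)> {
        let target = *self.relocations.get(&at)?;
        let slice = self.bytes.get(at..at.checked_add(8)?)?;
        let mut raw = [0u8; 8];
        raw.copy_from_slice(slice);
        Some((target, u64::from_le_bytes(raw)))
    }
}
```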

These allocations exist so that references and raw pointers have something to
point to. There is no global linear heap in which things are allocated, but each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size. So if you have a
pointer to an allocation for a local variable `a`, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to `b`.
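
This isolation follows from representing a pointer as an allocation id plus an offset. In this hypothetical sketch, arithmetic only ever touches the offset and leaving the allocation is an error; pointing one past the end is allowed, mirroring Rust's own rules for `ptr::offset`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u64);

/// Hypothetical miri-style pointer: an allocation plus an offset into it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pointer {
    pub alloc: AllocId,
    pub offset: u64,
}

impl Pointer {
    /// Moves the pointer within its own allocation of size `alloc_size`.
    /// The allocation id is never changed, so no offset can reach another one.
    pub fn offset_by(self, delta: i64, alloc_size: u64) -> Result<Pointer, String> {
        let new = (self.offset as i64)
            .checked_add(delta)
            .ok_or_else(|| "offset overflow".to_string())?;
        if new < 0 || new as u64 > alloc_size {
            return Err(format!("out of bounds of allocation {:?}", self.alloc));
        }
        Ok(Pointer { alloc: self.alloc, offset: new as u64 })
    }
}
```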

## Interpretation

Although the main entry point to constant evaluation is the `tcx.const_eval`
query, there are additional functions in
[librustc_mir/interpret/const_eval.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/const_eval.rs)
that allow accessing the fields of a `Value` (`ByRef` or otherwise). You should
never have to access an `Allocation` directly except for translating it to the
compilation target (at the moment just LLVM).

Miri starts by creating a virtual stack frame for the current constant that is
being evaluated. There's essentially no difference between a constant and a
function with no arguments, except that constants do not allow local (named)
variables at the time of writing this guide.

A stack frame is defined by the `Frame` type in
[librustc_mir/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/eval_context.rs)
and contains the memory of all local variables (`None` at the start of
evaluation). Each frame refers to the
evaluation of either the root constant or subsequent calls to `const fn`. The
evaluation of another constant simply calls `tcx.const_eval`, which produces an
entirely new and independent stack frame.

The frames are just a `Vec<Frame>`; there's no way to actually refer to a
`Frame`'s memory, even if horrible shenanigans are done via unsafe code. The only
memory that can be referred to is in `Allocation`s.
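
The relationship between frames and `const fn` calls can be sketched like this. `Frame` and `Stack` are simplified, hypothetical types that track only a name and the locals; evaluating a different constant would not push onto this stack but start a separate evaluation with its own stack.

```rust
/// Hypothetical frame: the locals of one MIR body, `None` = uninitialized.
pub struct Frame {
    pub body: &'static str,
    pub locals: Vec<Option<u128>>,
}

/// One evaluation owns exactly one stack of frames.
pub struct Stack {
    pub frames: Vec<Frame>,
}

impl Stack {
    /// Starts evaluating a root constant with `n_locals` locals.
    pub fn new(root: &'static str, n_locals: usize) -> Self {
        Stack { frames: vec![Frame { body: root, locals: vec![None; n_locals] }] }
    }

    /// A `const fn` call pushes a fresh frame onto the same stack.
    pub fn call(&mut self, callee: &'static str, n_locals: usize) {
        self.frames.push(Frame { body: callee, locals: vec![None; n_locals] });
    }

    /// Returning pops the frame and hands back its return place `_0`.
    pub fn ret(&mut self) -> Option<u128> {
        self.frames.pop().and_then(|f| f.locals.first().copied().flatten())
    }
}
```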

Miri now calls the `step` method (in
[librustc_mir/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/step.rs))
until it either returns an error or has no further statements to execute. Each
statement will now initialize or modify the locals or the virtual memory
referred to by a local. This might require evaluating other constants or
statics, which just recursively invokes `tcx.const_eval`.
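
A toy version of this loop, over a made-up two-statement language, shows how each statement updates the locals and how the first error stops the evaluation. `Stmt`, `step` and `run` are hypothetical and only loosely resemble miri.

```rust
/// Hypothetical toy statements over numbered locals.
pub enum Stmt {
    Const { dest: usize, value: u64 },
    CheckedSub { dest: usize, lhs: usize, rhs: usize },
}

/// Executes one statement, updating the locals in place.
fn step(stmt: &Stmt, locals: &mut [Option<u64>]) -> Result<(), String> {
    match *stmt {
        Stmt::Const { dest, value } => locals[dest] = Some(value),
        Stmt::CheckedSub { dest, lhs, rhs } => {
            let l = locals[lhs].ok_or("read of uninitialized local")?;
            let r = locals[rhs].ok_or("read of uninitialized local")?;
            let v = l.checked_sub(r).ok_or("attempt to subtract with overflow")?;
            locals[dest] = Some(v);
        }
    }
    Ok(())
}

/// Calls `step` until no statements are left or an error occurs.
pub fn run(body: &[Stmt], n_locals: usize) -> Result<Vec<Option<u64>>, String> {
    let mut locals = vec![None; n_locals];
    for stmt in body {
        step(stmt, &mut locals)?;
    }
    Ok(locals)
}
```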
30 changes: 30 additions & 0 deletions src/param_env.md
@@ -0,0 +1,30 @@
# Parameter Environment

When working with associated and/or generic items (types, constants,
functions/methods) it is often relevant to have more information about the
`Self` or generic parameters. Trait bounds and similar information are encoded in
the `ParamEnv`. Often this is not enough information to obtain things like the
type's `Layout`, but you can do all kinds of other checks on it (e.g. whether a
type implements `Copy`) or you can evaluate an associated constant whose value
does not depend on anything from the parameter environment.

For example if you have a function

```rust
fn foo<T: Copy>(t: T) {
}
```

the parameter environment for that function is `[T: Copy]`. This means any
evaluation within this function will, when accessing the type `T`, know about
its `Copy` bound via the parameter environment.
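
In ordinary Rust terms, the bound in the parameter environment is what lets the body use `t` twice; removing `T: Copy` makes rustc reject the second use as a use of a moved value. `duplicate` is an illustrative name.

```rust
/// `T: Copy` is part of this function's parameter environment, so the
/// compiler knows `t` can be used twice without being moved.
pub fn duplicate<T: Copy>(t: T) -> (T, T) {
    (t, t)
}
```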

Although you can obtain a valid `ParamEnv` for any item via
`tcx.param_env(def_id)`, this `ParamEnv` can be too generic for your use case.
Using the `ParamEnv` from the surrounding context can allow you to evaluate more
things.
Contributor

I don't understand this sentence. It seems like you should use the param-env that is correct, and no other. =) Maybe the question is more about determining what is correct? Can you give a specific example of what you had in mind here?

Contributor Author

The place I ran into issues with this was when evaluating promoted constants. If I ran tcx.param_env(promoted_const_def_id) and then evaluated the constant, I got type resolution problems left and right. If I used tcx.param_env(function_def_id) during the evaluation of the promoted, everything was good.

Member

promoted_const_def_id doesn't exist though?

Contributor Author

hmm... maybe it was associated constants and not promoteds.

I'm definitely using the param_env of the surrounding function when evaluating constants, not just when calling Instance::resolve.

Member

The rule is that you use the ParamEnv for the body you're monomorphizing. So a function referring to a constant or anything else, will use whatever is in the MIR, substituted with the Substs and the ParamEnv (if needed) for the function.

That is, if you got the MIR for a def_id, typesystem things you get from that MIR are valid wrt tcx.param_env(def_id).subst(tcx, substs), after applying .subst(tcx, substs).


Another great thing about `ParamEnv` is that you can use it to bundle the thing
depending on generic parameters (e.g. a `Ty`) by calling `param_env.and(ty)`.
This will produce a `ParamEnvAnd<Ty>`, making clear that you should probably not
be using the inner value without taking care to also use the `ParamEnv`.
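
The bundling pattern can be sketched with simplified, hypothetical types: here a `ParamEnv` is just a list of bound strings, and `into_parts` returns both halves together so the environment is not silently lost.

```rust
/// Hypothetical, simplified parameter environment: just a list of bounds.
#[derive(Clone, Debug, PartialEq)]
pub struct ParamEnv {
    pub caller_bounds: Vec<&'static str>,
}

/// Pairs a value with the environment it must be interpreted in.
#[derive(Clone, Debug, PartialEq)]
pub struct ParamEnvAnd<T> {
    pub param_env: ParamEnv,
    pub value: T,
}

impl ParamEnv {
    /// Bundles `value` with this environment.
    pub fn and<T>(self, value: T) -> ParamEnvAnd<T> {
        ParamEnvAnd { param_env: self, value }
    }
}

impl<T> ParamEnvAnd<T> {
    /// Takes the bundle apart, handing back the environment alongside the value.
    pub fn into_parts(self) -> (ParamEnv, T) {
        (self.param_env, self.value)
    }
}
```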
2 changes: 1 addition & 1 deletion src/trait-resolution.md
@@ -433,7 +433,7 @@ before, and hence the cache lookup would succeed, yielding
One subtle interaction is that the results of trait lookup will vary
depending on what where clauses are in scope. Therefore, we actually
have *two* caches, a local and a global cache. The local cache is
attached to the `ParamEnv` and the global cache attached to the
attached to the [`ParamEnv`](./param_env.html) and the global cache attached to the
`tcx`. We use the local cache whenever the result might depend on the
where clauses that are in scope. The determination of which cache to
use is done by the method `pick_candidate_cache` in `select.rs`. At