| 1 | +# Miri |
| 2 | + |
| 3 | +Miri (**MIR** **I**nterpreter) is a virtual machine for executing MIR without |
| 4 | +compiling to machine code. It is usually invoked via `tcx.const_eval`. |
| 5 | + |
| 6 | +If you start out with a constant |
| 7 | + |
| 8 | +```rust |
| 9 | +const FOO: usize = 1 << 12; |
| 10 | +``` |
| 11 | + |
| 12 | +rustc doesn't actually invoke anything until the constant is either used or |
| 13 | +placed into metadata. |
| 14 | + |
| 15 | +Once you have a use-site like |
| 16 | + |
| 17 | +```rust |
| 18 | +type Foo = [u8; FOO - 42]; |
| 19 | +``` |
| 20 | + |
| 21 | +the compiler needs to figure out the length of the array before being able to |
| 22 | +create items that use the type (locals, constants, function arguments, ...). |
| 23 | + |
| 24 | +To obtain the (in this case empty) parameter environment, one can call |
| 25 | +`let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is |
| 26 | + |
| 27 | +```rust |
| 28 | +let gid = GlobalId { |
| 29 | +    promoted: None, |
| 30 | +    instance: Instance::mono(length_def_id), |
| 31 | +}; |
| 32 | +``` |
| 33 | + |
| 34 | +Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of |
| 35 | +the MIR of the array length expression. The MIR will look something like this: |
| 36 | + |
| 37 | +```mir |
| 38 | +const Foo::{{initializer}}: usize = { |
| 39 | +    let mut _0: usize; // return pointer |
| 40 | +    let mut _1: (usize, bool); |
| 41 | + |
| 42 | +    bb0: { |
| 43 | +        _1 = CheckedSub(const Unevaluated(FOO, Slice([])), const 42usize); |
| 44 | +        assert(!(_1.1: bool), "attempt to subtract with overflow") -> bb1; |
| 45 | +    } |
| 46 | + |
| 47 | +    bb1: { |
| 48 | +        _0 = (_1.0: usize); |
| 49 | +        return; |
| 50 | +    } |
| 51 | +} |
| 52 | +``` |
| 53 | + |
| 54 | +Before the evaluation, a virtual memory location (in this case essentially a `Vec<u8>` |
| 55 | +of length 4 or 8, depending on the size of `usize` on the target) is created for storing the evaluation result. |
| 56 | + |
| 57 | +At the start of the evaluation, `_0` and `_1` are |
| 58 | +`Value::ByVal(PrimVal::Undef)`. When the initialization of `_1` is invoked, the |
| 59 | +value of the `FOO` constant is required, and triggers another call to |
| 60 | +`tcx.const_eval`, which will not be shown here. If the evaluation of `FOO` is |
| 61 | +successful, 42 will be subtracted from its value `4096` and the result stored in |
| 62 | +`_1` as `Value::ByValPair(PrimVal::Bytes(4054), PrimVal::Bytes(0))`. The first |
| 63 | +part of the pair is the computed value, the second part is a bool that's true if |
| 64 | +an overflow happened. |
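
The pair stored in `_1` has the same shape as the result of Rust's own `usize::overflowing_sub`, so the arithmetic can be illustrated outside the compiler. This is only a sketch of the computation, not of miri's internals; the helper name `checked_sub_pair` is made up for this example.

```rust
/// Mimics the shape of what the `CheckedSub` MIR operation yields:
/// the (wrapped) result plus a flag that is `true` if the subtraction
/// overflowed. Hypothetical helper for illustration, not part of rustc.
fn checked_sub_pair(lhs: usize, rhs: usize) -> (usize, bool) {
    lhs.overflowing_sub(rhs)
}
```

Calling `checked_sub_pair(1 << 12, 42)` gives `(4054, false)`, matching the `ByValPair` described above, while `checked_sub_pair(0, 42)` sets the flag to `true`.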
| 65 | + |
| 66 | +The next statement asserts that said boolean is `0`. In case the assertion |
| 67 | +fails, its error message is used for reporting a compile-time error. |
| 68 | + |
| 69 | +Since it does not fail, `Value::ByVal(PrimVal::Bytes(4054))` is stored in the |
| 70 | +virtual memory that was allocated before the evaluation. `_0` always refers to that |
| 71 | +location directly. |
| 72 | + |
| 73 | +After the evaluation is done, the virtual memory allocation is interned into the |
| 74 | +`TyCtxt`. Future evaluations of the same constants will not actually invoke |
| 75 | +miri, but just extract the value from the interned allocation. |
| 76 | + |
| 77 | +The `tcx.const_eval` function has one additional feature: it will not return a |
| 78 | +`ByRef(interned_allocation_id)`, but a `ByVal(computed_value)` if possible. This |
| 79 | +makes using the result much more convenient, as no further queries need to be |
| 80 | +executed in order to get at something as simple as a `usize`. |
| 81 | + |
| 82 | +## Data structures |
| 83 | + |
| 84 | +Miri's core data structures can be found in |
| 85 | +[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret). |
| 86 | +These are mainly the error enum and the `Value` and `PrimVal` types. A `Value` can |
| 87 | +be either `ByVal` (a single `PrimVal`), `ByValPair` (two `PrimVal`s, usually fat |
| 88 | +pointers or two element tuples) or `ByRef`, which is used for anything else and |
| 89 | +refers to a virtual allocation. These allocations can be accessed via the |
| 90 | +methods on `tcx.interpret_interner`. |
| 91 | + |
| 92 | +If you are expecting a numeric result, you can use `unwrap_u64` (panics on |
| 93 | +anything that can't be represented as a `u64`) or `to_raw_bits`, which returns |
| 94 | +an `Option<u128>` that yields the `ByVal` if possible. |
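
To make the three shapes of a `Value` concrete, here is a heavily simplified model. The variant names follow the text above, but the definitions are assumptions made for this sketch: the real types in `librustc/mir/interpret` carry more information, `ByRef` is reduced to a plain index, and this `to_raw_bits` only models the "returns the `ByVal` if possible" behaviour.

```rust
/// Simplified stand-in for miri's `PrimVal` (not the real definition).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PrimVal {
    /// Defined bits of a primitive value.
    Bytes(u128),
    /// A value that has not been initialized yet.
    Undef,
}

/// Simplified stand-in for miri's `Value` (not the real definition).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    /// A single primitive value.
    ByVal(PrimVal),
    /// Two primitive values, e.g. a fat pointer or a two-element tuple.
    ByValPair(PrimVal, PrimVal),
    /// Anything else; here a hypothetical index naming a virtual allocation.
    ByRef(usize),
}

impl Value {
    /// Yields the raw bits only for a single, defined primitive value.
    fn to_raw_bits(self) -> Option<u128> {
        match self {
            Value::ByVal(PrimVal::Bytes(bits)) => Some(bits),
            _ => None,
        }
    }
}
```

In this model, `Value::ByVal(PrimVal::Bytes(4054)).to_raw_bits()` is `Some(4054)`, whereas a `ByValPair`, a `ByRef` or an undefined value yields `None`.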
| 95 | + |
| 96 | +## Allocations |
| 97 | + |
| 98 | +A miri allocation is either a byte sequence of the memory or an `Instance` in |
| 99 | +the case of function pointers. Byte sequences can additionally contain |
| 100 | +relocations that mark a group of bytes as a pointer to another allocation. The |
| 101 | +actual bytes at the relocation refer to the offset inside the other allocation. |
| 102 | + |
| 103 | +These allocations exist so that references and raw pointers have something to |
| 104 | +point to. There is no global linear heap in which things are allocated, but each |
| 105 | +allocation (be it for a local variable, a static or a (future) heap allocation) |
| 106 | +gets its own little memory with exactly the required size. So if you have a |
| 107 | +pointer to an allocation for a local variable `a`, there is no possible (no |
| 108 | +matter how unsafe) operation that you can do that would ever change said pointer |
| 109 | +to a pointer to a different local variable `b`. |
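
The allocation model described above can be sketched in a few lines. All names here (`AllocId`, `Pointer`, `offset_by`, `read_ptr`) and the 8-byte little-endian pointer encoding are assumptions for illustration, not miri's actual definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical identifier for one allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AllocId(usize);

/// A pointer names an allocation plus an offset inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    alloc: AllocId,
    offset: usize,
}

/// One allocation: its bytes, plus relocations that map a byte position
/// to the allocation that the pointer stored there refers to.
struct Allocation {
    bytes: Vec<u8>,
    relocations: BTreeMap<usize, AllocId>,
}

impl Pointer {
    /// Offsetting either stays within the bounds of `alloc` (the allocation
    /// this pointer refers to) or fails. The `AllocId` never changes, so no
    /// arithmetic can produce a pointer into a different allocation.
    fn offset_by(self, by: usize, alloc: &Allocation) -> Option<Pointer> {
        let new_offset = self.offset.checked_add(by)?;
        if new_offset <= alloc.bytes.len() {
            Some(Pointer { alloc: self.alloc, offset: new_offset })
        } else {
            None
        }
    }
}

impl Allocation {
    /// Reads a pointer stored at byte position `at`: the relocation names
    /// the target allocation, the bytes themselves hold the offset into it.
    /// Assumes 8-byte little-endian pointers for this sketch.
    fn read_ptr(&self, at: usize) -> Option<Pointer> {
        let alloc = *self.relocations.get(&at)?;
        let raw = self.bytes.get(at..at.checked_add(8)?)?;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(raw);
        Some(Pointer { alloc, offset: u64::from_le_bytes(buf) as usize })
    }
}
```

Because the target allocation is recorded out of band in `relocations` rather than encoded in the bytes, overwriting the bytes can only change the offset, never which allocation is pointed to.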
| 110 | + |
| 111 | +## Interpretation |
| 112 | + |
| 113 | +Although the main entry point to constant evaluation is the `tcx.const_eval` |
| 114 | +query, there are additional functions in |
| 115 | +[librustc_mir/interpret/const_eval.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/const_eval.rs) |
| 116 | +that allow accessing the fields of a `Value` (`ByRef` or otherwise). You should |
| 117 | +never have to access an `Allocation` directly except for translating it to the |
| 118 | +compilation target (at the moment just LLVM). |
| 119 | + |
| 120 | +Miri starts by creating a virtual stack frame for the current constant that is |
| 121 | +being evaluated. There's essentially no difference between a constant and a |
| 122 | +function with no arguments, except that constants do not allow local (named) |
| 123 | +variables at the time of writing this guide. |
| 124 | + |
| 125 | +A stack frame is defined by the `Frame` type in |
| 126 | +[librustc_mir/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/eval_context.rs) |
| 127 | +and contains the memory of all local |
| 128 | +variables (`None` at the start of evaluation). Each frame refers to the |
| 129 | +evaluation of either the root constant or subsequent calls to `const fn`. The |
| 130 | +evaluation of another constant simply calls `tcx.const_eval`, which produces an |
| 131 | +entirely new and independent stack frame. |
| 132 | + |
| 133 | +The frames are just a `Vec<Frame>`; there's no way to actually refer to a |
| 134 | +`Frame`'s memory even if horrible shenanigans are done via unsafe code. The only |
| 135 | +memory that can be referred to is that of `Allocation`s. |
| 136 | + |
| 137 | +Miri now calls the `step` method (in |
| 138 | +[librustc_mir/interpret/step.rs](https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/step.rs)) |
| 139 | +until it either returns an error or has no further statements to execute. Each |
| 140 | +statement will now initialize or modify the locals or the virtual memory |
| 141 | +referred to by a local. This might require evaluating other constants or |
| 142 | +statics, which just recursively invokes `tcx.const_eval`. |
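
The stepping loop can be pictured with a toy interpreter for the `FOO - 42` MIR shown earlier. Everything below (`Statement`, `Local`, this `Frame`, `run`, `foo_minus_42`) is a hypothetical, much reduced model written for this guide, not the code in `librustc_mir/interpret`; in particular, the value of `FOO` is passed in directly instead of being obtained through a nested `tcx.const_eval` call.

```rust
/// Toy statements modelled on the MIR of `FOO - 42` (hypothetical types).
enum Statement {
    /// `_dest = CheckedSub(const lhs, const rhs)`
    CheckedSub { dest: usize, lhs: u64, rhs: u64 },
    /// `assert(!(_local.1: bool), msg)`
    AssertNoOverflow { local: usize, msg: &'static str },
    /// `_dest = (_src.0)`
    CopyFirst { dest: usize, src: usize },
}

/// The contents of an initialized local.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Local {
    Scalar(u64),
    Pair(u64, bool),
}

/// A stack frame: every local is `None` until a statement initializes it.
struct Frame {
    locals: Vec<Option<Local>>,
    statements: Vec<Statement>,
    next: usize,
}

impl Frame {
    /// Executes one statement; `Ok(false)` means nothing is left to run.
    fn step(&mut self) -> Result<bool, String> {
        let stmt = match self.statements.get(self.next) {
            Some(stmt) => stmt,
            None => return Ok(false),
        };
        match *stmt {
            Statement::CheckedSub { dest, lhs, rhs } => {
                let (value, overflow) = lhs.overflowing_sub(rhs);
                self.locals[dest] = Some(Local::Pair(value, overflow));
            }
            Statement::AssertNoOverflow { local, msg } => {
                let checked = self.locals[local];
                match checked {
                    Some(Local::Pair(_, false)) => {}
                    Some(Local::Pair(_, true)) => return Err(msg.to_string()),
                    _ => return Err("assert on a non-pair local".to_string()),
                }
            }
            Statement::CopyFirst { dest, src } => {
                let source = self.locals[src];
                match source {
                    Some(Local::Pair(value, _)) => {
                        self.locals[dest] = Some(Local::Scalar(value));
                    }
                    _ => return Err("read of a non-pair local".to_string()),
                }
            }
        }
        self.next += 1;
        Ok(true)
    }
}

/// Calls `step` until the frame is done or an error occurs, then
/// returns the contents of `_0`, the return value.
fn run(mut frame: Frame) -> Result<Option<Local>, String> {
    while frame.step()? {}
    Ok(frame.locals[0])
}

/// Builds the frame for `FOO - 42`, with the value of `FOO` given directly.
fn foo_minus_42(foo: u64) -> Frame {
    Frame {
        locals: vec![None, None],
        statements: vec![
            Statement::CheckedSub { dest: 1, lhs: foo, rhs: 42 },
            Statement::AssertNoOverflow { local: 1, msg: "attempt to subtract with overflow" },
            Statement::CopyFirst { dest: 0, src: 1 },
        ],
        next: 0,
    }
}
```

With this model, `run(foo_minus_42(4096))` yields `Ok(Some(Local::Scalar(4054)))`, while `run(foo_minus_42(1))` stops at the assertion and returns its message as the error.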