|
1 |
| -# Rust Value Domain |
2 |
| - |
3 |
| -**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology. |
4 |
| - |
5 |
| -The purpose of this document is to describe what the set of *all possible values* is in Rust. |
6 |
| -This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type. |
7 |
| -This relation relates values with lists of bytes: it says, for a given value and list of bytes, if that value is represented by that list. |
8 |
| -However, before we can even start specifying the relation, we have to specify the involved domains. |
9 |
| -`Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`. |
10 |
| - |
11 |
| -[representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation |
12 |
| -[memory-interface]: memory-interface.md |
13 |
| - |
14 |
| -## Values |
15 |
| - |
16 |
| -The Rust value domain is described by the following (incomplete) type definition (using the `Pointer` type defined by [the memory interface][memory-interface]): |
17 |
| - |
18 |
| -```rust |
19 |
| -enum Value { |
20 |
| - /// A mathematical integer, used for `i*`/`u*` types. |
21 |
| - Int(BigInt), |
22 |
| - /// A Boolean value, used for `bool`. |
23 |
| - Bool(bool), |
24 |
| - /// A pointer value, used for (thin) references and raw pointers. |
25 |
| - Ptr(Pointer), |
26 |
| - /// An uninitialized value. |
27 |
| - Uninit, |
28 |
| - /// An n-tuple, used for arrays, structs, tuples, SIMD vectors. |
29 |
| - Tuple(Vec<Self>), |
30 |
| - /// A variant of a sum type, used for enums. |
31 |
| - Variant { |
32 |
| - idx: BigInt, |
33 |
| - data: Box<Self>, |
34 |
| - }, |
35 |
| - /// A "bag of raw bytes", used for unions. |
36 |
| - RawBag(Vec<Byte>), |
37 |
| - /* ... */ |
38 |
| -} |
39 |
| -``` |
40 |
| - |
41 |
| -The point of this type is to capture the mathematical concepts that are represented by the data we store in memory. |
42 |
| -The definition is likely incomplete, and even if it was complete now, we might expand it as Rust grows. |
43 |
| -That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect. |
44 |
| - |
45 |
| -## Example value relations |
46 |
| - |
47 |
| -We show some examples for how one might want to use this `Value` domain to define the value relation for a type. |
48 |
| - |
49 |
| -### `bool` |
50 |
| - |
51 |
| -The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_int() == Some(if b { 1 } else { 0 })`. |
52 |
| -(`as_int` is defined in [the memory interface][memory-interface].) |
53 |
| - |
54 |
| -**Note:** Here and in the following, we implicitly perform a ptr-to-int cast when loading a `PtrFragment` at a non-pointer type. |
55 |
| -This basically means that non-pointer types carry no [provenance], and "superflous" provenance is implicitly stripped on loads. |
56 |
| -There are [quite a few problems](https://github.com/rust-lang/unsafe-code-guidelines/issues/181#issuecomment-519860562) with this approach, |
57 |
| -but there is also no known alternative that has no problems. |
58 |
| -For this document (in accordance with what Miri does), we chose the option that has least UB, to avoid false positives. |
59 |
| -But this means there are likely many false negatives, and the final Rust spec will likely have more UB than this WIP document! |
60 |
| - |
61 |
| -[provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance |
62 |
| - |
63 |
| -### `()` |
64 |
| - |
65 |
| -The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it. |
66 |
| - |
67 |
| -### `Pad` |
68 |
| - |
69 |
| -The [hypothetical `Pad` type][pad] relates the empty tuple `Tuple([])` with any byte list of length 1. |
70 |
| - |
71 |
| -[pad]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding |
72 |
| - |
73 |
| -### `!` |
74 |
| - |
75 |
| -The value relation for the `!` type is empty: nothing is related to anything at this type. |
76 |
| - |
77 |
| -### `#[repr(C)] struct Pair<T, U>(T, U)` |
78 |
| - |
79 |
| -The value relation for `Pair` is based on the value relations for `T` and `U`. |
80 |
| -A value `Tuple([t, u])` is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if: |
81 |
| - |
82 |
| -* `t` is represented by `rt` at type `T`. |
83 |
| -* `u` is represented by `ru` at type `U`. |
84 |
| -* The length of `rt ++ pad1` is equal to the offset of the `U` field. |
85 |
| -* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`. |
86 |
| - |
87 |
| -This relation specifies that values of type `Pair` are always 2-tuples (aka, pairs). |
88 |
| -It also says that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length. |
89 |
| -So, for example when considering `Pair<u8, u16>`, the value `Tuple[42, 119]` is represented on a little-endian target by `[Raw(42), byte, Raw(119), Raw(0)]` for *any* `byte: Byte`. |
90 |
| - |
91 |
| -### `&T`/`&mut T` |
92 |
| - |
93 |
| -Reference types are tricky. |
94 |
| -But a possible value relation for sized `T` is: |
95 |
| -A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: PTR_SIZE-1 }]` if `ptr` is non-NULL and appropriately aligned (defining alignment is left open for now). |
96 |
| - |
97 |
| -### `u8` |
98 |
| - |
99 |
| -For the value representation of integer types, there are two different reasonable choices. |
100 |
| -Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_int() == Some(i)`. |
101 |
| - |
102 |
| -And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`. |
103 |
| -This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). |
104 |
| - |
105 |
| -### `NonZeroU8` |
106 |
| - |
107 |
| -`NonZeroU8` is basically the same as `u8` except that we remove the 0, and certainly do not consider `Uninit` a valid value. |
108 |
| -A value `Int(i)` where `i` in `1..256` is related to `[b]` if `b.as_int() == Some(i)`, and that's it. |
109 |
| - |
110 |
| -### `union` |
111 |
| - |
112 |
| -The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`. |
113 |
| -(Note however that [this definition might not be implementable](https://github.com/rust-lang/unsafe-code-guidelines/issues/156).) |
114 |
| - |
115 |
| -## The role of the value representation in the operational semantics |
116 |
| - |
117 |
| -One key use of the value representation is to define a "typed" interface to memory: |
118 |
| - |
119 |
| -```rust |
120 |
| -trait TypedMemory: Memory { |
121 |
| - /// Write a value of the given type to memory. |
122 |
| - fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result; |
123 |
| - |
124 |
| - /// Read a value of the given type. |
125 |
| - fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result<Value>; |
126 |
| -} |
127 |
| -``` |
128 |
| - |
129 |
| -Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). |
130 |
| -This interface is inspired by [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/). |
131 |
| - |
132 |
| -We can implement `TypedMemory` for any `Memory` as follows: |
133 |
| -* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB. |
134 |
| -* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB. |
135 |
| - |
136 |
| -In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy". |
137 |
| -Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`. |
138 |
| -In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior! |
139 |
| -This also means that for `Pad` and more generally for types that have padding, the "typed copy" does not preserve the padding bytes. |
140 |
| - |
141 |
| -## Relation to validity invariant |
142 |
| - |
143 |
| -One way we *could* also use the value representation (and the author thinks this is exceedingly elegant) is to define the validity invariant. |
144 |
| -Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`. |
145 |
| -We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation". |
146 |
| -For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references. |
147 |
| -The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`. |
148 |
| -This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice. |
149 |
| - |
150 |
| -Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like the `noalias` attribute. |
151 |
| - |
152 |
| -[Stacked Borrows]: stacked-borrows.md |
| 1 | +This file [moved](https://github.com/RalfJung/minirust/blob/master/lang/values.md). |
0 commit comments