|
| 1 | +# Rust Value Domain |
| 2 | + |
| 3 | +**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology. |
| 4 | + |
| 5 | +The purpose of this document is to describe what the set of *all possible values* is in Rust. |
| 6 | +This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type. |
| 7 | +This relation relates values with lists of bytes, so before we can even start specifying the relation we have to specify the involved domains. |
| 8 | +`Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`. |
| 9 | + |
| 10 | +[representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation |
| 11 | +[memory-interface]: memory-interface.md |
| 12 | + |
| 13 | +## Values |
| 14 | + |
| 15 | +The Rust value domain is described by the following (incomplete) type definition (in terms of an arbitrary `Pointer` type instantiated by [the memory interface][memory-interface]): |
| 16 | + |
| 17 | +```rust |
| 18 | +enum Value<Pointer> { |
| 19 | + /// A mathematical integer. |
| 20 | + Int(BigInt), |
| 21 | + /// A Boolean value. |
| 22 | + Bool(bool), |
| 23 | + /// A pointer. |
| 24 | + Ptr(Pointer), |
| 25 | + /// A zero-sized "unit". |
| 26 | + Unit, |
| 27 | + /// An uninitialized value. |
| 28 | + Uninit, |
| 29 | + /// An n-tuple. |
| 30 | + Tuple(Vec<Self>), |
| 31 | + /// A variant of a sum type. |
| 32 | + Variant { |
| 33 | + idx: u64, |
| 34 | + data: Box<Self>, |
| 35 | + }, |
| 36 | + /* ... */ |
| 37 | +} |
| 38 | +``` |
| 39 | + |
| 40 | +As Rust grows, we might expand this definition. That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect. |
| 41 | + |
| 42 | +## Example value relations |
| 43 | + |
| 44 | +We show some examples for how one might want to use this `Value` domain to define the value relation for a type. |
| 45 | + |
| 46 | +### `bool` |
| 47 | + |
| 48 | +The value relation for `bool` relates `Bool(true)` with `[Raw(0)]` and `Bool(false)` with [`Raw(1)`], and that's it. |
| 49 | + |
| 50 | +### `()` |
| 51 | + |
| 52 | +The value relation for the `()` type relates `Unit` with the empty list `[]`, and that's it. |
| 53 | + |
| 54 | +### `!` |
| 55 | + |
| 56 | +The value relation for the `!` type is empty: nothing is related to anything at this type. |
| 57 | + |
| 58 | +### `#[repr(C)] struct Pair<T, U>(T, U)` |
| 59 | + |
| 60 | +The value relation for `Pair` us based on the value relations for `T` and `U`. |
| 61 | +A value `Tuple([t, u])` (assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if: |
| 62 | + |
| 63 | +* `t` is represented by `rt` at type `T`. |
| 64 | +* `u` is represented by `ru` at type `U`. |
| 65 | +* The length of `rt ++ pad1` is equal to the offset of the `U` field. |
| 66 | +* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`. |
| 67 | + |
| 68 | +This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs). |
| 69 | +It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length. |
| 70 | + |
| 71 | +### `&T`/`&mut T` |
| 72 | + |
| 73 | +Reference types are tricky. |
| 74 | +But a possible value relation for sized `T` is: |
| 75 | +A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: N-1 }]` where `N == size_of::<usize>()` if `ptr` is non-NULL and aligned to `align_of::<T>()`. |
| 76 | + |
| 77 | +### `u8` |
| 78 | + |
| 79 | +For the value representation of integer types, there are two different reasonable choices. |
| 80 | +Certainly, a value `Int(i)` where `i` in `0..256` is related to `[Raw(i as u8)]`. |
| 81 | + |
| 82 | +And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`. |
| 83 | +This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value). |
| 84 | + |
| 85 | +## The role of the value representation in the operational semantics |
| 86 | + |
| 87 | +One key use of the value representation is to define a "typed" interface to memory: |
| 88 | + |
| 89 | +```rust |
| 90 | +trait TypedMemory: Memory { |
| 91 | + /// Write a value of the given type to memory. |
| 92 | + fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result<(), Error>; |
| 93 | + |
| 94 | + /// Read a value of the given type. |
| 95 | + fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result<Value, Error>; |
| 96 | +} |
| 97 | +``` |
| 98 | + |
| 99 | +here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler). |
| 100 | +We can implement `TypedMemory` for any `Memory` as follows: |
| 101 | +* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB. |
| 102 | +* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB. |
| 103 | + |
| 104 | +In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy". |
| 105 | +Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`. |
| 106 | +In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior! |
| 107 | +This also means that for types that have padding, the "typed copy" does not preserve the padding bytes. |
| 108 | + |
| 109 | +## Relation to validity invariant |
| 110 | + |
| 111 | +One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant. |
| 112 | +Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`. |
| 113 | +We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation". |
| 114 | +For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references. |
| 115 | +The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`. |
| 116 | +This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice. |
| 117 | + |
| 118 | +Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute. |
| 119 | + |
| 120 | +[Stacked Borrows]: stacked-borrows.md |
0 commit comments