|
| 1 | +# Rust Memory Interface |
| 2 | + |
| 3 | +**Note:** This document is neither normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology. |
| 4 | + |
| 5 | +The purpose of this document is to describe the interface between a Rust program and memory. |
| 6 | +This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface: |
| 7 | +* The *expression/statement semantics* of Rust boils down to explaining which "memory events" happen in which order, expressed as calls to the methods of this interface and reactions to their return values. |
| 8 | + This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory. |
| 9 | +* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads. |
| 10 | + A memory model is defined by implementing the memory interface. |
| 11 | + |
| 12 | +The interface shown below is also opinionated in several ways. |
| 13 | +It is not intended to be able to support *any imaginable* memory model, but rather to start the process of reducing the design space of what we consider a "reasonable" memory model for Rust. |
| 14 | +For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well, but you have to read the standard very carefully, and consult [defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this). |
| 15 | +Another key property of the interface presented below is that it is *untyped*. |
| 16 | +This implies that in Rust, *operations are typed, but memory is not*, a key difference from C and C++ with their type-based strict aliasing rules. |
| 17 | +At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable. |
| 18 | + |
| 19 | +## Pointers |
| 20 | + |
| 21 | +One key question a memory model has to answer is *what is a pointer*. |
| 22 | +It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated]. |
| 23 | +This becomes even more prominent with aliasing models such as [Stacked Borrows]. |
| 24 | +So we leave it up to the memory model to answer this question, and make `Pointer` an associated type. |
| 25 | +Practically speaking, `Pointer` will be some representation of an "address", plus [provenance] information. |
| 26 | + |
| 27 | +[provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance |
| 28 | + |
| 29 | +## Bytes |
| 30 | + |
| 31 | +The unit of communication between the memory model and the rest of the program is a *byte*. |
| 32 | +Again, the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit]. |
| 33 | +We define the `Byte` type as follows, where `Pointer` will later be instantiated with the `Memory::Pointer` associated type. |
| 34 | + |
| 35 | +```rust |
| 36 | +enum Byte<Pointer> { |
| 37 | + /// The "normal" case: a (frozen, initialized) integer in `0..256`. |
| 38 | + Raw(u8), |
| 39 | + /// An uninitialized byte. |
| 40 | + Uninit, |
| 41 | + /// One byte of a pointer. |
| 42 | + PtrFragment { |
| 43 | + /// The pointer of which this is a byte. |
| 44 | + /// That is, the byte is a fragment of this pointer. |
| 45 | + ptr: Pointer, |
| 46 | + /// Which byte of the pointer this is. |
| 47 | + /// `idx` will always be in `0..PTR_SIZE`. |
| 48 | + idx: u8, |
| 49 | + } |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +The purpose of `PtrFragment` is to enable a byte-wise representation of a `Pointer`. |
| 54 | +On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is: |
| 55 | +``` |
| 56 | +[ |
| 57 | + PtrFragment { ptr, idx: 0 }, |
| 58 | + PtrFragment { ptr, idx: 1 }, |
| 59 | + PtrFragment { ptr, idx: 2 }, |
| 60 | + PtrFragment { ptr, idx: 3 }, |
| 61 | +] |
| 62 | +``` |
| 63 | + |
| 64 | +Based on the `PtrToInt` trait (see below), we can turn every initialized `Byte` into an integer in `0..256`: |
| 65 | + |
| 66 | +```rust |
| 67 | +impl<Pointer: PtrToInt> Byte<Pointer> { |
| 68 | + fn as_int(self) -> Option<u8> { |
| 69 | + match self { |
| 70 | + Byte::Raw(int) => Some(int), |
| 71 | + Byte::Uninit => None, |
| 72 | + Byte::PtrFragment { ptr, idx } => |
| 73 | + Some(ptr.get_byte(idx)), |
| 74 | + } |
| 75 | + } |
| 76 | +} |
| 77 | +``` |
| 78 | + |
| 79 | +## Memory interface |
| 80 | + |
| 81 | +The Rust memory interface is described by the following (not-yet-complete) trait definition: |
| 82 | + |
| 83 | +```rust |
| 84 | +/// All operations are fallible, so they return `Result`. If they fail, that |
| 85 | +/// means the program caused UB. What exactly the `UndefinedBehavior` type is |
| 86 | +/// does not matter here. |
| 87 | +type Result<T=()> = std::result::Result<T, UndefinedBehavior>; |
| 88 | + |
| 89 | +/// *Note*: All memory operations can be non-deterministic, which means that |
| 90 | +/// executing the same operation on the same memory can have different results. |
| 91 | +/// We also let all operations potentially mutate memory. For example, reads |
| 92 | +/// actually do change the current state when considering concurrency or |
| 93 | +/// Stacked Borrows. |
| 94 | +/// This is pseudo-Rust, so we just use fully owned types everywhere for |
| 95 | +/// symmetry and simplicity. |
| 96 | +trait Memory { |
| 97 | + /// The type of pointer values. |
| 98 | + type Pointer: Copy + PtrToInt; |
| 99 | + |
| 100 | + /// The size of pointer values. |
| 101 | + const PTR_SIZE: u64; |
| 102 | + |
| 103 | + /// Create a new allocation. |
| 104 | + fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer>; |
| 105 | + |
| 106 | + /// Remove an allocation. |
| 107 | + fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result; |
| 108 | + |
| 109 | + /// Write some bytes to memory. |
| 110 | + fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result; |
| 111 | + |
| 112 | + /// Read some bytes from memory. |
| 113 | + fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>>; |
| 114 | + |
| 115 | + /// Offset the given pointer. |
| 116 | + fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) |
| 117 | + -> Result<Self::Pointer>; |
| 118 | + |
| 119 | + /// Cast the given integer to a pointer. (The other direction is handled by `PtrToInt` below.) |
| 120 | + fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer>; |
| 121 | +} |
| 122 | + |
| 123 | +/// The `Pointer` type must know how to extract its bytes, *without any access to the `Memory`*. |
| 124 | +trait PtrToInt { |
| 125 | + /// Get the `idx`-th byte of the pointer. `idx` must be in `0..PTR_SIZE`. |
| 126 | + fn get_byte(self, idx: u8) -> u8; |
| 127 | +} |
| 128 | + |
| 129 | +/// The rules applying to this pointer offset operation. |
| 130 | +enum OffsetMode { |
| 131 | + /// Wrapping offset; never UB. |
| 132 | + Wrapping, |
| 133 | + /// Non-wrapping offset; UB if it wraps. |
| 134 | + NonWrapping, |
| 135 | + /// In-bounds offset; UB if it wraps or if old and new pointer are not both |
| 136 | + /// in-bounds of the same allocation (details are specified by the memory |
| 137 | + /// model). |
| 138 | + Inbounds, |
| 139 | +} |
| 140 | +``` |
| 141 | + |
| 142 | +We will generally assume we have a particular memory model in scope, and freely refer to its `PTR_SIZE` and `Pointer` items. |
| 143 | +We will also write `Byte` for `Byte<Pointer>`. |
| 144 | + |
| 145 | +This is a very basic memory interface that is incomplete in at least the following ways: |
| 146 | + |
| 147 | +* To implement rules like "dereferencing a null, unaligned, or dangling raw pointer is UB" (even if no memory access happens), there needs to be a way to do an "alignment, bounds and null-check". |
| 148 | +* There needs to be some way to do alignment checks -- either using the above operation, or by adding `align` parameters to `read` and `write`. |
| 149 | +* To represent concurrency, many operations need to take a "thread ID" and `read` and `write` need to take an [`Ordering`]. |
| 150 | +* To represent [Stacked Borrows], there needs to be a "retag" operation, and that one will in fact be "lightly typed" (it cares about `UnsafeCell`). |
| 151 | +* Maybe we want operations that can compare pointers without casting them to integers. |
| 152 | + |
| 153 | +[pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html |
| 154 | +[uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html |
| 155 | +[`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html |
| 156 | +[Stacked Borrows]: stacked-borrows.md |
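
To make the interface in the diff above more concrete, here is a minimal sketch of a toy memory model, written as an illustration rather than as anything the document proposes. It assumes a 32-bit target (`PTR_SIZE` of 4) and a little-endian byte order. `ToyPtr`, `ToyMemory`, `Allocation`, `ptr_to_bytes`, and the `OutOfBounds` variant are hypothetical names introduced for this example only. Only `allocate`, `write`, and `read` are modelled, and alignment is ignored.

```rust
/// Stand-in for the document's `UndefinedBehavior` type; its contents are an assumption.
#[derive(Debug, PartialEq)]
enum UndefinedBehavior {
    OutOfBounds,
}

type Result<T = ()> = std::result::Result<T, UndefinedBehavior>;

/// Assumption: a 32-bit target, matching the 4-byte example in the text.
const PTR_SIZE: u64 = 4;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte<Pointer> {
    Raw(u8),
    Uninit,
    PtrFragment { ptr: Pointer, idx: u8 },
}

trait PtrToInt {
    fn get_byte(self, idx: u8) -> u8;
}

impl<Pointer: PtrToInt> Byte<Pointer> {
    fn as_int(self) -> Option<u8> {
        match self {
            Byte::Raw(int) => Some(int),
            Byte::Uninit => None,
            Byte::PtrFragment { ptr, idx } => Some(ptr.get_byte(idx)),
        }
    }
}

/// Hypothetical pointer: provenance (an allocation id) plus a 32-bit address.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyPtr {
    alloc: usize,
    addr: u32,
}

impl PtrToInt for ToyPtr {
    fn get_byte(self, idx: u8) -> u8 {
        // Assumption: little-endian layout of the address.
        self.addr.to_le_bytes()[idx as usize]
    }
}

struct Allocation {
    base: u32,
    bytes: Vec<Byte<ToyPtr>>,
}

struct ToyMemory {
    allocs: Vec<Allocation>,
    next_base: u32,
}

impl ToyMemory {
    fn new() -> Self {
        ToyMemory { allocs: Vec::new(), next_base: 0x1000 }
    }

    /// Create a new allocation; every byte starts out as `Uninit`.
    fn allocate(&mut self, size: u64) -> Result<ToyPtr> {
        let base = self.next_base;
        // Leave a gap so that distinct allocations never touch.
        self.next_base += size as u32 + 16;
        self.allocs.push(Allocation { base, bytes: vec![Byte::Uninit; size as usize] });
        Ok(ToyPtr { alloc: self.allocs.len() - 1, addr: base })
    }

    /// Bounds check against the allocation named by the pointer's provenance.
    fn range(&self, ptr: ToyPtr, len: u64) -> Result<std::ops::Range<usize>> {
        let alloc = &self.allocs[ptr.alloc];
        let start = ptr.addr.checked_sub(alloc.base).ok_or(UndefinedBehavior::OutOfBounds)? as usize;
        let end = start
            .checked_add(len as usize)
            .filter(|&end| end <= alloc.bytes.len())
            .ok_or(UndefinedBehavior::OutOfBounds)?;
        Ok(start..end)
    }

    fn write(&mut self, ptr: ToyPtr, bytes: Vec<Byte<ToyPtr>>) -> Result {
        let range = self.range(ptr, bytes.len() as u64)?;
        self.allocs[ptr.alloc].bytes[range].copy_from_slice(&bytes);
        Ok(())
    }

    fn read(&mut self, ptr: ToyPtr, len: u64) -> Result<Vec<Byte<ToyPtr>>> {
        let range = self.range(ptr, len)?;
        Ok(self.allocs[ptr.alloc].bytes[range].to_vec())
    }
}

/// Encode a pointer as `PTR_SIZE` fragments, as in the 32-bit example in the text.
fn ptr_to_bytes(ptr: ToyPtr) -> Vec<Byte<ToyPtr>> {
    (0..PTR_SIZE as u8).map(|idx| Byte::PtrFragment { ptr, idx }).collect()
}
```

In this sketch, storing a pointer and reading it back yields the same `PtrFragment` bytes, so provenance survives the round trip through memory. Calling `as_int` on those bytes recovers only the address, which is the side-effect-free pointer-to-bytes direction the document describes.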