# Rust Memory Interface

**Note:** This document is neither normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology.

The purpose of this document is to describe the interface between a Rust program and memory.
This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface:
* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order.
* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads.

The interface is also opinionated in several ways; it is not intended to support *any imaginable* memory model, but rather to start the process of narrowing the design space of what we consider a "reasonable" memory model for Rust.
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well, but you have to read the standard very carefully, and consult non-normative defect report responses, to see this).
Another key property of the interface presented below is that it is *untyped*.
This encodes the fact that in Rust, *operations are typed, but memory is not*. This is a key difference from C and C++ with their type-based strict aliasing rules.
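
A small sketch of "operations are typed, but memory is not", using a plain `[u8; 4]` as a stand-in for memory. This is a simplification: real Rust memory also has to represent pointers and uninitialized bytes. The helper `reinterpret` is hypothetical, and little-endian order is chosen explicitly so the result does not depend on the target. The same four bytes yield different values depending only on the type of the operation that reads them:

```rust
/// Hypothetical helper: read the same four bytes of (untyped) memory
/// at three different types.
fn reinterpret(mem: [u8; 4]) -> (f32, u32, [u16; 2]) {
    let as_f32 = f32::from_le_bytes(mem);
    let as_u32 = u32::from_le_bytes(mem);
    let as_u16s = [
        u16::from_le_bytes([mem[0], mem[1]]),
        u16::from_le_bytes([mem[2], mem[3]]),
    ];
    (as_f32, as_u32, as_u16s)
}

fn main() {
    // A typed write: store `1.0f32` as its four little-endian bytes.
    let mem: [u8; 4] = 1.0f32.to_le_bytes();

    // Typed reads of the very same bytes.
    let (f, u, halves) = reinterpret(mem);
    assert_eq!(f, 1.0);
    assert_eq!(u, 0x3F80_0000);
    assert_eq!(halves, [0x0000, 0x3F80]);
}
```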

## Pointers

One key question a memory model has to answer is *what is a pointer*.
It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated].
So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface.
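
As a hypothetical illustration of why `Pointer` is left abstract, suppose a pointer carries the allocation it was derived from in addition to its address. This representation is an assumption for this sketch, not something the interface prescribes. Two such pointers can then have the same address and still be different values, which a plain integer cannot express:

```rust
/// Hypothetical pointer representation (an assumption for illustration):
/// an address plus the ID of the allocation it was derived from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ptr {
    alloc_id: u64,
    addr: u64,
}

impl Ptr {
    /// The plain integer address, i.e., what remains after a ptr-to-int cast.
    fn addr(self) -> u64 {
        self.addr
    }
}

fn main() {
    // A one-past-the-end pointer into allocation 1, and a pointer to the
    // start of allocation 2, which happens to be placed directly after it.
    let end_of_first = Ptr { alloc_id: 1, addr: 0x1010 };
    let start_of_second = Ptr { alloc_id: 2, addr: 0x1010 };

    // Equal as integers...
    assert_eq!(end_of_first.addr(), start_of_second.addr());
    // ...but different pointers: under a provenance-based model, only
    // `start_of_second` may be used to access allocation 2.
    assert_ne!(end_of_first, start_of_second);
}
```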

## Bytes

The unit of communication between the memory model and the rest of the program is a *byte*.
Again, the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit].
We define the `Byte` type (in terms of an arbitrary `Pointer` type) as follows:

```rust
enum Byte<Pointer> {
    /// The "normal" case: a (frozen, initialized) integer in `0..256`.
    Raw(u8),
    /// An uninitialized byte.
    Uninit,
    /// One byte of a pointer.
    Pointer {
        /// The pointer of which this is a byte.
        ptr: Pointer,
        /// Which byte of the pointer this is.
        /// `idx` will always be in `0..size_of::<usize>()`.
        idx: u8,
    },
}
```
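
To make the `Pointer` variant concrete, here is a sketch of how a pointer might be split into bytes when it is written and reassembled when it is read. It assumes 8-byte pointers and uses `u64` as a stand-in for the abstract `Pointer` type; `encode_ptr`, `decode_ptr`, and `decode_u16_le` are hypothetical helpers. What an integer load should return when it encounters uninitialized or pointer bytes is a memory-model decision; this sketch conservatively returns `None`.

```rust
/// Copy of the document's `Byte` type (doc comments dropped).
#[derive(Clone, Debug, PartialEq)]
enum Byte<Pointer> {
    Raw(u8),
    Uninit,
    Pointer { ptr: Pointer, idx: u8 },
}

/// Assumed pointer size for this sketch (a 64-bit target).
const PTR_SIZE: u8 = 8;

/// Write side: split a pointer into `PTR_SIZE` bytes, each of which
/// remembers the whole pointer and its own position.
fn encode_ptr<P: Clone>(ptr: P) -> Vec<Byte<P>> {
    (0..PTR_SIZE)
        .map(|idx| Byte::Pointer { ptr: ptr.clone(), idx })
        .collect()
}

/// Read side: the bytes form a pointer only if all of them are fragments
/// of the same pointer, in the right order.
fn decode_ptr<P: Clone + PartialEq>(bytes: &[Byte<P>]) -> Option<P> {
    let first = match bytes.first()? {
        Byte::Pointer { ptr, .. } => ptr.clone(),
        _ => return None,
    };
    let complete = bytes.len() == PTR_SIZE as usize
        && bytes.iter().enumerate().all(|(i, b)| {
            matches!(b, Byte::Pointer { ptr, idx } if *ptr == first && *idx as usize == i)
        });
    complete.then_some(first)
}

/// Integer load: this sketch only produces a `u16` from two raw bytes.
fn decode_u16_le<P>(bytes: &[Byte<P>]) -> Option<u16> {
    match bytes {
        [Byte::Raw(lo), Byte::Raw(hi)] => Some(u16::from_le_bytes([*lo, *hi])),
        _ => None,
    }
}

fn main() {
    // `u64` stands in for whatever the memory model's `Pointer` type is.
    let bytes = encode_ptr(0x1000_u64);
    assert_eq!(decode_ptr(&bytes), Some(0x1000));

    // Reordering the fragments does not yield a valid pointer.
    let mut swapped = bytes.clone();
    swapped.swap(0, 1);
    assert_eq!(decode_ptr(&swapped), None);

    // Pointer fragments and uninitialized bytes are not integers here.
    assert_eq!(decode_u16_le(&bytes[..2]), None);
    assert_eq!(decode_u16_le::<u64>(&[Byte::Raw(0x34), Byte::Raw(0x12)]), Some(0x1234));
    assert_eq!(decode_u16_le::<u64>(&[Byte::Raw(0x34), Byte::Uninit]), None);
}
```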

## Memory interface

The Rust memory interface is described by the following (not-yet-complete) trait definition:

```rust
/// *Note*: All memory operations can be non-deterministic, which means that
/// executing the same operation on the same memory can have different results.
/// We also let all operations potentially mutate memory. For example, reads
/// actually do change the current state when considering concurrency or
/// Stacked Borrows.
trait Memory {
    /// The type of pointer values.
    type Pointer;

    /// The type of memory errors (i.e., ways in which the program can cause UB
    /// by interacting with memory).
    type Error;

    /// Create a new allocation.
    fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer, Self::Error>;

    /// Remove an allocation.
    fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Self::Error>;

    /// Write some bytes to memory.
    fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result<(), Self::Error>;

    /// Read some bytes from memory.
    fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>, Self::Error>;

    /// Offset the given pointer.
    fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result<Self::Pointer, Self::Error>;

    /// Cast the given pointer to an integer.
    fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result<u64, Self::Error>;

    /// Cast the given integer to a pointer.
    fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer, Self::Error>;
}

/// The rules applying to this pointer offset operation.
enum OffsetMode {
    /// Wrapping offset; never UB.
    Wrapping,
    /// Non-wrapping offset; UB if it wraps.
    NonWrapping,
    /// In-bounds offset; UB if it wraps or if old and new pointer are not both
    /// in-bounds of the same allocation (details are specified by the memory
    /// model).
    Inbounds,
}
```
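
To show how the pieces fit together, here is a toy implementation of the trait. It is a minimal, single-threaded sketch under assumptions that are not part of the interface: `Ptr` (an address plus an allocation index), `ToyMemory`, and its helper `range` are hypothetical names; `Error` is just a `&'static str`; addresses are never reused; and `int_to_ptr` uses only one of several conceivable rules for recovering provenance. None of the gaps listed below (alignment checks, concurrency, retagging) are addressed.

```rust
// Copies of the document's definitions (doc comments dropped for brevity).
#[derive(Clone, Debug, PartialEq)]
enum Byte<Pointer> { Raw(u8), Uninit, Pointer { ptr: Pointer, idx: u8 } }
enum OffsetMode { Wrapping, NonWrapping, Inbounds }
trait Memory {
    type Pointer;
    type Error;
    fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer, Self::Error>;
    fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Self::Error>;
    fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result<(), Self::Error>;
    fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>, Self::Error>;
    fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result<Self::Pointer, Self::Error>;
    fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result<u64, Self::Error>;
    fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer, Self::Error>;
}

/// Hypothetical pointer: an absolute address plus the index of its allocation.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ptr { alloc: usize, addr: u64 }
struct Allocation { base: u64, align: u64, bytes: Vec<Byte<Ptr>>, live: bool }
/// Toy single-threaded memory: addresses are never reused, freed allocations are marked dead.
struct ToyMemory { allocs: Vec<Allocation>, next_addr: u64 }

impl ToyMemory {
    fn new() -> Self { ToyMemory { allocs: Vec::new(), next_addr: 0x1000 } }
    /// Check that `len` bytes at `ptr` lie in its live allocation; return the start index.
    fn range(&mut self, ptr: Ptr, len: u64) -> Result<(&mut Allocation, usize), &'static str> {
        let a = self.allocs.get_mut(ptr.alloc).ok_or("dangling pointer")?;
        if !a.live { return Err("use after free"); }
        let start = ptr.addr.checked_sub(a.base).ok_or("out of bounds")?;
        let end = start.checked_add(len).ok_or("out of bounds")?;
        if end > a.bytes.len() as u64 { return Err("out of bounds"); }
        Ok((a, start as usize))
    }
}

impl Memory for ToyMemory {
    type Pointer = Ptr;
    type Error = &'static str;
    fn allocate(&mut self, size: u64, align: u64) -> Result<Ptr, &'static str> {
        if !align.is_power_of_two() { return Err("invalid alignment"); }
        // Round the next free address up to a multiple of `align`.
        let base = self.next_addr.checked_add(align - 1).ok_or("out of memory")? & !(align - 1);
        self.next_addr = base.checked_add(size.max(1)).ok_or("out of memory")?;
        self.allocs.push(Allocation { base, align, bytes: vec![Byte::Uninit; size as usize], live: true });
        Ok(Ptr { alloc: self.allocs.len() - 1, addr: base })
    }
    fn deallocate(&mut self, ptr: Ptr, size: u64, align: u64) -> Result<(), &'static str> {
        let (a, start) = self.range(ptr, 0)?;
        if start != 0 || a.bytes.len() as u64 != size || a.align != align { return Err("bad deallocation"); }
        a.live = false;
        Ok(())
    }
    fn write(&mut self, ptr: Ptr, bytes: Vec<Byte<Ptr>>) -> Result<(), &'static str> {
        let (a, start) = self.range(ptr, bytes.len() as u64)?;
        a.bytes[start..start + bytes.len()].clone_from_slice(&bytes);
        Ok(())
    }
    fn read(&mut self, ptr: Ptr, len: u64) -> Result<Vec<Byte<Ptr>>, &'static str> {
        let (a, start) = self.range(ptr, len)?;
        Ok(a.bytes[start..start + len as usize].to_vec())
    }
    fn offset(&mut self, ptr: Ptr, offset: u64, mode: OffsetMode) -> Result<Ptr, &'static str> {
        let addr = match mode {
            OffsetMode::Wrapping => ptr.addr.wrapping_add(offset),
            OffsetMode::NonWrapping => ptr.addr.checked_add(offset).ok_or("offset wraps")?,
            OffsetMode::Inbounds => {
                let new = ptr.addr.checked_add(offset).ok_or("offset wraps")?;
                // Old and new address must lie in the same live allocation (one-past-the-end allowed).
                self.range(ptr, 0)?;
                self.range(Ptr { addr: new, ..ptr }, 0)?;
                new
            }
        };
        Ok(Ptr { addr, ..ptr })
    }
    fn ptr_to_int(&mut self, ptr: Ptr) -> Result<u64, &'static str> {
        Ok(ptr.addr) // the provenance (`alloc`) is simply dropped
    }
    fn int_to_ptr(&mut self, int: u64) -> Result<Ptr, &'static str> {
        // One possible rule: take the provenance of the live allocation containing `int`.
        let found = self.allocs.iter().position(|a| a.live && int >= a.base && int - a.base < a.bytes.len() as u64);
        found.map(|alloc| Ptr { alloc, addr: int }).ok_or("no allocation at this address")
    }
}

fn main() {
    let mut mem = ToyMemory::new();
    let p = mem.allocate(4, 4).unwrap();
    mem.write(p, vec![Byte::Raw(1), Byte::Raw(2)]).unwrap();
    let q = mem.offset(p, 1, OffsetMode::Inbounds).unwrap();
    assert_eq!(mem.read(q, 2).unwrap(), vec![Byte::Raw(2), Byte::Uninit]); // index 2 never written
    assert!(mem.offset(p, 5, OffsetMode::Inbounds).is_err()); // beyond one-past-the-end
    let addr = mem.ptr_to_int(p).unwrap();
    assert_eq!(mem.int_to_ptr(addr).unwrap(), p);
    mem.deallocate(p, 4, 4).unwrap();
    assert_eq!(mem.read(p, 1), Err("use after free"));
}
```

The `Inbounds` arm accepts one-past-the-end pointers, matching how the standard library documents `pointer::offset`; as the `OffsetMode` comment says, the exact rules are left to the memory model.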

This is a very basic memory interface that is incomplete in at least the following ways:

* To implement rules like "dereferencing a null, unaligned, or dangling raw pointer is UB" (even if no memory access happens), there needs to be a way to do an "alignment, bounds and null-check".
* There needs to be some way to do alignment checks: either using the above operation, or by adding `align` parameters to `read` and `write`.
* To represent concurrency, many operations need to take a "thread ID", and `read` and `write` need to take an [`Ordering`].
* To represent [Stacked Borrows], there needs to be a "retag" operation, and that operation will in fact be "lightly typed" (it cares about `UnsafeCell`).
* Maybe we want operations that can compare pointers without casting them to integers.
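
The first two items could be addressed by an operation along the following lines. This is a hypothetical sketch: `check_deref` and `AllocInfo` are not part of the interface, the allocation a pointer refers to is passed in directly instead of being looked up via provenance, and zero-sized accesses get no special treatment.

```rust
/// Hypothetical summary of the allocation a pointer refers to.
struct AllocInfo {
    base: u64,
    size: u64,
    live: bool,
}

/// Hypothetical "alignment, bounds and null-check": may `size` bytes at
/// `addr` be accessed with alignment `align`? No memory is actually touched.
fn check_deref(addr: u64, size: u64, align: u64, alloc: Option<&AllocInfo>) -> Result<(), &'static str> {
    if addr == 0 {
        return Err("null pointer");
    }
    if !align.is_power_of_two() || addr % align != 0 {
        return Err("unaligned pointer");
    }
    // Bounds check against the (live) allocation; zero-sized accesses get no
    // special treatment in this sketch.
    let a = match alloc {
        Some(a) if a.live => a,
        _ => return Err("dangling pointer"),
    };
    let end = addr.checked_add(size).ok_or("dangling pointer")?;
    if addr < a.base || end - a.base > a.size {
        return Err("dangling pointer");
    }
    Ok(())
}

fn main() {
    let a = AllocInfo { base: 0x1000, size: 8, live: true };
    assert_eq!(check_deref(0x1000, 8, 8, Some(&a)), Ok(()));
    assert_eq!(check_deref(0, 1, 1, None), Err("null pointer"));
    assert_eq!(check_deref(0x1002, 4, 4, Some(&a)), Err("unaligned pointer"));
    assert_eq!(check_deref(0x1004, 8, 4, Some(&a)), Err("dangling pointer"));
    let freed = AllocInfo { live: false, ..a };
    assert_eq!(check_deref(0x1000, 1, 1, Some(&freed)), Err("dangling pointer"));
}
```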

Nevertheless, I think it can be useful for providing some basic terminology and grounds for further discussion.

[pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
[uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html
[`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html
[Stacked Borrows]: stacked-borrows.md