Skip to content

Commit 8cd1cc5

Browse files
authored
Merge pull request rust-lang#341 from RalfJung/minirust
some files have moved
2 parents 99c8493 + e265921 commit 8cd1cc5

File tree

2 files changed

+2
-308
lines changed

2 files changed

+2
-308
lines changed

wip/memory-interface.md

+1-156
Original file line numberDiff line numberDiff line change
@@ -1,156 +1 @@
1-
# Rust Memory Interface
2-
3-
**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology.
4-
5-
The purpose of this document is to describe the interface between a Rust program and memory.
6-
This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface:
7-
* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order - expressed as calls to the methods of this interface, and reactions to its return values.
8-
This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory.
9-
* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads.
10-
A memory model is defined by implementing the memory interface.
11-
12-
The interface shown below is also opinionated in several ways.
13-
It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust.
14-
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult [defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this).
15-
Another key property of the interface presented below is that it is *untyped*.
16-
This implies that in Rust, *operations are typed, but memory is not* - a key difference to C and C++ with their type-based strict aliasing rules.
17-
At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable.
18-
19-
## Pointers
20-
21-
One key question a memory model has to answer is *what is a pointer*.
22-
It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated].
23-
This becomes even more prominent with aliasing models such as [Stacked Borrows].
24-
So we leave it up to the memory model to answer this question, and make `Pointer` an associated type.
25-
Practically speaking, `Pointer` will be some representation of an "address", plus [provenance] information.
26-
27-
[provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance
28-
29-
## Bytes
30-
31-
The unit of communication between the memory model and the rest of the program is a *byte*.
32-
Again, the question of "what is a byte" is not as trivial as it might seem; beyond `u8` values we have to represent `Pointer`s and [uninitialized memory][uninit].
33-
We define the `Byte` type as follows, where `Pointer` will later be instantiated with the `Memory::Pointer` associated type.
34-
35-
```rust
36-
enum Byte<Pointer> {
37-
/// The "normal" case: a (frozen, initialized) integer in `0..256`.
38-
Raw(u8),
39-
/// An uninitialized byte.
40-
Uninit,
41-
/// One byte of a pointer.
42-
PtrFragment {
43-
/// The pointer of which this is a byte.
44-
/// That is, the byte is a fragment of this pointer.
45-
ptr: Pointer,
46-
/// Which byte of the pointer this is.
47-
/// `idx` will always be in `0..PTR_SIZE`.
48-
idx: u8,
49-
}
50-
}
51-
```
52-
53-
The purpose of `PtrFragment` is to enable a byte-wise representation of a `Pointer`.
54-
On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is:
55-
```
56-
[
57-
PtrFragment { ptr, idx: 0 },
58-
PtrFragment { ptr, idx: 1 },
59-
PtrFragment { ptr, idx: 2 },
60-
PtrFragment { ptr, idx: 3 },
61-
]
62-
```
63-
64-
Based on the `PtrToInt` trait (see below), we can turn every initialized `Byte` into an integer in `0..256`:
65-
66-
```rust
67-
impl<Pointer: PtrToInt> Byte<Pointer> {
68-
fn as_int(self) -> Option<u8> {
69-
match self {
70-
Byte::Raw(int) => Some(int),
71-
Byte::Uninit => None,
72-
Byte::PtrFragment { ptr, idx } =>
73-
Some(ptr.get_byte(idx)),
74-
}
75-
}
76-
}
77-
```
78-
79-
## Memory interface
80-
81-
The Rust memory interface is described by the following (not-yet-complete) trait definition:
82-
83-
```rust
84-
/// All operations are fallible, so they return `Result`. If they fail, that
85-
/// means the program caused UB. What exactly the `UndefinedBehavior` type is
86-
/// does not matter here.
87-
type Result<T=()> = std::result::Result<T, UndefinedBehavior>;
88-
89-
/// *Note*: All memory operations can be non-deterministic, which means that
90-
/// executing the same operation on the same memory can have different results.
91-
/// We also let all operations potentially mutate memory. For example, reads
92-
/// actually do change the current state when considering concurrency or
93-
/// Stacked Borrows.
94-
/// This is pseudo-Rust, so we just use fully owned types everywhere for
95-
/// symmetry and simplicity.
96-
trait Memory {
97-
/// The type of pointer values.
98-
type Pointer: Copy + PtrToInt;
99-
100-
/// The size of pointer values.
101-
const PTR_SIZE: u64;
102-
103-
/// Create a new allocation.
104-
fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer>;
105-
106-
/// Remove an allocation.
107-
fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result;
108-
109-
/// Write some bytes to memory.
110-
fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result;
111-
112-
/// Read some bytes from memory.
113-
fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>>;
114-
115-
/// Offset the given pointer.
116-
fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode)
117-
-> Result<Self::Pointer>;
118-
119-
/// Cast the given integer to a pointer. (The other direction is handled by `PtrToInt` below.)
120-
fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer>;
121-
}
122-
123-
/// The `Pointer` type must know how to extract its bytes, *without any access to the `Memory`*.
124-
trait PtrToInt {
125-
/// Get the `idx`-th byte of the pointer. `idx` must be in `0..PTR_SIZE`.
126-
fn get_byte(self, idx: u8) -> u8;
127-
}
128-
129-
/// The rules applying to this pointer offset operation.
130-
enum OffsetMode {
131-
/// Wrapping offset; never UB.
132-
Wrapping,
133-
/// Non-wrapping offset; UB if it wraps.
134-
NonWrapping,
135-
/// In-bounds offset; UB if it wraps or if old and new pointer are not both
136-
/// in-bounds of the same allocation (details are specified by the memory
137-
/// model).
138-
Inbounds,
139-
}
140-
```
141-
142-
We will generally assume we have a particular memory model in scope, and freely refer to its `PTR_SIZE` and `Pointer` items.
143-
We will also write `Byte` for `Byte<Pointer>`.
144-
145-
This is a very basic memory interface that is incomplete in at least the following ways:
146-
147-
* To implement rules like "dereferencing a null, unaligned, or dangling raw pointer is UB" (even if no memory access happens), there needs to be a way to do an "alignment, bounds and null-check".
148-
* There needs to be some way to do alignment checks -- either using the above operation, or by adding `align` parameters to `read` and `write`.
149-
* To represent concurrency, many operations need to take a "thread ID" and `read` and `write` need to take an [`Ordering`].
150-
* To represent [Stacked Borrows], there needs to be a "retag" operation, and that one will in fact be "lightly typed" (it cares about `UnsafeCell`).
151-
* Maybe we want operations that can compare pointers without casting them to integers.
152-
153-
[pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
154-
[uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html
155-
[`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html
156-
[Stacked Borrows]: stacked-borrows.md
1+
This file [moved](https://github.com/RalfJung/minirust/blob/master/mem/interface.md)

wip/value-domain.md

+1-152
Original file line numberDiff line numberDiff line change
@@ -1,152 +1 @@
1-
# Rust Value Domain
2-
3-
**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology.
4-
5-
The purpose of this document is to describe what the set of *all possible values* is in Rust.
6-
This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type.
7-
This relation relates values with lists of bytes: it says, for a given value and list of bytes, if that value is represented by that list.
8-
However, before we can even start specifying the relation, we have to specify the involved domains.
9-
`Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`.
10-
11-
[representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation
12-
[memory-interface]: memory-interface.md
13-
14-
## Values
15-
16-
The Rust value domain is described by the following (incomplete) type definition (using the `Pointer` type defined by [the memory interface][memory-interface]):
17-
18-
```rust
19-
enum Value {
20-
/// A mathematical integer, used for `i*`/`u*` types.
21-
Int(BigInt),
22-
/// A Boolean value, used for `bool`.
23-
Bool(bool),
24-
/// A pointer value, used for (thin) references and raw pointers.
25-
Ptr(Pointer),
26-
/// An uninitialized value.
27-
Uninit,
28-
/// An n-tuple, used for arrays, structs, tuples, SIMD vectors.
29-
Tuple(Vec<Self>),
30-
/// A variant of a sum type, used for enums.
31-
Variant {
32-
idx: BigInt,
33-
data: Box<Self>,
34-
},
35-
/// A "bag of raw bytes", used for unions.
36-
RawBag(Vec<Byte>),
37-
/* ... */
38-
}
39-
```
40-
41-
The point of this type is to capture the mathematical concepts that are represented by the data we store in memory.
42-
The definition is likely incomplete, and even if it was complete now, we might expand it as Rust grows.
43-
That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect.
44-
45-
## Example value relations
46-
47-
We show some examples for how one might want to use this `Value` domain to define the value relation for a type.
48-
49-
### `bool`
50-
51-
The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_int() == Some(if b { 1 } else { 0 })`.
52-
(`as_int` is defined in [the memory interface][memory-interface].)
53-
54-
**Note:** Here and in the following, we implicitly perform a ptr-to-int cast when loading a `PtrFragment` at a non-pointer type.
55-
This basically means that non-pointer types carry no [provenance], and "superflous" provenance is implicitly stripped on loads.
56-
There are [quite a few problems](https://github.com/rust-lang/unsafe-code-guidelines/issues/181#issuecomment-519860562) with this approach,
57-
but there is also no known alternative that has no problems.
58-
For this document (in accordance with what Miri does), we chose the option that has least UB, to avoid false positives.
59-
But this means there are likely many false negatives, and the final Rust spec will likely have more UB than this WIP document!
60-
61-
[provenance]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#pointer-provenance
62-
63-
### `()`
64-
65-
The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it.
66-
67-
### `Pad`
68-
69-
The [hypothetical `Pad` type][pad] relates the empty tuple `Tuple([])` with any byte list of length 1.
70-
71-
[pad]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding
72-
73-
### `!`
74-
75-
The value relation for the `!` type is empty: nothing is related to anything at this type.
76-
77-
### `#[repr(C)] struct Pair<T, U>(T, U)`
78-
79-
The value relation for `Pair` is based on the value relations for `T` and `U`.
80-
A value `Tuple([t, u])` is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
81-
82-
* `t` is represented by `rt` at type `T`.
83-
* `u` is represented by `ru` at type `U`.
84-
* The length of `rt ++ pad1` is equal to the offset of the `U` field.
85-
* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`.
86-
87-
This relation specifies that values of type `Pair` are always 2-tuples (aka, pairs).
88-
It also says that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
89-
So, for example when considering `Pair<u8, u16>`, the value `Tuple[42, 119]` is represented on a little-endian target by `[Raw(42), byte, Raw(119), Raw(0)]` for *any* `byte: Byte`.
90-
91-
### `&T`/`&mut T`
92-
93-
Reference types are tricky.
94-
But a possible value relation for sized `T` is:
95-
A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: PTR_SIZE-1 }]` if `ptr` is non-NULL and appropriately aligned (defining alignment is left open for now).
96-
97-
### `u8`
98-
99-
For the value representation of integer types, there are two different reasonable choices.
100-
Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_int() == Some(i)`.
101-
102-
And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`.
103-
This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value).
104-
105-
### `NonZeroU8`
106-
107-
`NonZeroU8` is basically the same as `u8` except that we remove the 0, and certainly do not consider `Uninit` a valid value.
108-
A value `Int(i)` where `i` in `1..256` is related to `[b]` if `b.as_int() == Some(i)`, and that's it.
109-
110-
### `union`
111-
112-
The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`.
113-
(Note however that [this definition might not be implementable](https://github.com/rust-lang/unsafe-code-guidelines/issues/156).)
114-
115-
## The role of the value representation in the operational semantics
116-
117-
One key use of the value representation is to define a "typed" interface to memory:
118-
119-
```rust
120-
trait TypedMemory: Memory {
121-
/// Write a value of the given type to memory.
122-
fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result;
123-
124-
/// Read a value of the given type.
125-
fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result<Value>;
126-
}
127-
```
128-
129-
Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
130-
This interface is inspired by [Cerberus](https://www.cl.cam.ac.uk/~pes20/cerberus/).
131-
132-
We can implement `TypedMemory` for any `Memory` as follows:
133-
* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB.
134-
* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.
135-
136-
In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy".
137-
Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`.
138-
In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior!
139-
This also means that for `Pad` and more generally for types that have padding, the "typed copy" does not preserve the padding bytes.
140-
141-
## Relation to validity invariant
142-
143-
One way we *could* also use the value representation (and the author thinks this is exceedingly elegant) is to define the validity invariant.
144-
Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`.
145-
We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation".
146-
For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references.
147-
The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`.
148-
This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice.
149-
150-
Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like the `noalias` attribute.
151-
152-
[Stacked Borrows]: stacked-borrows.md
1+
This file [moved](https://github.com/RalfJung/minirust/blob/master/lang/values.md).

0 commit comments

Comments
 (0)