wip/memory-interface.md
+11-11
@@ -4,21 +4,22 @@
4
4
5
5
The purpose of this document is to describe the interface between a Rust program and memory.
6
6
This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface:
7
-
* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order.
7
+
* The *expression/statement semantics* of Rust boils down to explaining which "memory events" (calls to the memory interface) happen in which order. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory.
8
8
* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads.
9
9
10
-
The interface is also opinionated in several ways; this is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust.
11
-
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult non-normative defect report responses, to see this).
10
+
The interface shown below is also opinionated in several ways.
11
+
It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust.
12
+
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well, but you have to read the standard very carefully, and consult [non-normative defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this).
12
13
Another key property of the interface presented below is that it is *untyped*.
13
-
This encodes the fact that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules.
14
-
At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not*[the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), so we might have to revisit this choice later.
14
+
This implies that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules.
15
+
At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable.
15
16
16
17
## Pointers
17
18
18
19
One key question a memory model has to answer is *what is a pointer*.
19
20
It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated].
20
21
This becomes even more prominent with aliasing models such as [Stacked Borrows].
21
-
So we will leave this question open, and treat`Pointer` as an "associated type" of the memory interface
22
+
So the interface will leave it up to the concrete instance to answer this question, and carry `Pointer` as an associated type.
wip/value-domain.md
+19-12
@@ -23,8 +23,6 @@ enum Value<Pointer> {
23
23
Bool(bool),
24
24
/// A pointer.
25
25
Ptr(Pointer),
26
-
/// A zero-sized "unit".
27
-
Unit,
28
26
/// An uninitialized value.
29
27
Uninit,
30
28
/// An n-tuple.
@@ -34,6 +32,8 @@ enum Value<Pointer> {
34
32
idx: u64,
35
33
data: Box<Self>,
36
34
},
35
+
/// A "bag of raw bytes".
36
+
RawBag(Vec<Byte<Pointer>>),
37
37
/* ... */
38
38
}
39
39
```
@@ -46,28 +46,30 @@ We show some examples for how one might want to use this `Value` domain to defin
46
46
47
47
### `bool`
48
48
49
-
The value relation for `bool` relates `Bool(b)` with `[bb]` if and only if `bb.as_int() == Some(if b { 1 } else { 0 })`.
49
+
The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_int() == Some(if b { 1 } else { 0 })`.
50
+
(`as_int` is defined in [the memory interface][memory-interface].)
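As a minimal sketch of this relation, assuming a simplified `Byte` without the `Pointer` parameter or pointer fragments, and assuming `as_int` returns `Option<u8>` (the helper name `bool_related` is hypothetical and not part of the document):

```rust
/// Simplified stand-in for the abstract byte of the memory interface
/// (assumption: pointer fragments are left out).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    /// An uninitialized byte.
    Uninit,
    /// An initialized byte with a concrete value.
    Raw(u8),
}

impl Byte {
    /// Assumed shape of `as_int`: `None` for bytes that carry no integer.
    fn as_int(self) -> Option<u8> {
        match self {
            Byte::Raw(i) => Some(i),
            Byte::Uninit => None,
        }
    }
}

/// Hypothetical helper: is `Bool(b)` related to the byte list `bytes`?
fn bool_related(b: bool, bytes: &[Byte]) -> bool {
    match bytes {
        // Exactly one byte, and it must be the integer 0 or 1 matching `b`.
        [r] => r.as_int() == Some(if b { 1 } else { 0 }),
        // Lists of any other length are never related to a `bool`.
        _ => false,
    }
}
```

Under these assumptions, `[Raw(2)]` and `[Uninit]` are related to no `Bool(b)` at all, which is what makes them invalid at type `bool`.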
50
51
51
52
### `()`
52
53
53
-
The value relation for the `()` type relates `Unit`with the empty list `[]`, and that's it.
54
+
The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it.
54
55
55
56
### `!`
56
57
57
58
The value relation for the `!` type is empty: nothing is related to anything at this type.
58
59
59
60
### `#[repr(C)] struct Pair<T, U>(T, U)`
60
61
61
-
The value relation for `Pair`us based on the value relations for `T` and `U`.
62
-
A value `Tuple([t, u])`(assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
62
+
The value relation for `Pair` is based on the value relations for `T` and `U`.
63
+
A value `Tuple([t, u])` is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
63
64
64
65
* `t` is represented by `rt` at type `T`.
65
66
* `u` is represented by `ru` at type `U`.
66
67
* The length of `rt ++ pad1` is equal to the offset of the `U` field.
67
68
* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`.
68
69
69
-
This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs).
70
-
It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
70
+
This relation specifies that values of type `Pair` are always 2-tuples (aka, pairs).
71
+
It also says that the actual content of the padding bytes is entirely irrelevant: we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
72
+
So, for example, when considering `Pair<u8, u16>`, the value `Tuple([42, 119])` is represented on a little-endian target by `[Raw(42), byte, Raw(119), Raw(0)]` for *any* `byte: Byte`.
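The `Pair<u8, u16>` case can be sketched as a checkable predicate. This is only an illustration under stated assumptions: `Byte` and `Value` are simplified (no `Pointer` parameter, `Int` carries an `i128`), the layout is hard-coded as one padding byte with the `u16` field at offset 2 and a total size of 4, the target is little-endian, and integers are taken to require initialized bytes. The name `pair_u8_u16_related` is hypothetical.

```rust
/// Simplified abstract byte (assumption: no pointer fragments).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Raw(u8),
}

/// Simplified value domain (assumption: only the variants needed here).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i128),
    Tuple(Vec<Value>),
}

/// Hypothetical helper: is `val` related to `bytes` at type `Pair<u8, u16>`?
fn pair_u8_u16_related(val: &Value, bytes: &[Byte]) -> bool {
    match (val, bytes) {
        // rt ++ pad1 ++ ru, with pad2 empty: 1 + 1 + 2 = 4 bytes in total.
        (Value::Tuple(fields), [rt, _pad1, ru0, ru1]) => match fields.as_slice() {
            [Value::Int(t), Value::Int(u)] => {
                // `t` is represented by `rt` at type `u8`.
                let t_ok = matches!(rt, Byte::Raw(b) if i128::from(*b) == *t);
                // `u` is represented by `ru` at type `u16`, little-endian.
                let u_ok = match (ru0, ru1) {
                    (Byte::Raw(lo), Byte::Raw(hi)) => {
                        i128::from(u16::from_le_bytes([*lo, *hi])) == *u
                    }
                    _ => false,
                };
                // `_pad1` is never inspected: any byte is accepted there.
                t_ok && u_ok
            }
            _ => false,
        },
        // Wrong length or not a tuple: not related.
        _ => false,
    }
}
```

The padding byte is bound to `_pad1` and never examined, which mirrors the claim that its content is irrelevant.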
71
73
72
74
### `&T`/`&mut T`
73
75
@@ -80,9 +82,14 @@ A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment
80
82
For the value representation of integer types, there are two different reasonable choices.
81
83
Certainly, a value `Int(i)` where `i` is in `0..256` is related to `[b]` if `b.as_int() == Some(i)`.
82
84
83
-
And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`.
85
+
And then, maybe, we additionally want to say that the value `Uninit` is related to the byte list `[Uninit]`.
84
86
This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value).
85
87
88
+
### `union`
89
+
90
+
The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`.
91
+
(Note however that [this definition might not be implementable](https://github.com/rust-lang/unsafe-code-guidelines/issues/156).)
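A rough sketch of the `union` relation, again with a simplified `Byte` and `Value` (no `Pointer` parameter) and a hypothetical helper name `union_related`; the union's size `n` is passed in explicitly because no type representation is modeled here:

```rust
/// Simplified abstract byte (assumption: no pointer fragments).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Raw(u8),
}

/// Simplified value domain (assumption: only two variants are modeled).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Bool(bool),
    RawBag(Vec<Byte>),
}

/// Hypothetical helper: is `val` related to `bytes` at a `union` of size `n`?
fn union_related(n: usize, val: &Value, bytes: &[Byte]) -> bool {
    match val {
        // The bag must have the union's size and be exactly the byte list:
        // nothing is interpreted, not even uninitialized bytes.
        Value::RawBag(bag) => bag.len() == n && bag.as_slice() == bytes,
        // No other kind of value is related to anything at a union type.
        _ => false,
    }
}
```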
92
+
86
93
## The role of the value representation in the operational semantics
87
94
88
95
One key use of the value representation is to define a "typed" interface to memory:
@@ -97,7 +104,7 @@ trait TypedMemory: Memory {
97
104
}
98
105
```
99
106
100
-
here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
107
+
Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
101
108
We can implement `TypedMemory` for any `Memory` as follows:
102
109
* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB.
103
110
* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.
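The two bullets above can be sketched on a toy memory. Everything here is an assumption made for illustration: addresses are plain `usize` indices (the real interface keeps `Pointer` abstract), only `bool` and `!` are modeled, UB is surfaced as an `Err(Ub)` result, and `encode`/`decode` are hypothetical names for the two directions of the value relation.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Byte {
    Uninit,
    Raw(u8),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Bool(bool),
}

/// Toy stand-in for `Type` (assumption: just `bool` and `!`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Type {
    Bool,
    Never,
}

impl Type {
    fn size(self) -> usize {
        match self {
            Type::Bool => 1,
            Type::Never => 0,
        }
    }
}

/// Marker for "this execution has undefined behavior".
#[derive(Debug, PartialEq)]
struct Ub;

/// Pick some representation of `val` at `ty`, if one exists.
fn encode(ty: Type, val: Value) -> Option<Vec<Byte>> {
    match (ty, val) {
        (Type::Bool, Value::Bool(b)) => Some(vec![Byte::Raw(u8::from(b))]),
        // The relation for `!` is empty: no value has a representation.
        (Type::Never, _) => None,
    }
}

/// Determine which value `bytes` represents at `ty`, if any.
fn decode(ty: Type, bytes: &[Byte]) -> Option<Value> {
    match (ty, bytes) {
        (Type::Bool, [Byte::Raw(0)]) => Some(Value::Bool(false)),
        (Type::Bool, [Byte::Raw(1)]) => Some(Value::Bool(true)),
        _ => None,
    }
}

/// Toy untyped memory: a flat list of abstract bytes.
struct Memory {
    bytes: Vec<Byte>,
}

impl Memory {
    fn new(size: usize) -> Self {
        Memory { bytes: vec![Byte::Uninit; size] }
    }

    fn typed_write(&mut self, addr: usize, val: Value, ty: Type) -> Result<(), Ub> {
        // No representation of `val` at `ty` means UB.
        let repr = encode(ty, val).ok_or(Ub)?;
        // Out-of-bounds accesses are also treated as UB in this toy model.
        let dest = self.bytes.get_mut(addr..addr + repr.len()).ok_or(Ub)?;
        dest.copy_from_slice(&repr);
        Ok(())
    }

    fn typed_read(&self, addr: usize, ty: Type) -> Result<Value, Ub> {
        let bytes = self.bytes.get(addr..addr + ty.size()).ok_or(Ub)?;
        // Bytes that represent no value at `ty` mean UB.
        decode(ty, bytes).ok_or(Ub)
    }
}
```

In this sketch, reading a `bool` from freshly created (uninitialized) memory yields `Err(Ub)`, and any read or write at type `!` does too, since its relation is empty.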
@@ -109,13 +116,13 @@ This also means that for types that have padding, the "typed copy" does not pres
109
116
110
117
## Relation to validity invariant
111
118
112
-
One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant.
119
+
One way we *could* also use the value representation (and the author thinks this is exceedingly elegant) is to define the validity invariant.
113
120
Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`.
114
121
We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation".
115
122
For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferenceable" and "points to valid data" cannot be part of the validity invariant for references.
116
123
The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`.
117
124
This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything just falls out of how a "typed copy" applies the value representation twice.
118
125
119
-
Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute.
126
+
Justifying the `dereferenceable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like the `noalias` attribute.