Skip to content

Commit ca891ac

Browse files
committed
more editing
1 parent 648ec02 commit ca891ac

File tree

2 files changed

+30
-23
lines changed

2 files changed

+30
-23
lines changed

wip/memory-interface.md

+11-11
Original file line numberDiff line numberDiff line change
@@ -4,21 +4,22 @@
44

55
The purpose of this document is to describe the interface between a Rust program and memory.
66
This interface is a key part of the Rust Abstract Machine: it lets us separate concerns by splitting the Machine (i.e., its specification) into two pieces, connected by this well-defined interface:
7-
* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order.
7+
* The *expression/statement semantics* of Rust boils down to explaining which "memroy events" (calls to the memory interface) happen in which order. This part of the specification is *pure* in the sense that it has no "state": everything that needs to be remembered from one expression evaluation to the next is communicated through memory.
88
* The Rust *memory model* explains which interactions with the memory are legal (the others are UB), and which values can be returned by reads.
99

10-
The interface is also opinionated in several ways; this is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust.
11-
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult non-normative defect report responses, to see this).
10+
The interface shown below is also opinionated in several ways.
11+
It is not intended to be able to support *any imaginable* memory model, but rather start the process of reducing the design space of what we consider a "reasonable" memory model for Rust.
12+
For example, it explicitly acknowledges that pointers are not just integers and that uninitialized memory is special (both are true for C and C++ as well but you have to read the standard very careful, and consult [non-normative defect report responses](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm), to see this).
1213
Another key property of the interface presented below is that it is *untyped*.
13-
This encodes the fact that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules.
14-
At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), so we might have to revisit this choice later.
14+
This implies that in Rust, *operations are typed, but memory is not*---a key difference to C and C++ with their type-based strict aliasing rules.
15+
At the same time, the memory model provides a *side-effect free* way to turn pointers into "raw bytes", which is *not* [the direction C++ is moving towards](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2364.pdf), and we might have to revisit this choice later if it turns out to not be workable.
1516

1617
## Pointers
1718

1819
One key question a memory model has to answer is *what is a pointer*.
1920
It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated].
2021
This becomes even more prominent with aliasing models such as [Stacked Borrows].
21-
So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface
22+
So the interface will leave it up to the concrete instance to answer this question, and carry `Pointer` as an associated type.
2223

2324
## Bytes
2425

@@ -58,7 +59,7 @@ impl<Pointer: PtrToInt> Byte<Pointer> {
5859
Byte::Raw(int) => Some(int),
5960
Byte::Uninit => None,
6061
Byte::PtrFragment { ptr, idx } =>
61-
ptr.get_byte(idx),
62+
Some(ptr.get_byte(idx)),
6263
}
6364
}
6465
}
@@ -96,9 +97,10 @@ trait Memory {
9697
fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>, Error>;
9798

9899
/// Offset the given pointer.
99-
fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result<Self::Pointer, Error>;
100+
fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode)
101+
-> Result<Self::Pointer, Error>;
100102

101-
/// Cast the given integer to a pointer.
103+
/// Cast the given integer to a pointer. (The other direction is handled by `PtrToInt` below.)
102104
fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer, Error>;
103105
}
104106

@@ -129,8 +131,6 @@ This is a very basic memory interface that is incomplete in at least the followi
129131
* To represent [Stacked Borrows], there needs to be a "retag" operation, and that one will in fact be "lightly typed" (it cares about `UnsafeCell`).
130132
* Maybe we want operations that can compare pointers without casting them to integers.
131133

132-
But I think it can still be useful to provide some basic terminology and grounds for further discussion.
133-
134134
[pointers-complicated]: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html
135135
[uninit]: https://www.ralfj.de/blog/2019/07/14/uninit.html
136136
[`Ordering`]: https://doc.rust-lang.org/nightly/core/sync/atomic/enum.Ordering.html

wip/value-domain.md

+19-12
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,6 @@ enum Value<Pointer> {
2323
Bool(bool),
2424
/// A pointer.
2525
Ptr(Pointer),
26-
/// A zero-sized "unit".
27-
Unit,
2826
/// An uninitialized value.
2927
Uninit,
3028
/// An n-tuple.
@@ -34,6 +32,8 @@ enum Value<Pointer> {
3432
idx: u64,
3533
data: Box<Self>,
3634
},
35+
/// A "bag of raw bytes".
36+
RawBag(Vec<Byte<Pointer>>),
3737
/* ... */
3838
}
3939
```
@@ -46,28 +46,30 @@ We show some examples for how one might want to use this `Value` domain to defin
4646

4747
### `bool`
4848

49-
The value relation for `bool` relates `Bool(b)` with `[bb]` if and only if `bb.as_int() == Some(if b { 1 } else { 0 })`.
49+
The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_int() == Some(if b { 1 } else { 0 })`.
50+
(`as_int` is defined in [the memory interface][memory-interface].)
5051

5152
### `()`
5253

53-
The value relation for the `()` type relates `Unit` with the empty list `[]`, and that's it.
54+
The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it.
5455

5556
### `!`
5657

5758
The value relation for the `!` type is empty: nothing is related to anything at this type.
5859

5960
### `#[repr(C)] struct Pair<T, U>(T, U)`
6061

61-
The value relation for `Pair` us based on the value relations for `T` and `U`.
62-
A value `Tuple([t, u])` (assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
62+
The value relation for `Pair` is based on the value relations for `T` and `U`.
63+
A value `Tuple([t, u])` is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
6364

6465
* `t` is represented by `rt` at type `T`.
6566
* `u` is represented by `ru` at type `U`.
6667
* The length of `rt ++ pad1` is equal to the offset of the `U` field.
6768
* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`.
6869

69-
This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs).
70-
It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
70+
This relation specifies that values of type `Pair` are always 2-tuples (aka, pairs).
71+
It also says that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
72+
So, for example when considering `Pair<u8, u16>`, the value `Tuple[42, 119]` is represented on a little-endian target by `[Raw(42), byte, Raw(119), Raw(0)]` for *any* `byte: Byte`.
7173

7274
### `&T`/`&mut T`
7375

@@ -80,9 +82,14 @@ A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment
8082
For the value representation of integer types, there are two different reasonable choices.
8183
Certainly, a value `Int(i)` where `i` in `0..256` is related to `[b]` if `b.as_int() == Some(i)`.
8284

83-
And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`.
85+
And then, maybe, we also want to additionally say that value `Uninit` is related to byte list `[Uninit]`.
8486
This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value).
8587

88+
### `union`
89+
90+
The `union` type does not even try to interpret memory, so for a `union` of size `n`, the value relation says that for any byte list `bytes` of that length, `RawBag(bytes)` is related to `bytes`.
91+
(Note however that [this definition might not be implementable](https://github.com/rust-lang/unsafe-code-guidelines/issues/156).)
92+
8693
## The role of the value representation in the operational semantics
8794

8895
One key use of the value representation is to define a "typed" interface to memory:
@@ -97,7 +104,7 @@ trait TypedMemory: Memory {
97104
}
98105
```
99106

100-
here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
107+
Here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
101108
We can implement `TypedMemory` for any `Memory` as follows:
102109
* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB.
103110
* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.
@@ -109,13 +116,13 @@ This also means that for types that have padding, the "typed copy" does not pres
109116

110117
## Relation to validity invariant
111118

112-
One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant.
119+
One way we *could* also use the value representation (and the author thinks this is exceedingly elegant) is to define the validity invariant.
113120
Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`.
114121
We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation".
115122
For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references.
116123
The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`.
117124
This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice.
118125

119-
Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute.
126+
Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like the `noalias` attribute.
120127

121128
[Stacked Borrows]: stacked-borrows.md

0 commit comments

Comments
 (0)