Skip to content

Commit 499fa9e

Browse files
committed
begin with value domain
1 parent d0c9729 commit 499fa9e

File tree

2 files changed

+137
-12
lines changed

2 files changed

+137
-12
lines changed

wip/memory-interface.md

+17-12
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ This encodes the fact that in Rust, *operations are typed, but memory is not*---
1616

1717
One key question a memory model has to answer is *what is a pointer*.
1818
It might seem like the answer is just "an integer of appropriate size", but [that is not the case][pointers-complicated].
19+
This becomes even more prominent with aliasing models such as [Stacked Borrows].
1920
So we will leave this question open, and treat `Pointer` as an "associated type" of the memory interface
2021

2122
## Bytes
@@ -31,7 +32,7 @@ enum Byte<Pointer> {
3132
/// An uninitialized byte.
3233
Uninit,
3334
/// One byte of a pointer.
34-
Pointer {
35+
PtrFragment {
3536
/// The pointer of which this is a byte.
3637
ptr: Pointer,
3738
/// Which byte of the pointer this is.
@@ -41,6 +42,12 @@ enum Byte<Pointer> {
4142
}
4243
```
4344

45+
The purpose of `PtrFragment` is to be able to have a byte-wise representation of a `Pointer`.
46+
On a 32-bit system, the sequence of 4 bytes representing `ptr: Pointer` is:
47+
```
48+
[PtrFragment { ptr, idx: 0 }, PtrFragment { ptr, idx: 1 }, PtrFragment { ptr, idx: 2 }, PtrFragment { ptr, idx: 3 }]
49+
```
50+
4451
## Memory interface
4552

4653
The Rust memory interface is described by the following (not-yet-complete) trait definition:
@@ -51,34 +58,32 @@ The Rust memory interface is described by the following (not-yet-complete) trait
5158
/// We also let all operations potentially mutated memory. For example, reads
5259
/// actually do change the current state when considering concurrency or
5360
/// Stacked Borrows.
61+
/// And finally, all operations are fallible (they return `Result`); if they
62+
/// fail, that means the program caused UB.
5463
trait Memory {
5564
/// The type of pointer values.
5665
type Pointer;
5766

58-
/// The type of memory errors (i.e., ways in which the program can cause UB
59-
/// by interacting with memory).
60-
type Error;
61-
6267
/// Create a new allocation.
63-
fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer, Self::Error>;
68+
fn allocate(&mut self, size: u64, align: u64) -> Result<Self::Pointer, Error>;
6469

6570
/// Remove an allocation.
66-
fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Self::Error>;
71+
fn deallocate(&mut self, ptr: Self::Pointer, size: u64, align: u64) -> Result<(), Error>;
6772

6873
/// Write some bytes to memory.
69-
fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result<(), Self::Error>;
74+
fn write(&mut self, ptr: Self::Pointer, bytes: Vec<Byte<Self::Pointer>>) -> Result<(), Error>;
7075

7176
/// Read some bytes from memory.
72-
fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>, Self::Error>;
77+
fn read(&mut self, ptr: Self::Pointer, len: u64) -> Result<Vec<Byte<Self::Pointer>>, Error>;
7378

7479
/// Offset the given pointer.
75-
fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result<Self::Pointer, Self::Error>;
80+
fn offset(&mut self, ptr: Self::Pointer, offset: u64, mode: OffsetMode) -> Result<Self::Pointer, Error>;
7681

7782
/// Cast the given pointer to an integer.
78-
fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result<u64, Self::Error>;
83+
fn ptr_to_int(&mut self, ptr: Self::Pointer) -> Result<u64, Error>;
7984

8085
/// Cast the given integer to a pointer.
81-
fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer, Self::Error>;
86+
fn int_to_ptr(&mut self, int: u64) -> Result<Self::Pointer, Error>;
8287
}
8388

8489
/// The rules applying to this pointer offset operation.

wip/value-domain.md

+120
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
# Rust Value Domain
2+
3+
**Note:** This document is not normative nor endorsed by the UCG WG. Its purpose is to be the basis for discussion and to set down some key terminology.
4+
5+
The purpose of this document is to describe what the set of *all possible values* is in Rust.
6+
This is an important definition: one key element of a Rust specification will be to define the [representation relation][representation] of every type.
7+
This relation relates values with lists of bytes, so before we can even start specifying the relation we have to specify the involved domains.
8+
`Byte` is defined as part of [the memory interface][memory-interface]; this document is about defining `Value`.
9+
10+
[representation]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#representation
11+
[memory-interface]: memory-interface.md
12+
13+
## Values
14+
15+
The Rust value domain is described by the following (incomplete) type definition (in terms of an arbitrary `Pointer` type instantiated by [the memory interface][memory-interface]):
16+
17+
```rust
18+
enum Value<Pointer> {
19+
/// A mathematical integer.
20+
Int(BigInt),
21+
/// A Boolean value.
22+
Bool(bool),
23+
/// A pointer.
24+
Ptr(Pointer),
25+
/// A zero-sized "unit".
26+
Unit,
27+
/// An uninitialized value.
28+
Uninit,
29+
/// An n-tuple.
30+
Tuple(Vec<Self>),
31+
/// A variant of a sum type.
32+
Variant {
33+
idx: u64,
34+
data: Box<Self>,
35+
},
36+
/* ... */
37+
}
38+
```
39+
40+
As Rust grows, we might expand this definition. That is okay; all previously defined representation relations are still well-defined when the domain grows, the newly added values will just not be valid for old types as one would expect.
41+
42+
## Example value relations
43+
44+
We show some examples for how one might want to use this `Value` domain to define the value relation for a type.
45+
46+
### `bool`
47+
48+
The value relation for `bool` relates `Bool(true)` with `[Raw(0)]` and `Bool(false)` with [`Raw(1)`], and that's it.
49+
50+
### `()`
51+
52+
The value relation for the `()` type relates `Unit` with the empty list `[]`, and that's it.
53+
54+
### `!`
55+
56+
The value relation for the `!` type is empty: nothing is related to anything at this type.
57+
58+
### `#[repr(C)] struct Pair<T, U>(T, U)`
59+
60+
The value relation for `Pair` us based on the value relations for `T` and `U`.
61+
A value `Tuple([t, u])` (assuming we can use array notation to "match" on `Vec`) is represented by a list of bytes `rt ++ pad1 ++ ru ++ pad2` (using `++` for list concatenation) if:
62+
63+
* `t` is represented by `rt` at type `T`.
64+
* `u` is represented by `ru` at type `U`.
65+
* The length of `rt ++ pad1` is equal to the offset of the `U` field.
66+
* The length of the entire list `rt ++ pad1 ++ ru ++ pad2` is equal to the size of `Pair<T, U>`.
67+
68+
This relation demonstrates that value of type `Pair` are always 2-tuples (aka, pairs).
69+
It also shows that the actual content of the padding bytes is entirely irrelevant, we only care to have the right number of them to "pad" `ru` to the right place and to "pad" the entire list to have the right length.
70+
71+
### `&T`/`&mut T`
72+
73+
Reference types are tricky.
74+
But a possible value relation for sized `T` is:
75+
A value `Ptr(ptr)` is related to `[PtrFragment { ptr, idx: 0 }, ..., PtrFragment { ptr, idx: N-1 }]` where `N == size_of::<usize>()` if `ptr` is non-NULL and aligned to `align_of::<T>()`.
76+
77+
### `u8`
78+
79+
For the value representation of integer types, there are two different reasonable choices.
80+
Certainly, a value `Int(i)` where `i` in `0..256` is related to `[Raw(i as u8)]`.
81+
82+
And then, maybe, we also want to additionally say that value `Uninit` is related to `[Uninit]`.
83+
This essentially corresponds to saying that uninitialized memory is a valid representation of a `u8` value (namely, the uninitialized value).
84+
85+
## The role of the value representation in the operational semantics
86+
87+
One key use of the value representation is to define a "typed" interface to memory:
88+
89+
```rust
90+
trait TypedMemory: Memory {
91+
/// Write a value of the given type to memory.
92+
fn typed_write(&mut self, ptr: Self::Pointer, val: Value, ty: Type) -> Result<(), Error>;
93+
94+
/// Read a value of the given type.
95+
fn typed_read(&mut self, ptr: Self::Pointer, ty: Type) -> Result<Value, Error>;
96+
}
97+
```
98+
99+
here, `Type` is some representation of the Rust type system (akin to [`Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/type.Ty.html) in the compiler).
100+
We can implement `TypedMemory` for any `Memory` as follows:
101+
* For `typed_write`, pick any representation of `val` for `ty`, and call `Memory::write`. If no representation exists, we have UB.
102+
* For `typed_read`, read `ty.size()` many bytes from memory, and then determine which value this list of bytes represents. If it does not represent any value, we have UB.
103+
104+
In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy".
105+
Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`.
106+
In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior!
107+
This also means that for types that have padding, the "typed copy" does not preserve the padding bytes.
108+
109+
## Relation to validity invariant
110+
111+
One way we *could* also use the value representation (and the author things this is exceedingly elegant) is to define the validity invariant.
112+
Certainly, it is the case that if a list of bytes is not related to any value for a given type `T`, then that list of bytes is *invalid* for `T` and it should be UB to produce such a list of bytes at type `T`.
113+
We could decide that this is an "if and only if", i.e., that the validity invariant for a type is exactly "must be in the value representation".
114+
For many types this is likely what we will do anyway (e.g., for `bool` and `!` and `()` and integers), but for references, this choice would mean that *validity of the reference cannot depend on what memory looks like*---so "dereferencable" and "points to valid data" cannot be part of the validity invariant for references.
115+
The reason this is so elegant is that, as we have seen above, a "typed copy" already very naturally is UB when the memory that is copied is not a valid representation of `T`.
116+
This means we do not even need a special clause in our specification for the validity invariant---in fact, the term does not even have to appear in the specification---as everything juts falls out of how a "typed copy" applies the value representation twice.
117+
118+
Justifying the `dereferencable` LLVM attribute is, in this case, left to the aliasing model (e.g. [Stacked Borrows]), just like that is needed to justify the `noalias` attribute.
119+
120+
[Stacked Borrows]: stacked-borrows.md

0 commit comments

Comments
 (0)