Commit 3c99a2b

glossary: talk about bytes
Also define and use markdown links for more of our sections
1 parent 564612e commit 3c99a2b

1 file changed (+36, -7 lines)

Diff for: reference/src/glossary.md

@@ -1,5 +1,31 @@
 ## Glossary
 
+### Abstract Byte
+[abstract byte]: #abstract-byte
+
+The "byte" is the smallest unit of storage in Rust.
+Memory allocations are thought of as storing a list of bytes, and at the lowest level each load returns a list of bytes and each store takes a list of bytes and puts it into memory.
+(The [representation relation] then defines how to convert between those lists of bytes and higher-level values such as mathematical integers or pointers.)
+
+However, a "byte" in the Rust Abstract Machine is more complicated than just a `u8`: think of it as there being some extra "shadow state" that is relevant for the Abstract Machine execution (in particular, for whether this execution has UB), but that disappears when compiling the program to assembly.
+That's why we call it "abstract byte", to distinguish it from the physical machine byte that is represented by a `u8`.
+The most obvious "shadow state" is tracking whether memory is initialized.
+See [this blog post](https://www.ralfj.de/blog/2019/07/14/uninit.html) for details, but the gist of it is that bytes in memory are more like `Option<u8>` where `None` indicates that this byte is uninitialized.
+Operations like `copy` work on that representation, so if you copy from some uninitialized memory into initialized memory, the target memory becomes "de-initialized".
+Another piece of shadow state is [pointer provenance][provenance]: the Abstract Machine tracks the "origin" of each pointer value to enforce the rule that a pointer used to access some memory is "based on" the original pointer produced when that memory got allocated.
+This provenance must be preserved when the pointer is stored to memory and loaded again later, which implies that abstract bytes must be able to carry provenance.
+
+Without committing to the exact shape of provenance in Rust, we can therefore say that an abstract byte in the Rust Abstract Machine looks as follows:
+
+```rust
+pub enum AbstractByte<Provenance> {
+    /// An uninitialized byte.
+    Uninit,
+    /// An initialized byte, optionally with some provenance (if it is encoding a pointer).
+    Init(u8, Option<Provenance>),
+}
+```
+
 ### Aliasing
 
 *Aliasing* occurs when one pointer or reference points to a "span" of memory
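The following minimal sketch, which is not part of the glossary, models the `AbstractByte` type from the hunk above in plain Rust to show its two kinds of shadow state. `AllocId`, `copy_bytes`, `encode_ptr`, and `decode_ptr` are hypothetical names, and encoding a pointer as 8 little-endian bytes that all carry the same provenance is an assumption made only for illustration. Copying `Uninit` bytes over initialized ones de-initializes the target, and a pointer's provenance survives being stored as bytes and loaded back.

```rust
/// Hypothetical provenance: the ID of the allocation a pointer was derived from.
/// The glossary deliberately leaves the real shape of provenance open.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub u64);

/// The abstract byte as defined in the glossary, plus derives for testing.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbstractByte<Provenance> {
    /// An uninitialized byte.
    Uninit,
    /// An initialized byte, optionally with some provenance (if it is encoding a pointer).
    Init(u8, Option<Provenance>),
}

type Byte = AbstractByte<AllocId>;

/// Copies abstract bytes verbatim, shadow state included: an `Uninit` source
/// byte makes the corresponding target byte `Uninit` ("de-initializes" it).
/// Panics if the slices differ in length.
pub fn copy_bytes(src: &[Byte], dst: &mut [Byte]) {
    dst.copy_from_slice(src);
}

/// Hypothetical pointer encoding: 8 little-endian address bytes, each carrying
/// the pointer's provenance so that it survives a store followed by a load.
pub fn encode_ptr(addr: u64, prov: AllocId) -> Vec<Byte> {
    addr.to_le_bytes()
        .into_iter()
        .map(|b| AbstractByte::Init(b, Some(prov)))
        .collect()
}

/// Decodes 8 bytes back into an address and provenance. Returns `None` unless
/// all bytes are initialized and carry the same provenance (a simplification
/// chosen for this sketch, not a statement about Rust's actual rules).
pub fn decode_ptr(bytes: &[Byte]) -> Option<(u64, AllocId)> {
    if bytes.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    let mut prov: Option<AllocId> = None;
    for (i, byte) in bytes.iter().enumerate() {
        match *byte {
            AbstractByte::Init(b, Some(p)) if prov.map_or(true, |q| q == p) => {
                raw[i] = b;
                prov = Some(p);
            }
            _ => return None,
        }
    }
    Some((u64::from_le_bytes(raw), prov?))
}

fn main() {
    // Copying uninitialized memory over initialized memory de-initializes the target.
    let src: [Byte; 2] = [AbstractByte::Uninit, AbstractByte::Uninit];
    let mut dst: [Byte; 2] = [AbstractByte::Init(1, None), AbstractByte::Init(2, None)];
    copy_bytes(&src, &mut dst);
    assert_eq!(dst, [AbstractByte::Uninit, AbstractByte::Uninit]);

    // Storing a pointer as bytes and loading it back preserves its provenance.
    let stored = encode_ptr(0x1000, AllocId(7));
    assert_eq!(decode_ptr(&stored), Some((0x1000, AllocId(7))));
}
```

The point of the sketch is that `copy_bytes` needs no special case for uninitialized memory: because `Uninit` is just another abstract byte, de-initialization falls out of an ordinary byte-wise copy.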
@@ -114,18 +140,20 @@ This definition works fine for product types (structs, tuples, arrays, ...).
 The desired notion of "padding byte" for enums and unions is still unclear.
 
 ### Place
+[place]: #place
 
 A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr].
-A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance](#pointer-provenance)), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely).
-A place has a type, indicating the type of [values](#value) that it stores.
+A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance]), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely).
+A place has a type, indicating the type of [values][value] that it stores.
 
 The key operations on a place are:
-* Storing a [value](#value) of the same type in it (when it is used on the left-hand side of an assignment).
-* Loading a [value](#value) of the same type from it (through the place-to-value coercion).
+* Storing a [value] of the same type in it (when it is used on the left-hand side of an assignment).
+* Loading a [value] of the same type from it (through the place-to-value coercion).
 * Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators.
 This is also the only way a place can be "stored": by converting it to a value first.
 
 ### Pointer Provenance
+[provenance]: #pointer-provenance
 
 The *provenance* of a pointer is used to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal).
 Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware.
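As an illustration that is not part of the glossary, the key operations on a place listed in the hunk above can all be seen in ordinary Rust code. The function name `place_operations` is hypothetical; the comments map each line to the operation it performs, following the glossary's descriptions of storing, loading, and converting between places and pointers with `&` and `*`.

```rust
/// Walks through the key operations on a place using a local variable `x`.
/// Returns the value loaded after the first store and the final value of `x`.
fn place_operations() -> (u8, u8) {
    // `x` names a place of type `u8`; it starts out uninitialized.
    let mut x: u8;

    // Store: `x` is on the left-hand side of an assignment, so the value 2 is stored into the place.
    x = 2;

    // Load: using `x` where a value is expected applies the place-to-value coercion.
    let loaded = x;

    // Place to pointer: `&mut x` turns the place into a pointer value,
    // which is then coerced from `&mut u8` to `*mut u8`.
    let p: *mut u8 = &mut x;

    // Pointer to place: `*p` denotes the place again, and storing into it writes to `x`.
    // SAFETY: `p` points to the live, initialized local `x`, which is not otherwise borrowed.
    unsafe {
        *p = 3;
    }

    (loaded, x)
}

fn main() {
    let (loaded, after) = place_operations();
    println!("loaded = {loaded}, after = {after}");
}
```

Note that `p` itself is a value (a pointer), not a place; only the expression `*p` turns it back into a place that can be stored to.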
@@ -168,7 +196,7 @@ For some more information, see [this blog post](https://www.ralfj.de/blog/2018/0
 ### Representation (relation)
 [representation relation]: #representation-relation
 
-A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory.
+A *representation* of a [value] is a list of [(abstract) bytes][abstract byte] that is used to store or "represent" that value in memory.
 
 We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value.
 The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`).
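The following minimal sketch, which is not part of the glossary, writes the representation relation out as a Rust predicate (`val -> list byte -> bool` instead of `Prop`) for one hypothetical type, `Pair`. It assumes a `repr(C)` layout with one padding byte between the fields, a little-endian byte order, and that provenance on integer bytes is simply ignored; all three are simplifications for illustration. Because the padding byte may hold any abstract byte, two different byte lists represent the same value, which is why this is a relation rather than a function.

```rust
/// The abstract byte from the glossary; provenance is left generic.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AbstractByte<Provenance> {
    Uninit,
    Init(u8, Option<Provenance>),
}

/// Hypothetical example type. With `repr(C)`, `a` sits at offset 0, one padding
/// byte follows, and `b` occupies offsets 2..4 (assuming `u16` has alignment 2).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Pair {
    pub a: u8,
    pub b: u16,
}

/// Reads an initialized byte, ignoring any provenance (a simplifying
/// assumption of this sketch).
fn init<P>(byte: &AbstractByte<P>) -> Option<u8> {
    match byte {
        AbstractByte::Init(b, _) => Some(*b),
        AbstractByte::Uninit => None,
    }
}

/// The representation relation for `Pair` as a predicate, assuming little-endian.
pub fn represents<P>(val: &Pair, bytes: &[AbstractByte<P>]) -> bool {
    if bytes.len() != 4 {
        return false;
    }
    // Byte 1 is padding: any abstract byte, even `Uninit`, is allowed there.
    match (init(&bytes[0]), init(&bytes[2]), init(&bytes[3])) {
        (Some(a), Some(lo), Some(hi)) => a == val.a && u16::from_le_bytes([lo, hi]) == val.b,
        _ => false,
    }
}

fn main() {
    use AbstractByte::{Init, Uninit};
    assert_eq!(std::mem::size_of::<Pair>(), 4);

    let val = Pair { a: 7, b: 0x0102 };
    let uninit_padding: [AbstractByte<()>; 4] = [Init(7, None), Uninit, Init(0x02, None), Init(0x01, None)];
    let zero_padding: [AbstractByte<()>; 4] = [Init(7, None), Init(0, None), Init(0x02, None), Init(0x01, None)];
    // Both byte lists represent the same value: a relation, not a function.
    assert!(represents(&val, &uninit_padding));
    assert!(represents(&val, &zero_padding));

    // An uninitialized non-padding byte means no `Pair` value is represented.
    let broken: [AbstractByte<()>; 4] = [Uninit, Uninit, Init(0x02, None), Init(0x01, None)];
    assert!(!represents(&val, &broken));
}
```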
@@ -241,12 +269,13 @@ To summarize: *Data must always be valid, but it only must be safe in safe code.
 For some more information, see [this blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html).
 
 ### Value
+[value]: #value
 
-A *value* (called "value of the expression" or "rvalue" in C and "prvalue" in C++) is what gets stored in a [place](#place), and also the result of computing a [*value expression*][place-value-expr].
+A *value* (called "value of the expression" or "rvalue" in C and "prvalue" in C++) is what gets stored in a [place], and also the result of computing a [*value expression*][place-value-expr].
 A value has a type, and it denotes the abstract mathematical concept that is represented by data in our programs.
 
 For example, a value of type `u8` is a mathematical integer in the range `0..256`.
-Values can be (according to their type) turned into a list of bytes, which is called a [representation](#representation) of the value.
+Values can be (according to their type) turned into a list of [(abstract) bytes][abstract byte], which is called a [representation][representation relation] of the value.
 Values are ephemeral; they arise during the computation of an instruction but are only ever persisted in memory through their representation.
 (This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.)
 
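A minimal sketch, not from the glossary, of the point that values are only persisted through their representation, much like serialization. A hypothetical `Memory` type with `store_u16`/`load_u16` helpers writes a `u16` value into memory as its little-endian bytes (via the standard `u16::to_le_bytes` and `u16::from_le_bytes`) and recreates the value when reading it back. For simplicity it stores plain `u8` bytes, so it models neither uninitialized memory nor provenance, and the fixed little-endian order is an assumption (real targets may be big-endian).

```rust
/// A toy "memory": a flat list of physical bytes. This is a simplification;
/// the glossary's abstract bytes also track initialization and provenance.
struct Memory {
    bytes: Vec<u8>,
}

impl Memory {
    /// Persist a `u16` value by writing its little-endian representation.
    /// Panics if `offset + 2` is out of bounds.
    fn store_u16(&mut self, offset: usize, val: u16) {
        self.bytes[offset..offset + 2].copy_from_slice(&val.to_le_bytes());
    }

    /// Recreate the ephemeral value from the stored representation.
    fn load_u16(&self, offset: usize) -> u16 {
        u16::from_le_bytes([self.bytes[offset], self.bytes[offset + 1]])
    }
}

fn main() {
    let mut mem = Memory { bytes: vec![0; 4] };
    mem.store_u16(1, 258); // 258 = 0x0102
    // Only the representation lives in memory: [0x02, 0x01] at offsets 1 and 2.
    assert_eq!(mem.bytes, vec![0x00, 0x02, 0x01, 0x00]);
    assert_eq!(mem.load_u16(1), 258);
}
```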