diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 07a3d602..bf6d257f 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -1,6 +1,6 @@ ## Glossary -#### Aliasing +### Aliasing *Aliasing* occurs when one pointer or reference points to a "span" of memory that overlaps with the span of another pointer or reference. A span of memory is @@ -55,7 +55,77 @@ somewhat differently from this definition. However, that's considered a low level detail of a particular Rust implementation. When programming Rust, the Abstract Rust Machine is intended to operate according to the definition here. -#### (Pointer) Provenance +### Interior mutability + +*Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference. +"live" here means a value that will be "used again" later. +"derived from" means that the pointer was obtained by casting a shared reference and potentially adding an offset. +This is not yet precisely defined, which will be fixed as part of developing a precise aliasing model. + +Finding live shared references propagates recursively through references, but not through raw pointers. +So, for example, if data immediately pointed to by a `&T` or `& &mut T` is mutated, that's interior mutability. +If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's *not* interior mutability. + +*Interior mutability* refers to the ability to perform interior mutation without causing UB. +All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose. + +### Layout +[layout]: #layout + +The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... 
or elements of arrays). +Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary. + +Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this. +In this document, *layout* and *representation* are not synonyms. + +### Niche + +The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations. + +For example, `&mut T` has at least one niche, the "all zeros" bit-pattern. This +niche is used by layout optimizations like ["`enum` discriminant +elision"](layout/enums.html#discriminant-elision-on-option-like-enums) to +guarantee that `Option<&mut T>` has the same size as `&mut T`. + +While all niches are invalid bit-patterns, not all invalid bit-patterns are +niches. For example, the "all bits uninitialized" is an invalid bit-pattern for +`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a +niche. + +### Padding +[padding]: #padding + +*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. + +Padding can be thought of as the type containing secret fields of type `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: +* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit`. +* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. 
+ +Note that padding is a property of the *type* and not the memory: reading from the padding of an `&Foo` (by casting to a byte reference) may produce initialized values if the `&Foo` is pointing to memory that was initialized (for example, if it was originally a byte buffer initialized to `0`), but the moment you perform a typed copy out of that reference you will have uninitialized padding bytes in the copy. + + +We can also define padding in terms of the [representation relation]: +A byte at index `i` is a padding byte for type `T` if, +for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`), +changing `b` at index `i` to any other byte yields a `b'` such that `v` and `b'` are related (`Vrel_T(v, b')`). +In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. + +This definition works fine for product types (structs, tuples, arrays, ...). +The desired notion of "padding byte" for enums and unions is still unclear. + +### Place + +A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. +A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance](#pointer-provenance)), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely). +A place has a type, indicating the type of [values](#value) that it stores. + +The key operations on a place are: +* Storing a [value](#value) of the same type in it (when it is used on the left-hand side of an assignment). +* Loading a [value](#value) of the same type from it (through the place-to-value coercion). 
+* Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators. + This is also the only way a place can be "stored": by converting it to a value first. + +### Pointer Provenance The *provenance* of a pointer is used to distinguish pointers that point to the same memory address (i.e., pointers that, when cast to `usize`, will compare equal). Provenance is extra state that only exists in the Rust Abstract Machine; it is needed to specify program behavior but not present any more when the program runs on real hardware. @@ -95,21 +165,45 @@ For some more information, see [this document proposing a more precise definitio Another example of pointer provenance is the "tag" from [Stacked Borrows][stacked-borrows]. For some more information, see [this blog post](https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html). -#### Interior mutability +### Representation (relation) +[representation relation]: #representation-relation -*Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference. -"live" here means a value that will be "used again" later. -"derived from" means that the pointer was obtained by casting a shared reference and potentially adding an offset. -This is not yet precisely defined, which will be fixed as part of developing a precise aliasing model. +A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory. -Finding live shared references propagates recursively through references, but not through raw pointers. -So, for example, if data immediately pointed to by a `&T` or `& &mut T` is mutated, that's interior mutability. -If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's *not* interior mutability. 
+We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value. +The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`). -*Interior mutability* refers to the ability to perform interior mutation without causing UB. -All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose. +The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated value). +It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`). +For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes). + +See the [value domain][value-domain] for an example of how values and representation relations can be made more precise. + +### Soundness (of code / of a library) +[soundness]: #soundness-of-code--of-a-library + +*Soundness* is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties. 
+For Rust, this means well-typed programs cannot cause [Undefined Behavior][ub]. +This promise only extends to safe code, however; for `unsafe` code, it is up to the programmer to uphold this contract. + +Accordingly, we say that a library (or an individual function) is *sound* if it is impossible for safe code to cause Undefined Behavior using its public API. +Conversely, the library/function is *unsound* if safe code *can* cause Undefined Behavior. + +### Undefined Behavior +[ub]: #undefined-behavior + +*Undefined Behavior* is a concept of the contract between the Rust programmer and the compiler: +The programmer promises that the code exhibits no undefined behavior. +In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine. +If it turns out the program *does* have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code). + +In Rust, the [Nomicon](https://doc.rust-lang.org/nomicon/what-unsafe-does.html) and the [Reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) both have a list of behavior that the language considers undefined. +Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code take the burden of this contract on themselves. +For unsafe code, however, the burden is still on the programmer. -#### Validity and safety invariant +Also see: [Soundness][soundness]. + +### Validity and safety invariant The *validity invariant* is an invariant that all data must uphold any time it is accessed or copied in a typed manner. This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks. 
@@ -146,96 +240,7 @@ Moreover, such unsafe code must not return a non-UTF-8 string to the "outside" o To summarize: *Data must always be valid, but it only must be safe in safe code.* For some more information, see [this blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html). -#### Undefined Behavior -[ub]: #undefined-behavior - -*Undefined Behavior* is a concept of the contract between the Rust programmer and the compiler: -The programmer promises that the code exhibits no undefined behavior. -In return, the compiler promises to compile the code in a way that the final program does on the real hardware what the source program does according to the Rust Abstract Machine. -If it turns out the program *does* have undefined behavior, the contract is void, and the program produced by the compiler is essentially garbage (in particular, it is not bound by any specification; the program does not even have to be well-formed executable code). - -In Rust, the [Nomicon](https://doc.rust-lang.org/nomicon/what-unsafe-does.html) and the [Reference](https://doc.rust-lang.org/reference/behavior-considered-undefined.html) both have a list of behavior that the language considers undefined. -Rust promises that safe code cannot cause Undefined Behavior---the compiler and authors of unsafe code takes the burden of this contract on themselves. -For unsafe code, however, the burden is still on the programmer. - -Also see: [Soundness][soundness]. - -#### Soundness (of code / of a library) -[soundness]: #soundness-of-code--of-a-library - -*Soundness* is a type system concept (actually originating from the study of logics) and means that the type system is "correct" in the sense that well-typed programs actually have the desired properties. -For Rust, this means well-typed programs cannot cause [Undefined Behavior][ub]. -This promise only extends to safe code however; for `unsafe` code, it is up to the programmer to uphold this contract. 
- -Accordingly, we say that a library (or an individual function) is *sound* if it is impossible for safe code to cause Undefined Behavior using its public API. -Conversely, the library/function is *unsound* if safe code *can* cause Undefined Behavior. - -#### Layout -[layout]: #layout - -The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... or elements of arrays). -Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary. - -Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this. -In this document, *layout* and *representation* are not synonyms. - -#### Niche - -The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations. - -For example, `&mut T` has at least one niche, the "all zeros" bit-pattern. This -niche is used by layout optimizations like ["`enum` discriminant -elision"](layout/enums.html#discriminant-elision-on-option-like-enums) to -guarantee that `Option<&mut T>` has the same size as `&mut T`. - -While all niches are invalid bit-patterns, not all invalid bit-patterns are -niches. For example, the "all bits uninitialized" is an invalid bit-pattern for -`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a -niche. - -#### Zero-sized type / ZST - -Types with zero size are called zero-sized types, which is abbreviated as "ZST". -This document also uses the "1-ZST" abbreviation, which stands for "one-aligned -zero-sized type", to refer to zero-sized types with an alignment requirement of 1. - -For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment -requirement of 2. 
- -#### Padding -[padding]: #padding - -*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. - -Padding can be thought of as the type containing secret fields of type `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: -* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit`. -* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. - -Note that padding is a property of the *type* and not the memory: reading from the padding of an `&Foo` (by casting to a byte reference) may produce initialized values if the `&Foo` is pointing to memory that was initialized (for example, if it was originally a byte buffer initialized to `0`), but the moment you perform a typed copy out of that reference you will have uninitialized padding bytes in the copy. - - -We can also define padding in terms of the [representation relation]: -A byte at index `i` is a padding byte for type `T` if, -for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`), -changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`). -In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. - -This definition works fine for product types (structs, tuples, arrays, ...). -The desired notion of "padding byte" for enums and unions is still unclear. - -#### Place - -A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. 
-A place is basically a pointer (pointing to some location in memory, potentially carrying [provenance](#pointer-provenance)), but might contain more information such as size or alignment (the details will have to be determined as the Rust Abstract Machine gets specified more precisely). -A place has a type, indicating the type of [values](#value) that it stores. - -The key operations on a place are: -* Storing a [value](#value) of the same type in it (when it is used on the left-hand side of an assignment). -* Loading a [value](#value) of the same type from it (through the place-to-value coercion). -* Converting between a place (of type `T`) and a pointer value (of type `&T`, `&mut T`, `*const T` or `*mut T`) using the `&` and `*` operators. - This is also the only way a place can be "stored": by converting it to a value first. - -#### Value +### Value A *value* (called "value of the expression" or "rvalue" in C and "prvalue" in C++) is what gets stored in a [place](#place), and also the result of computing a [*value expression*][place-value-expr]. A value has a type, and it denotes the abstract mathematical concept that is represented by data in our programs. @@ -245,19 +250,14 @@ Values can be (according to their type) turned into a list of bytes, which is ca Values are ephemeral; they arise during the computation of an instruction but are only ever persisted in memory through their representation. (This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.) -#### Representation (relation) -[representation relation]: #representation-relation +### Zero-sized type / ZST -A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory. - -We also sometimes speak of the *representation of a type*; this should more correctly be called the *representation relation* as it relates values of this type to lists of bytes that represent this value. 
-The term "relation" here is used in the mathematical sense: the representation relation is a predicate that, given a value and a list of bytes, says whether this value is represented by that list of bytes (`val -> list byte -> Prop`). - -The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated representation). -It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`). -For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes). +Types with zero size are called zero-sized types, which is abbreviated as "ZST". +This document also uses the "1-ZST" abbreviation, which stands for "one-aligned +zero-sized type", to refer to zero-sized types with an alignment requirement of 1. -See the [value domain][value-domain] for an example how values and representation relations can be made more precise. +For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment +requirement of 2. [stacked-borrows]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md [value-domain]: https://github.com/rust-lang/unsafe-code-guidelines/tree/master/wip/value-domain.md
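Review note: the reordered *Niche* and *Zero-sized type / ZST* entries make claims that can be sanity-checked with the standard library alone. The following is only a sketch; the helper names `size_and_align` and `is_one_zst` are hypothetical and not part of the glossary or of any Rust API, and the alignment of `u16` is assumed to be 2, as the glossary states and as holds on mainstream targets.

```rust
use std::mem::{align_of, size_of};

/// Returns (size, alignment) of `T` in bytes. Hypothetical helper.
fn size_and_align<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// A "1-ZST" in the glossary's sense: size 0 and alignment 1.
/// Hypothetical helper.
fn is_one_zst<T>() -> bool {
    size_and_align::<T>() == (0, 1)
}

fn main() {
    // `()` has size 0 and alignment 1, so it is a 1-ZST.
    assert!(is_one_zst::<()>());

    // `[u16; 0]` has size 0 but inherits the alignment of `u16`,
    // so it is a ZST without being a 1-ZST.
    assert_eq!(size_of::<[u16; 0]>(), 0);
    assert_eq!(align_of::<[u16; 0]>(), align_of::<u16>());
    assert!(!is_one_zst::<[u16; 0]>());

    // The "all zeros" niche of `&mut T` lets `Option<&mut T>` use the
    // same size as `&mut T` (discriminant elision).
    assert_eq!(size_of::<Option<&mut u8>>(), size_of::<&mut u8>());
}
```

These checks only observe sizes and alignments; they do not demonstrate which bit-pattern the compiler picks for `None`.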