Skip to content

Initial Glossary entry for Aliasing #170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jul 25, 2019
Merged
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
81 changes: 60 additions & 21 deletions reference/src/glossary.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,43 @@
## Glossary

#### Aliasing

(Please note: a full aliasing model for Rust has not yet been constructed, but
at the moment we can give the following definition.)

*Aliasing* is any time one pointer or reference points to a "span" of memory
that overlaps with the span of another pointer or reference. A span of memory is
similar to how a slice works: there's a base byte address as well as a length in
bytes.

Consider the following example:

```rust
fn main() {
let u: u64 = 7_u64;
let r: &u64 = &u;
let s: &[u8] = unsafe {
core::slice::from_raw_parts(&u as *const u64 as *const u8, 8)
};
let (head, tail) = s.split_first().unwrap();
}
```

In this case, both `r` and `s` alias each other, since they both point to all of
the bytes of `u`.

However, `head` and `tail` do not alias each other: `head` points to the first
byte of `u` and `tail` points to the other seven bytes of `u` after it.

* The span length of `&T`, `&mut T`, `*const T`, or `*mut T` when `T` is
[`Sized`](https://doc.rust-lang.org/core/marker/trait.Sized.html) is
`size_of<T>()`.
* When `T` is not `Sized` the span length is `size_of_val(t)`.

One interesting side effect of these rules is that references and pointers to
Zero Sized Types _never_ alias each other, because their span length is always 0
bytes.

#### Interior mutability

*Interior Mutation* means mutating memory where there also exists a live shared reference pointing to the same memory; or mutating memory through a pointer derived from a shared reference.
Expand All @@ -14,22 +52,15 @@ If data immediately pointed to by a `*const T` or `&*const T` is mutated, that's
*Interior mutability* refers to the ability to perform interior mutation without causing UB.
All interior mutation in Rust has to happen inside an [`UnsafeCell`](https://doc.rust-lang.org/core/cell/struct.UnsafeCell.html), so all data structures that have interior mutability must (directly or indirectly) use `UnsafeCell` for this purpose.

#### Validity and safety invariant
#### Layout

The *validity invariant* is an invariant that all data must uphold any time it is accessed or copied in a typed manner.
This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks.
The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... or elements of arrays).
Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary.

In terms of MIR statements, "accessed or copied" means whenever an assignment statement is executed.
That statement has a type (LHS and RHS must have the same type), and the data being assigned must be valid at that type.
Moreover, arguments passed to a function must be valid at the type given in the callee signature, and the return value of a function must be valid at the type given in the caller signature.
OPEN QUESTION: Are there more cases where data must be valid?
Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this.
In this document, *layout* and *representation* are not synonyms.

In terms of code, some data computed by `TERM` is valid at type `T` if and only if the following program does not have UB:
```rust,ignore
fn main() { unsafe {
let t: T = std::mem::transmute(TERM);
} }
```
#### Safety Invariant

The *safety* invariant is an invariant that safe code may assume all data to uphold.
This invariant is used to justify which operations safe code can perform.
Expand All @@ -51,14 +82,6 @@ Moreover, such unsafe code must not return a non-UTF-8 string to the "outside" o
To summarize: *Data must always be valid, but it only must be safe in safe code.*
For some more information, see [this blog post](https://www.ralfj.de/blog/2018/08/22/two-kinds-of-invariants.html).

#### Layout

The *layout* of a type defines its size and alignment as well as the offsets of its subobjects (e.g. fields of structs/unions/enum/... or elements of arrays).
Moreover, the layout of a type records its *function call ABI* (or just *ABI* for short): how the type is passed *by value* across a function boundary.

Note: Originally, *layout* and *representation* were treated as synonyms, and Rust language features like the `#[repr]` attribute reflect this.
In this document, *layout* and *representation* are not synonyms.

#### Niche

The *niche* of a type determines invalid bit-patterns that will be used by layout optimizations.
Expand All @@ -73,6 +96,22 @@ niches. For example, the "all bits uninitialized" is an invalid bit-pattern for
`&mut T`, but this bit-pattern cannot be used by layout optimizations, and is not a
niche.

#### Validity Invariant

The *validity invariant* is an invariant that all data must uphold any time it is accessed or copied in a typed manner.
This invariant is known to the compiler and exploited by optimizations such as improved enum layout or eliding in-bounds checks.

In terms of MIR statements, "accessed or copied" means whenever an assignment statement is executed.
That statement has a type (LHS and RHS must have the same type), and the data being assigned must be valid at that type.
Moreover, arguments passed to a function must be valid at the type given in the callee signature, and the return value of a function must be valid at the type given in the caller signature.
OPEN QUESTION: Are there more cases where data must be valid?

In terms of code, some data computed by `TERM` is valid at type `T` if and only if the following program does not have UB:
```rust,ignore
fn main() { unsafe {
let t: T = std::mem::transmute(TERM);
} }
```

### TODO

Expand Down