From cec567184b034a29a75489ab618bd5423ea40d74 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 14 Aug 2019 22:31:44 +0200 Subject: [PATCH 1/5] define padding with hypothetical Pad type --- reference/src/glossary.md | 8 ++++++++ wip/value-domain.md | 8 +++++++- 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 5d1fedbd..71f9c0d3 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -177,6 +177,14 @@ zero-sized type", to refer to zero-sized types with an alignment requirement of For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment requirement of 2. +#### Padding + +*Padding* refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. + +Padding can be though of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: +* `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit`. +* Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. + #### Place A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. diff --git a/wip/value-domain.md b/wip/value-domain.md index 0c840a5d..f49eff66 100644 --- a/wip/value-domain.md +++ b/wip/value-domain.md @@ -55,6 +55,12 @@ The value relation for `bool` relates `Bool(b)` with `[r]` if and only if `r.as_ The value relation for the `()` type relates the empty tuple `Tuple([])` (assuming we can use array notation to "match" on `Vec`) with the empty byte list `[]`, and that's it. +### `Pad` + +The [hypothetical `Pad` type][pad] relates the empty tuple `Tuple([])` with any byte list of length 1. + +[pad]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/glossary.md#padding + ### `!` The value relation for the `!` type is empty: nothing is related to anything at this type. @@ -121,7 +127,7 @@ We can implement `TypedMemory` for any `Memory` as follows: In particular, this defines the semantics of a "typed copy": when an assignment is executed in Rust (at some type `T`), or when an argument of type `T` is passed to a function or a return value of type `T` returned from a function, this is called a "typed copy". Its semantics is basically to do a `typed_read` at `T` followed by a `typed_write` at `T`. In particular, this means doing a "typed copy" at `T` of some memory that has no valid representation at `T` is undefined behavior! -This also means that for types that have padding, the "typed copy" does not preserve the padding bytes. +This also means that for `Pad` and more generally for types that have padding, the "typed copy" does not preserve the padding bytes. ## Relation to validity invariant From 2e5f2dadbfac3664317bae47da9e6ac00df750d5 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 14 Aug 2019 22:36:59 +0200 Subject: [PATCH 2/5] repr relation def --- reference/src/glossary.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 71f9c0d3..93ce056b 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -179,12 +179,18 @@ requirement of 2. #### Padding -*Padding* refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. +*Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. Padding can be though of as `[Pad; N]` for some hypothetical type `Pad` (of size 1) with the following properties: * `Pad` is valid for any byte, i.e., it has the same validity invariant as `MaybeUninit`. * Copying `Pad` ignores the source byte, and writes *any* value to the target byte. Or, equivalently (in terms of Abstract Machine behavior), copying `Pad` marks the target byte as uninitialized. +We can also define padding in terms of the [representation relation]: +A byte at index `i` is a padding byte for type `T` if, +for all values `v` and lists of bytes `b` such that `v` and `b` are related at `T` (let's write this `Vrel_T(v, b)`), +changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`). +In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. + #### Place A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. @@ -208,6 +214,7 @@ Values are ephemeral; they arise during the computation of an instruction but ar (This is comparable to how run-time data in a program is ephemeral and is only ever persisted in serialized form.) #### Representation (relation) +[representation relation]: #representation-relation A *representation* of a [value](#value) is a list of bytes that is used to store or "represent" that value in memory. From 7511eb09a557d3ac8ad3fc37d504959dfd6dd968 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 14 Aug 2019 23:36:31 +0200 Subject: [PATCH 3/5] cross-ref --- reference/src/glossary.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 93ce056b..77db6371 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -178,6 +178,7 @@ For example, `()` is a "1-ZST" but `[u16; 0]` is not because it has an alignment requirement of 2. #### Padding +[padding]: #padding *Padding* (of a type `T`) refers to the space that the compiler leaves between fields of a struct or enum variant to satisfy alignment requirements, and before/after variants of a union or enum to make all variants equally sized. @@ -223,7 +224,7 @@ The term "relation" here is used in the mathematical sense: the representation r The relation should be functional for a fixed list of bytes (i.e., every list of bytes has at most one associated representation). It is partial in both directions: not all values have a representation (e.g. the mathematical integer `300` has no representation at type `u8`), and not all lists of bytes correspond to a value of a specific type (e.g. lists of the wrong size correspond to no value, and the list consisting of the single byte `0x10` corresponds to no value of type `bool`). -For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a padding byte so changing it does not affect the value represented by a list of bytes). +For a fixed value, there can be many representations (e.g., when considering type `#[repr(C)] Pair(u8, u16)`, the second byte is a [padding byte][padding] so changing it does not affect the value represented by a list of bytes). See the [value domain][value-domain] for an example how values and representation relations can be made more precise. From 57b5de9dc6c31b99dec60b929332fe9f10f50d60 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 15 Aug 2019 10:57:28 +0200 Subject: [PATCH 4/5] enum and union are unclear --- reference/src/glossary.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index 77db6371..ea465c04 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -192,6 +192,9 @@ for all values `v` and lists of bytes `b` such that `v` and `b` are related at ` changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`). In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. +This definition works fine for structs. +The desired notion of "padding byte" for enums and unions is still unclear. + #### Place A *place* (called "lvalue" in C and "glvalue" in C++) is the result of computing a [*place expression*][place-value-expr]. From 7ab59d4afd9a96d29604155b751814dfdb6ffa07 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 15 Aug 2019 14:22:19 +0200 Subject: [PATCH 5/5] not just structs --- reference/src/glossary.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/src/glossary.md b/reference/src/glossary.md index ea465c04..84a009ee 100644 --- a/reference/src/glossary.md +++ b/reference/src/glossary.md @@ -192,7 +192,7 @@ for all values `v` and lists of bytes `b` such that `v` and `b` are related at ` changing `b` at index `i` to any other byte yields a `b'` such `v` and `b'` are related (`Vrel_T(v, b')`). In other words, the byte at index `i` is entirely ignored by `Vrel_T` (the value relation for `T`), and two lists of bytes that only differ in padding bytes relate to the same value(s), if any. -This definition works fine for structs. +This definition works fine for product types (structs, tuples, arrays, ...). The desired notion of "padding byte" for enums and unions is still unclear. #### Place