From 1f3ef96ba8c6950d4a7c8eeff2c3821ad2376567 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 11:51:25 -0500 Subject: [PATCH 01/14] add a chapter on enum representation --- reference/src/SUMMARY.md | 1 + reference/src/representation/enums.md | 376 ++++++++++++++++++++++++++ 2 files changed, 377 insertions(+) create mode 100644 reference/src/representation/enums.md diff --git a/reference/src/SUMMARY.md b/reference/src/SUMMARY.md index d72a887a..f016c8ba 100644 --- a/reference/src/SUMMARY.md +++ b/reference/src/SUMMARY.md @@ -16,6 +16,7 @@ - [Data layout](./layout.md) - [Structs and tuples](./layout/structs-and-tuples.md) - [Integers and Floating Points](./layout/integers-floatingpoint.md) + - [Enums](./layout/enums.md) - [Unions](./layout/unions.md) - [Vectors](./layout/vectors.md) - [Optimizations](./optimizations.md) diff --git a/reference/src/representation/enums.md b/reference/src/representation/enums.md new file mode 100644 index 00000000..db78b7fa --- /dev/null +++ b/reference/src/representation/enums.md @@ -0,0 +1,376 @@ +# Representation of Rust `enum` types + +**Disclaimer:** Some parts of this section were decided in RFCs, but +others represent the consensus from issue [#10]. The text will attempt +to clarify which parts are "guaranteed" (owing to the RFC decision) +and which parts are still in a "preliminary" state. + +[#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10 + +## Background + +**C-like enums.** The simplest form of enum is simply a list of +variants: + +```rust +enum SomeEnum { + Variant1, + Variant2, + Variant3, +``` + +Such enums are called "C-like" because they correspond quite closely +with enums in the C language (though there are important differences +as well, covered later). Presuming that they have more than one +variant, these sorts of enums are always represented as a simple integer, +though the size will vary. 
+ +C-like enums may also specify the value of their discriminants explicitly: + +```rust +enum SomeEnum { + Variant22 = 22, + Variant44 = 44, + Variant45, +} +``` + +As in C, discriminant values that are not specified are defined as +either 0 (for the first variant) or as one more than the prior +variant. + +**Data-carrying enums.** Enums whose enums have fields are called +"data-carrying" enums. Note that for the purposes of this definition, +it is not relevant whether those fields are zero-sized. Therefore this +enum is considered "data-carrying": + +```rust +enum Foo { + Bar(()), + Baz, +} +``` + +**Option-like enums.** As a special case of data-carrying enums, we +identify "option-like" enums as enums where all of the variants but +one have no fields, and one variant has a single field. The most +common example is `Option` itself. In some cases, as described below, +the compiler may apply special optimization rules to the layout of +option-like enums. The **payload** of an option-like enum is the value +of that single field. + +## Enums with a specified representation + +Enums may be annotation using the following `#[repr]` tags: + +- A specific integer type (called `Int` as a shorthand below): + - `#[repr(u8)]` + - `#[repr(u16)]` + - `#[repr(u32)]` + - `#[repr(u64)]` + - `#[repr(i8)]` + - `#[repr(i16)]` + - `#[repr(i32)]` + - `#[repr(i64)]` +- C-compatible layout: + - `#[repr(C)]` +- C-compatible layout with a specified discriminant size: + - `#[repr(C, u8)]` + - `#[repr(C, u16)]` + - etc + +We cover each of the categories below. The layout rules for enums with +explicit `#[repr]` annotations are specified in [RFC 2195][]. + +[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html + +### Layout of an enum with no variants + +An enum with no variants can never be instantiated and is logically +equivalent to the "never type" `!`. Such enums are guaranteed to have +the same layout as `!` (zero size and alignment 1). 
+ +### Layout of a C-like enum + +If there is no `#[repr]` attached to a C-like enum, it is guaranteed +to be represented as an integer of sufficient size to store the +discriminants for all possible variants. The size is selected by the +compiler but must be at least a `u8`. + +When a `#[repr(Int)]`-style annotation is attached to a C-like enum +(one without any data for its variants), it will cause the enum to be +represented as a simple integer of the specified size `Int`. This must +be sufficient to store all the required discriminant values. + +The `#[repr(C)]` annotation is equivalent, but it selects the same +size as the C compiler would use for the given target for an +equivalent C-enum declaration. + +Combining a `C` and `Int` representation (e.g., `#[repr(C, u8)]`) is +not permitted on a C-like enum. + +The values used for the discriminant will match up with what is +specified (or automatically assigned) in the enum definition. For +example, the following enum defines the discriminants for its variants +as 22 and 23 respectively: + +```rust +enum Foo { + // Specify discriminant of this variant as 22: + Variant22 = 22, + + // Default discriminant is one more than the previous, + // so 23 will be assigned. + Variant23 +} +``` + +**Unresolved question:** What about platforms where `-fshort-enums` +are the default? Do we know/care about that? + +### Layout for enums that carry data + +For enums that carry data, the layout differs depending on whether +C-compatibility is requested or not. + +#### Non-C-compatible layouts + +When an enum is tagged with `#[repr(Int)]` for some integral type +`Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a +series of `#[repr(C)]` structs, one per variant. Each of these structs +begins with an integral field containing the **discriminant**, which +specifies which variant is active. They then contain the remaining +fields associated with that variant. 
+ +**Example.** The following enum uses an `repr(u8)` annotation: + +```rust +#[repr(u8)] +enum TwoCases { + A(u8, u16), + B(u16), +} +``` + +This will be laid out equivalently to the following more +complex Rust types: + +``` +union TwoCasesRepr { + A: TwoCasesVariantA, + B: TwoCasesVariantB, +} + +#[repr(u8)] +enum TwoCasesTag { A, B } + +#[repr(C)] +struct TwoCasesVariantA(TwoCasesTag, u8, u16); + +#[repr(C)] +struct TwoCasesVariantB(TwoCasesTag, u16); +``` + +Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are +`#[repr(C)]`; this is needed to ensure that the `TwoCasesTag` value +appears at offset 0 in both cases, so that we can read it to determine +the current variant. + +#### C-compatible layouts. + +When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, +u8)]`, the layout of enums is changed to better match C++ enums. In +this mode, the data is laid out as a tuple of `(discriminant, union)`, +where `union` represents a C union of all the possible variants. The +type of the discriminant will be the integral type specified (`u8`, +etc) -- if no type is specified, then the compiler will select one +based on what a size a C-like enum would have with the same number of +variants. + +This layout, while more compatible and arguably more obvious, is also +less efficient than the non-C compatible layout in some cases in terms +of total size. 
+ +**Example.** The following enum: + +```rust +#[repr(C, Int)] +enum MyEnum { + A(u32), + B(f32, u64), + C { x: u32, y: u8 }, + D, +} +``` + +is equivalent to the following Rust definition: + +```rust +#[repr(C)] +struct MyEnumRepr { + tag: MyEnumTag, + payload: MyEnumPayload, +} + +#[repr(Int)] +enum MyEnumTag { A, B, C, D } + +#[repr(C)] +union MyEnumPayload { + A: u32, + B: MyEnumPayloadB, + C: MyEnumPayloadC, + D: (), +} + +#[repr(C)] +struct MyEnumPayloadB(f32, u64); + +#[repr(C)] +struct MyEnumPayloadC { x: u32, y: u8 } +} +``` + +This enum can also be represented in C++ as follows: + +```c++ +#include + +enum class MyEnumTag: CppEquivalentOfInt { A, B, C, D }; +struct MyEnumPayloadB { float _0; uint64_t _1; }; +struct MyEnumPayloadC { uint32_t x; uint8_t y; }; + +union MyEnumPayload { + uint32_t A; + MyEnumPayloadB B; + MyEnumPayloadC C; +}; + +struct MyEnum { + MyEnumTag tag; + MyEnumPayload payload; +}; +``` + +## Enums without a specified representation + +If no explicit `#[repr]` attribute is used, then the layout of most +enums is not specified, with one crucial exception: option-like enums +may in some cases use a compact layout that is identical to their +payload. + +(Meta-note: The content in this section is not described by any RFC +and is therefore "non-normative".) + +### Discriminant elision on Option-like enums + +**Definition.** An **option-like enum** is an enum which has: + +- one variant with a single field, +- other variants with no fields ("unit" variants). + +The simplest example is `Option` itself, where the `Some` variant +has a single field (of type `T`), and the `None` variant has no +fields. But other enums that fit that same template (and even enums +that include multiple `None`-like fields) fit. + +**Definition.** The **payload** of an option-like enum is the single +field which it contains; in the case of `Option`, the payload has +type `T`. 
+ +**Definition.** In some cases, the payload type may contain illegal +values, which are called **niches**. For example, a value of type `&T` +may never be NULL, and hence defines a niche consisting of the +bitstring `0`. Similarly, the standard library types [`NonZeroU8`] +and friends may never be zero, and hence also define the value of `0` +as a niche. (Types that define niche values will say so as part of the +description of their representation invariant.) + +[`NonZeroU8`]: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html + +**Option-like enums where the payload defines an adequate number of +niche values are guaranteed to be represented without using any +discriminant at all.** This is called **discriminant elision**. If +discriminant elision is in effect, then the layout of the enum is +equal to the layout of its payload. + +The most common example is that `Option<&u8>` can be represented as an +nullable `&u8` reference -- the `None` variant is then represented +using the niche value zero. This is because a valid `&u8` value can +never be zero, so if we see a zero value, we know that this must be +`None` variant. + +In order for the optimization to apply, the payload type must define a +number of niches greater than or equal to the number of unit variants. +In the case of `Option`, this means that any niche at all will +suffice, as there is only one unit variant (`None`). + +**Example.** The type `Option<&u32>` will be represented at runtime as +a nullable pointer. FFI interop often depends on this property. + +**Example.** As `fn` types are non-nullable, the type `Option` will be represented at runtime as a nullable function +pointer (which is therefore equivalent to a C function pointer) . FFI +interop often depends on this property. 
+ +**Example.** Consider the following enum definitions: + +```rust +enum Enum1 { + Present(T), + Absent1, + Absent2, +} + +enum Enum2 { + A, B, C +} +``` + +`Enum1<&u8>` is not eligible for discriminant elision, since `&u8` +defines a single niche value, but `Enum1` has two unit +variants. However, `Enum2` has only three legal values (0 for `A`, 1 +for `B`, and 2 for `C`), and hence defines a plethora of niche values[^caveat]. +Therefore, `Enum1` is guaranteed to be laid out the same as +`Enum2` ([consider the results of applying +`size_of`](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eadff247f2c5713b8f3b6c9cda297711)). + +[^caveat]: Strictly speaking, niche values are considered part of the "representation invariant" for an enum and not its type. Therefore, this section is added only as a preview for future unsafe-code-guidelines discussion. + +### Other optimizations + +The previous section specified a relatively narrow set of layout +optimizations that are **guaranteed** by the compiler. However, the +compiler is always free to perform **more** optimizations than this +minimal set. For example, the compiler presently treats `Result` and `Option` as equivalent, but this behavior is not +guaranteed to continue as `Result` is not considered +"option-like". + +As of this writing, the compiler's current behavior is to attempt to +elide discriminants whenever possible. Furthermore, a variant whose +only fields are of zero-size is considered a unit variant for this +purpose. If eliding discriminants is not possible (e.g., because the +payload does not define sufficient niche values), then the compiler +will select an appropriate discriminant size `N` and use a +representation roughly equivalent to `#[repr(N)]`, though without the +strict `#[repr(C)]` guarantees on each struct. However, this behavior +is not guaranteed to remain the same in future versions of the +compiler and should not be relied upon. 
(While it is not expected that +existing layout optimizations will be removed, it is possible -- it is +also possible for the compiler to introduce new sorts of +optimizations.) + +## Niche values + +C-like enums with N variants and no specified representation are +guaranteed to supply niche values corresponding to 256 - N (presuming +that is a positive number). This is because a C-like enum must be +represented using an integer and that integer must correspond to a +valid variant: the precise size of C-like enums is not specified but +it must be at least one byte, which means that there are at least 256 +possible bitstrings (only N of which are valid). + +Other enums -- or enums with a specified representation -- may supply +niches if their representation invariant permits it, but that is not +**guaranteed**. From 8c6ac6d28935a3b029d2cadd79d843c7b809e5e5 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 11:51:40 -0500 Subject: [PATCH 02/14] rename representation -> layout --- reference/src/{representation => layout}/enums.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename reference/src/{representation => layout}/enums.md (100%) diff --git a/reference/src/representation/enums.md b/reference/src/layout/enums.md similarity index 100% rename from reference/src/representation/enums.md rename to reference/src/layout/enums.md From 59bd61106806ffe1c4a0ef88469e205c66bca406 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Wed, 19 Dec 2018 17:07:19 -0500 Subject: [PATCH 03/14] revise to take feedback into account: - only guarantee with one unit variant - fieldless enums - remove potentially confusing discussion of what compiler does today - remove discussion of niche values from enums, not important (yet) - generally reorganize layout rules to be by category of enum --- reference/src/layout/enums.md | 184 ++++++++++++++-------------------- 1 file changed, 74 insertions(+), 110 deletions(-) diff --git a/reference/src/layout/enums.md 
b/reference/src/layout/enums.md index db78b7fa..f50deb3f 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -7,10 +7,14 @@ and which parts are still in a "preliminary" state. [#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10 -## Background +## Categories of enums -**C-like enums.** The simplest form of enum is simply a list of -variants: +**Empty enums.** Enums with no variants can never be instantiated and +are equivalent to the `!` type. They do not accept any `#[repr]` +annotations. + +**Fieldless enums.** The simplest form of enum is one where none of +the variants have any fields: ```rust enum SomeEnum { @@ -19,13 +23,13 @@ enum SomeEnum { Variant3, ``` -Such enums are called "C-like" because they correspond quite closely -with enums in the C language (though there are important differences -as well, covered later). Presuming that they have more than one -variant, these sorts of enums are always represented as a simple integer, -though the size will vary. +Such enums correspond quite closely with enums in the C language +(though there are important differences as well). Presuming that they +have more than one variant, these sorts of enums are always +represented as a simple integer, though the size will vary. -C-like enums may also specify the value of their discriminants explicitly: +Fieldless enums may also specify the value of their discriminants +explicitly: ```rust enum SomeEnum { @@ -51,17 +55,9 @@ enum Foo { } ``` -**Option-like enums.** As a special case of data-carrying enums, we -identify "option-like" enums as enums where all of the variants but -one have no fields, and one variant has a single field. The most -common example is `Option` itself. In some cases, as described below, -the compiler may apply special optimization rules to the layout of -option-like enums. The **payload** of an option-like enum is the value -of that single field. 
- -## Enums with a specified representation +## repr annotations accepted on enums -Enums may be annotation using the following `#[repr]` tags: +In general, enums may be annotation using the following `#[repr]` tags: - A specific integer type (called `Int` as a shorthand below): - `#[repr(u8)]` @@ -79,25 +75,36 @@ Enums may be annotation using the following `#[repr]` tags: - `#[repr(C, u16)]` - etc -We cover each of the categories below. The layout rules for enums with -explicit `#[repr]` annotations are specified in [RFC 2195][]. +Note that manually specifying the alignment using `#[repr(align)]` is +not permitted on an enum. -[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html +The set of repr annotations accepted by an enum depends on its category, +as defined above: + +- Empty enums: no repr annotations are permitted. +- Fieldless enums: `#[repr(Int)]`-style and `#[repr(C)]` annotations are permitted, but `#[repr(C, Int)]` annotations are not. +- Data-carrying enums: all repr annotations are permitted. -### Layout of an enum with no variants +## Enum layout rules -An enum with no variants can never be instantiated and is logically -equivalent to the "never type" `!`. Such enums are guaranteed to have -the same layout as `!` (zero size and alignment 1). +The rules for enum layout vary depending on the category. -### Layout of a C-like enum +### Layout of an empty enum -If there is no `#[repr]` attached to a C-like enum, it is guaranteed -to be represented as an integer of sufficient size to store the -discriminants for all possible variants. The size is selected by the -compiler but must be at least a `u8`. +An **empty enum** is an enum with no variants; empty enums can never +be instantiated and are logically equivalent to the "never type" +`!`. `#[repr]` annotations are not accepted on empty enums. Empty +enums are guaranteed to have the same layout as `!` (zero size and +alignment 1). 
-When a `#[repr(Int)]`-style annotation is attached to a C-like enum +### Layout of a fieldless enum + +If there is no `#[repr]` attached to a fieldless enum, it is +guaranteed to be represented as an integer of sufficient size to store +the discriminants for all possible variants. The size is selected by +the compiler but must be at least a `u8`. + +When a `#[repr(Int)]`-style annotation is attached to a fieldless enum (one without any data for its variants), it will cause the enum to be represented as a simple integer of the specified size `Int`. This must be sufficient to store all the required discriminant values. @@ -107,7 +114,7 @@ size as the C compiler would use for the given target for an equivalent C-enum declaration. Combining a `C` and `Int` representation (e.g., `#[repr(C, u8)]`) is -not permitted on a C-like enum. +not permitted on a fieldless enum. The values used for the discriminant will match up with what is specified (or automatically assigned) in the enum definition. For @@ -128,12 +135,19 @@ enum Foo { **Unresolved question:** What about platforms where `-fshort-enums` are the default? Do we know/care about that? -### Layout for enums that carry data +### Layout of a data-carrying enums with an explicit repr annotation -For enums that carry data, the layout differs depending on whether -C-compatibility is requested or not. +This section concerns data-carrying enums **with an explicit repr +annotation of some form**. The memory layout of such cases was +specified in [RFC 2195][] and is therefore normative. -#### Non-C-compatible layouts +[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html + +The layout of data-carrying enums that do **not** have an explicit +repr annotation is generally undefined, but with certain specific +exceptions: see the next section for details. 
+ +#### Non-C-compatible representation selected When an enum is tagged with `#[repr(Int)]` for some integral type `Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a @@ -176,7 +190,7 @@ Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are appears at offset 0 in both cases, so that we can read it to determine the current variant. -#### C-compatible layouts. +#### C-compatible representation selected When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, u8)]`, the layout of enums is changed to better match C++ enums. In @@ -184,7 +198,7 @@ this mode, the data is laid out as a tuple of `(discriminant, union)`, where `union` represents a C union of all the possible variants. The type of the discriminant will be the integral type specified (`u8`, etc) -- if no type is specified, then the compiler will select one -based on what a size a C-like enum would have with the same number of +based on what a size a fieldless enum would have with the same number of variants. This layout, while more compatible and arguably more obvious, is also @@ -252,27 +266,26 @@ struct MyEnum { }; ``` -## Enums without a specified representation +### Layout of a data-carrying enums without a repr annotation + +If no explicit `#[repr]` attribute is used, then the layout of a +data-carrying enum is typically **not specified**. However, in certain +select cases, there are **guaranteed layout optimizations** that may +apply, as described below. -If no explicit `#[repr]` attribute is used, then the layout of most -enums is not specified, with one crucial exception: option-like enums -may in some cases use a compact layout that is identical to their -payload. +#### Discriminant elision on Option-like enums (Meta-note: The content in this section is not described by any RFC and is therefore "non-normative".) 
-### Discriminant elision on Option-like enums +**Definition.** An **option-like enum** is a 2-variant enum where: -**Definition.** An **option-like enum** is an enum which has: - -- one variant with a single field, -- other variants with no fields ("unit" variants). +- one variant has a single field, and +- the other variant has no fields (the "unit variant"). The simplest example is `Option` itself, where the `Some` variant has a single field (of type `T`), and the `None` variant has no -fields. But other enums that fit that same template (and even enums -that include multiple `None`-like fields) fit. +fields. But other enums that fit that same template fit. **Definition.** The **payload** of an option-like enum is the single field which it contains; in the case of `Option`, the payload has @@ -284,15 +297,17 @@ may never be NULL, and hence defines a niche consisting of the bitstring `0`. Similarly, the standard library types [`NonZeroU8`] and friends may never be zero, and hence also define the value of `0` as a niche. (Types that define niche values will say so as part of the -description of their representation invariant.) +description of their representation invariant, which -- as of this +writing -- are the next topic up for discussion in the unsafe code +guidelines process.) [`NonZeroU8`]: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html -**Option-like enums where the payload defines an adequate number of -niche values are guaranteed to be represented without using any -discriminant at all.** This is called **discriminant elision**. If -discriminant elision is in effect, then the layout of the enum is -equal to the layout of its payload. +**Option-like enums where the payload defines at least one niche value +are guaranteed to be represented using the same memory layout as their +payload.** This is called **discriminant elision**, as there is no +explicit discriminant value stored anywhere. 
Instead, niche values are +used to represent the unit variant. The most common example is that `Option<&u8>` can be represented as an nullable `&u8` reference -- the `None` variant is then represented @@ -313,7 +328,8 @@ a nullable pointer. FFI interop often depends on this property. pointer (which is therefore equivalent to a C function pointer) . FFI interop often depends on this property. -**Example.** Consider the following enum definitions: +**Example.** The following enum definition is **not** option-like, +as it has two unit variants: ```rust enum Enum1 { @@ -321,56 +337,4 @@ enum Enum1 { Absent1, Absent2, } - -enum Enum2 { - A, B, C -} ``` - -`Enum1<&u8>` is not eligible for discriminant elision, since `&u8` -defines a single niche value, but `Enum1` has two unit -variants. However, `Enum2` has only three legal values (0 for `A`, 1 -for `B`, and 2 for `C`), and hence defines a plethora of niche values[^caveat]. -Therefore, `Enum1` is guaranteed to be laid out the same as -`Enum2` ([consider the results of applying -`size_of`](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eadff247f2c5713b8f3b6c9cda297711)). - -[^caveat]: Strictly speaking, niche values are considered part of the "representation invariant" for an enum and not its type. Therefore, this section is added only as a preview for future unsafe-code-guidelines discussion. - -### Other optimizations - -The previous section specified a relatively narrow set of layout -optimizations that are **guaranteed** by the compiler. However, the -compiler is always free to perform **more** optimizations than this -minimal set. For example, the compiler presently treats `Result` and `Option` as equivalent, but this behavior is not -guaranteed to continue as `Result` is not considered -"option-like". - -As of this writing, the compiler's current behavior is to attempt to -elide discriminants whenever possible. 
Furthermore, a variant whose -only fields are of zero-size is considered a unit variant for this -purpose. If eliding discriminants is not possible (e.g., because the -payload does not define sufficient niche values), then the compiler -will select an appropriate discriminant size `N` and use a -representation roughly equivalent to `#[repr(N)]`, though without the -strict `#[repr(C)]` guarantees on each struct. However, this behavior -is not guaranteed to remain the same in future versions of the -compiler and should not be relied upon. (While it is not expected that -existing layout optimizations will be removed, it is possible -- it is -also possible for the compiler to introduce new sorts of -optimizations.) - -## Niche values - -C-like enums with N variants and no specified representation are -guaranteed to supply niche values corresponding to 256 - N (presuming -that is a positive number). This is because a C-like enum must be -represented using an integer and that integer must correspond to a -valid variant: the precise size of C-like enums is not specified but -it must be at least one byte, which means that there are at least 256 -possible bitstrings (only N of which are valid). - -Other enums -- or enums with a specified representation -- may supply -niches if their representation invariant permits it, but that is not -**guaranteed**. 
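Reviewer note on discriminant elision (not part of the patch): the guarantee introduced in patch 03 can be spot-checked with `size_of` and `align_of`. The sketch below is only illustrative. `MaybeRef` and the helper function names are hypothetical and do not come from the patch. The `MaybeRef` check relies on the option-like guarantee proposed in this text, which current compilers happen to satisfy; the `Option` checks rely on layouts documented by the standard library.

```rust
use std::mem::{align_of, size_of};
use std::num::NonZeroU8;

/// Hypothetical user-defined option-like enum: exactly one unit variant
/// and one variant with a single field whose type (`&u32`) has a niche.
#[allow(dead_code)]
enum MaybeRef<'a> {
    Nothing,
    Something(&'a u32),
}

/// Returns true if `Option<T>` has the same size and alignment as `T`,
/// which is what discriminant elision implies.
#[allow(dead_code)]
fn option_matches_payload<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
        && align_of::<Option<T>>() == align_of::<T>()
}

/// Returns the single byte used to represent `None::<NonZeroU8>`.
#[allow(dead_code)]
fn none_nonzero_byte() -> u8 {
    let none: Option<NonZeroU8> = None;
    // Assumption stated by the std docs: `Option<NonZeroU8>` has the same
    // layout as `u8`, and `None` occupies the otherwise invalid value 0.
    unsafe { std::mem::transmute::<Option<NonZeroU8>, u8>(none) }
}

/// Returns true if the hypothetical `MaybeRef` is no larger than a reference.
#[allow(dead_code)]
fn maybe_ref_is_pointer_sized() -> bool {
    size_of::<MaybeRef<'static>>() == size_of::<&u32>()
}
```

Calling these helpers from `main` or a unit test should show that `&u32`, `NonZeroU8` and `extern "C" fn()` payloads add no space inside `Option`, whereas `u8` (which has no niche) does.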
From e59001f5453f2d9a1d68b45466f3b9fa706769b4 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Wed, 19 Dec 2018 17:11:24 -0500 Subject: [PATCH 04/14] give example where it can be more efficient --- reference/src/layout/enums.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index f50deb3f..6eeb254d 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -189,7 +189,7 @@ Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are `#[repr(C)]`; this is needed to ensure that the `TwoCasesTag` value appears at offset 0 in both cases, so that we can read it to determine the current variant. - + #### C-compatible representation selected When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, @@ -203,7 +203,9 @@ variants. This layout, while more compatible and arguably more obvious, is also less efficient than the non-C compatible layout in some cases in terms -of total size. +of total size. For example, the `TwoCases` example given in the +previous section only occupies 4 bytes with `#[repr(u8)]`, but would +occupy 6 bytes with `#[repr(C, u8)]`, as more padding is required. **Example.** The following enum: From bf3e435f7ad73e0efd585ec266c43d644d43f2f5 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Wed, 19 Dec 2018 17:15:08 -0500 Subject: [PATCH 05/14] clarify section headers a bit more --- reference/src/layout/enums.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 6eeb254d..06cfd8dc 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -147,7 +147,7 @@ The layout of data-carrying enums that do **not** have an explicit repr annotation is generally undefined, but with certain specific exceptions: see the next section for details. 
-#### Non-C-compatible representation selected +#### Explicit repr annotation without C compatibility When an enum is tagged with `#[repr(Int)]` for some integral type `Int` (e.g., `#[repr(u8)]`), it will be represented as a C-union of a @@ -190,7 +190,7 @@ Note that the `TwoCasesVariantA` and `TwoCasesVariantB` structs are appears at offset 0 in both cases, so that we can read it to determine the current variant. -#### C-compatible representation selected +#### Explicit repr annotation with C compatibility When the `#[repr]` tag includes `C`, e.g., `#[repr(C)]` or `#[repr(C, u8)]`, the layout of enums is changed to better match C++ enums. In From 6aa4d03aeabcbc478b6e7e2931ab0f2b853d6f22 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 20 Dec 2018 11:20:50 -0500 Subject: [PATCH 06/14] s/enums/variants/ Co-Authored-By: nikomatsakis --- reference/src/layout/enums.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 06cfd8dc..99620ceb 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -43,7 +43,7 @@ As in C, discriminant values that are not specified are defined as either 0 (for the first variant) or as one more than the prior variant. -**Data-carrying enums.** Enums whose enums have fields are called +**Data-carrying enums.** Enums whose variants have fields are called "data-carrying" enums. Note that for the purposes of this definition, it is not relevant whether those fields are zero-sized. 
Therefore this enum is considered "data-carrying": From c0c5ed4bc32faac5e53b554456469086a1bf85ff Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 20 Dec 2018 11:33:47 -0500 Subject: [PATCH 07/14] clarify that `#[repr(C)]` matches *default* settings of C compiler fix a few other nits too --- reference/src/layout/enums.md | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 99620ceb..c0d246c0 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -21,6 +21,7 @@ enum SomeEnum { Variant1, Variant2, Variant3, +} ``` Such enums correspond quite closely with enums in the C language @@ -132,8 +133,14 @@ enum Foo { } ``` -**Unresolved question:** What about platforms where `-fshort-enums` -are the default? Do we know/care about that? +**Note:** some C compilers offer flags (e.g., `-fshort-enums`) that +change the layout of enums from the default settings that are standard +for the platform. The integer size selected by `#[repr(C)]` is defined +to match the **default** settings for a given target, when no such +flags are supplied. If interop with code that uses other flags is +desired, then one should either specify the sizes of enums manually or +else use an alternate target definition that is tailored to the +compiler flags in use. ### Layout of a data-carrying enums with an explicit repr annotation @@ -317,11 +324,6 @@ using the niche value zero. This is because a valid `&u8` value can never be zero, so if we see a zero value, we know that this must be `None` variant. -In order for the optimization to apply, the payload type must define a -number of niches greater than or equal to the number of unit variants. -In the case of `Option`, this means that any niche at all will -suffice, as there is only one unit variant (`None`). - **Example.** The type `Option<&u32>` will be represented at runtime as a nullable pointer. 
FFI interop often depends on this property. From d1c0e4d70acccf46d96e5a29f260975b7d5d7fc0 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 20 Dec 2018 11:44:21 -0500 Subject: [PATCH 08/14] don't guarantee at least one byte --- reference/src/layout/enums.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index c0d246c0..bc72be5b 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -100,10 +100,13 @@ alignment 1). ### Layout of a fieldless enum -If there is no `#[repr]` attached to a fieldless enum, it is -guaranteed to be represented as an integer of sufficient size to store -the discriminants for all possible variants. The size is selected by -the compiler but must be at least a `u8`. +If there is no `#[repr]` attached to a fieldless enum, the compiler +will represent it using an integer of sufficient size to store the +discriminants for all possible variants -- note that if there is only +one variant, then 0 bits are required, so it is possible that the enum +may have zero size. In the absence of a `#[repr]` annotation, the +number of bits used by the compiler is not defined and is subject to +change. 
When a `#[repr(Int)]`-style annotation is attached to a fieldless enum (one without any data for its variants), it will cause the enum to be From 219a584dca6e0edd1133ed0d8ada5b60e9e0434a Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 20 Dec 2018 12:05:40 -0500 Subject: [PATCH 09/14] add `#[repr(C)]` to the union --- reference/src/layout/enums.md | 1 + 1 file changed, 1 insertion(+) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index bc72be5b..e3558c05 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -180,6 +180,7 @@ This will be laid out equivalently to the following more complex Rust types: ``` +#[repr(C)] union TwoCasesRepr { A: TwoCasesVariantA, B: TwoCasesVariantB, From c84b1c362ed631cccb3c01b11fa4db9d80b0b3c1 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 20 Dec 2018 12:06:36 -0500 Subject: [PATCH 10/14] s/representation invariant/validity invariant/ --- reference/src/layout/enums.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index e3558c05..3ac1d9ee 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -310,9 +310,9 @@ may never be NULL, and hence defines a niche consisting of the bitstring `0`. Similarly, the standard library types [`NonZeroU8`] and friends may never be zero, and hence also define the value of `0` as a niche. (Types that define niche values will say so as part of the -description of their representation invariant, which -- as of this -writing -- are the next topic up for discussion in the unsafe code -guidelines process.) +description of their validity invariant, which -- as of this writing +-- are the next topic up for discussion in the unsafe code guidelines +process.) 
[`NonZeroU8`]: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html From ebd36392ea032726199bc63636cc52e1b597bea7 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 11:52:18 -0500 Subject: [PATCH 11/14] rename representation to layout in header --- reference/src/layout/enums.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 3ac1d9ee..c177ef8d 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -1,4 +1,4 @@ -# Representation of Rust `enum` types +# Layout of Rust `enum` types **Disclaimer:** Some parts of this section were decided in RFCs, but others represent the consensus from issue [#10]. The text will attempt From e735b315f7306ca3a4d7fd3a5f973d0e9ca3bd72 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 11:52:25 -0500 Subject: [PATCH 12/14] use `repr` and not "representation" --- reference/src/layout/enums.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index c177ef8d..023e7e06 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -117,7 +117,7 @@ The `#[repr(C)]` annotation is equivalent, but it selects the same size as the C compiler would use for the given target for an equivalent C-enum declaration. -Combining a `C` and `Int` representation (e.g., `#[repr(C, u8)]`) is +Combining a `C` and `Int` `repr` (e.g., `#[repr(C, u8)]`) is not permitted on a fieldless enum. 
The values used for the discriminant will match up with what is From f09d7e579b0fa5bcc8b0241a12cf6a8ccdc0c9d9 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 12:28:10 -0500 Subject: [PATCH 13/14] add an unresolved question --- reference/src/layout/enums.md | 52 +++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 023e7e06..66456706 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -346,3 +346,55 @@ enum Enum1 { Absent2, } ``` + +## Unresolved questions + +### Layout of single variant enums + +[Issue #79.](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/79) + +Enums that contain a **single variant** and which do not have an +explicit `#[repr]` annotation are an important special case. Since +there is only a single variant, the enum must be instantiated with +that variant, which means that the enum is in fact equivalent to a +struct. The question then is to what extent we should **guarantee** +that the two share an equivalent layout, and also how to define the +interaction with uninhabited types. + +As presently implemented, the compiler will use the same layout for +structs and for single variant enums (as long as they do not have a +`#[repr]` annotation that overrides that choice). 
So, for example, the +struct `SomeStruct` and the enum `SomeEnum` would have an equivalent +layout ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3697ac684c3d021892694956df957653)): + +```rust +struct SomeStruct; +enum SomeEnum { + SomeVariant, +} +``` + +Similarly, the struct `SomeStruct` and the enum `SomeEnum` in this +example would also be equivalent in their layout +([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=924724764419f846c788a8763da45992)): + +```rust +struct SomeStruct { x: u32 } +enum SomeEnum { + SomeVariant { x: u32 }, +} +``` + +In fact, the compiler will use this optimized layout even for enums +that define multiple variants, as long as all but one of the variants +are uninhabited +([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3cc1484c5b91097f3dc2015b7c207a0e)): + +```rust +struct SomeStruct { x: u32 } +enum SomeEnum { + SomeVariant { x: u32 }, + UninhabitedVariant { y: Void }, +} +``` + From f066fd9b057414fc2c3dc3be3b588574c4849432 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Thu, 10 Jan 2019 12:29:04 -0500 Subject: [PATCH 14/14] clarify disclaimer text --- reference/src/layout/enums.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/reference/src/layout/enums.md b/reference/src/layout/enums.md index 66456706..ce1997c6 100644 --- a/reference/src/layout/enums.md +++ b/reference/src/layout/enums.md @@ -3,7 +3,9 @@ **Disclaimer:** Some parts of this section were decided in RFCs, but others represent the consensus from issue [#10]. The text will attempt to clarify which parts are "guaranteed" (owing to the RFC decision) -and which parts are still in a "preliminary" state. +and which parts are still in a "preliminary" state, at least until we +start to open RFCs ratifying parts of the Unsafe Code Guidelines +effort. [#10]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/10
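
Note on patch 13: the single-variant claims in the "Unresolved questions" text can be checked against a given compiler with a sketch like the one below. It is not part of the patch series. The helper `same_size_and_align`, the type names, and the `Void` definition are assumptions made for illustration (the patch text uses `Void` without defining it). The assertions only observe what the compiler in use happens to do; as the patch says, none of this is guaranteed yet.

```rust
use std::mem::{align_of, size_of};

// Hypothetical helper (not from the patch). It compares size and alignment
// only, which is necessary for two layouts to match but does not prove it.
fn same_size_and_align<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

// Assumed definition of an uninhabited type; the patch leaves `Void` undefined.
#[allow(dead_code)]
enum Void {}

#[allow(dead_code)]
struct UnitStruct;
#[allow(dead_code)]
enum UnitEnum {
    OnlyVariant,
}

#[allow(dead_code)]
struct FieldStruct {
    x: u32,
}
#[allow(dead_code)]
enum FieldEnum {
    OnlyVariant { x: u32 },
}

// Two variants, but only one of them can ever be constructed.
#[allow(dead_code)]
enum MostlyUninhabited {
    SomeVariant { x: u32 },
    UninhabitedVariant { y: Void },
}

fn main() {
    // Each check mirrors one of the three examples in the patch text.
    assert!(same_size_and_align::<UnitStruct, UnitEnum>());
    assert!(same_size_and_align::<FieldStruct, FieldEnum>());
    assert!(same_size_and_align::<FieldStruct, MostlyUninhabited>());
    println!("all three pairs agree in size and alignment");
}
```

Comparing the pairs against each other, rather than against hard-coded sizes, keeps the sketch from encoding a specific number of bytes that the text declines to promise.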