|
1 |
| -# Data structure representation |
| 1 | +# Data structure representation and validity requirements |
2 | 2 |
|
3 |
| -In general, Rust makes few guarantees about memory layout, unless you |
4 |
| -define your structs as `#[repr(rust)]`. But there are some things that |
5 |
| -we do guarantee. Let's write about them. |
| 3 | +## Introduction |
6 | 4 |
|
7 |
| -TODO: |
| 5 | +This discussion is meant to focus on the following things: |
8 | 6 |
|
9 |
| -- Find and link to the various RFCs |
10 |
| -- Enumerate things that we *might* in fact guarantee, even for non-C types: |
11 |
| - - e.g., `&T` and `Option<&T>` are both pointer sized |
12 |
| - - size of `extern fn` etc (at least on some platforms)? |
13 |
| - - For which `T` is `None` represented as a "null pointer" etc? |
14 |
| - - (Which "niche" optimizations can we rely on) |
| 7 | +- What guarantees does Rust make regarding the layout of data structures? |
| 8 | +- What guarantees does Rust make regarding ABI compatibility? |
| 9 | + |
| 10 | +NB. Oftentimes, choices of layout will only be possible if we can |
| 11 | +guarantee various invariants -- this is particularly true when |
| 12 | +optimizing the layout of `Option` or other enums. However, designing |
| 13 | +those invariants is left for a future discussion -- here, we should |
| 14 | +document/describe what we currently do and/or aim to support. |
| 15 | + |
| 16 | +### Layout of data structures |
| 17 | + |
| 18 | +In general, Rust makes few guarantees about the memory layout of your |
| 19 | +structures. For example, by default, the compiler has the freedom to |
| 20 | +rearrange the field order of your structures for more efficiency (as |
| 21 | +of this writing, we try to minimize the overall size of your |
| 22 | +structure, but this is the sort of detail that can easily change). For |
| 23 | +safe code, of course, any rearrangements "just work" transparently. |
| 24 | + |
| 25 | +If, however, you need to write unsafe code, you may wish to have a |
| 26 | +fixed data structure layout. In that case, there are ways to specify |
| 27 | +and control how an individual struct will be laid out -- notably with |
| 28 | +`#[repr]` annotations. One purpose of this section, then, is to lay out |
| 29 | +what sorts of guarantees we offer when it comes to layout, and also |
| 30 | +what effect the various `#[repr]` annotations have. |
| 31 | + |
| 32 | +### ABI compatibility |
| 33 | + |
| 34 | +When one either calls a foreign function or is called by one, extra |
| 35 | +care is needed to ensure that all the ABI details line up. ABI compatibility |
| 36 | +is related to data structure layout but -- in some cases -- can add another |
| 37 | +layer of complexity. For example, consider a struct with one field, like this one: |
| 38 | + |
| 39 | +```rust |
| 40 | +#[repr(C)] |
| 41 | +struct Foo { field: u32 } |
| 42 | +``` |
| 43 | + |
| 44 | +The memory layout of `Foo` is identical to a `u32`. But in many ABIs, |
| 45 | +the struct type `Foo` is treated differently at the point of a |
| 46 | +function call than a `u32` would be. Eliminating these gaps is the |
| 47 | +goal of the `#[repr(transparent)]` annotation introduced in [RFC |
| 48 | +1758]. For built-in types, such as `&T` and so forth, it is important |
| 49 | +for us to specify how they are treated at the point of a function |
| 50 | +call. |
| 51 | + |
| 52 | +## Goals |
| 53 | + |
| 54 | +- Document current behavior of compiler. |
| 55 | + - Indicate which behavior is "permitted" for compiler and which |
| 56 | + aspects are things that unsafe code can rely upon. |
| 57 | + - Include the effect of `#[repr]` annotations. |
| 58 | +- Uncover the sorts of layout optimizations we may wish to do in the |
| 59 | + future. |
| 60 | + |
| 61 | +## Some interesting examples and questions |
| 62 | + |
| 63 | +- `&T` where `T: Sized` |
| 64 | + - This is **guaranteed** to be a non-null pointer |
| 65 | +- `Option<&T>` where `T: Sized` |
| 66 | + - This is **guaranteed** to be a nullable pointer |
| 67 | +- `Option<extern "C" fn()>` |
| 68 | + - Can this be assumed to be a non-null pointer? |
| 69 | +- `usize` |
| 70 | + - Platform dependent size, but guaranteed to be able to store a pointer? |
| 71 | + - Also an array length? |
| 72 | +- Uninitialized bits -- for which types are uninitialized bits valid? |
| 73 | +- If you have `struct A { .. }` and `struct B { .. }` with no |
| 74 | + `#[repr]` annotations, and they have the same field types, can we |
| 75 | + say that they will have the same layout? |
| 76 | + - or do we have the freedom to rearrange the types of `A` but not |
| 77 | + `B`, e.g. based on PGO results |
| 78 | + - What about different instantiations of the same struct? (`Vec<A>` |
| 79 | + vs `Vec<B>`) |
| 80 | +- Rust currently says that no single value may be larger than `isize::MAX` bytes |
| 81 | + - is this good? can it be changed? does it matter *here* anyway? |
| 82 | + |
| 83 | +## Active threads |
| 84 | + |
| 85 | +To start, we will create threads for each major category of types |
| 86 | +(with a few suggested focus points): |
| 87 | + |
| 88 | +- Integers and floating points |
| 89 | + - What about signaling NaN etc? ([Seems like a |
| 90 | + non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381), |
| 91 | + but it'd be good to resummarize the details). |
| 92 | + - is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)? |
| 93 | + what are edge cases here? |
| 94 | + - Rust currently states that the maximum size of any single value must fit in an `isize` |
| 95 | + - Can we say a bit more about why? (e.g., [ensuring that "pointer diff" is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192)) |
| 96 | +- Booleans |
| 97 | + - Prior discussions ([#46156][], [#46176][]) documented bool as a single |
| 98 | + byte that is either 0 or 1. |
| 99 | +- Enums |
| 100 | + - See dedicated thread about "niches" and `Option`-style layout optimization |
| 101 | + below. |
| 102 | + - Define: C-like enum |
| 103 | + - Can a C-like enum ever have an invalid discriminant? (Presumably not) |
| 104 | + - Empty enums and the `!` type |
| 105 | + - [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads. |
| 106 | + - [RFC 2363][] offers a proposal to permit specifying discriminants. |
| 107 | +- Structs |
| 108 | + - Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out |
| 109 | + (and/or treated by the ABI)? |
| 110 | + - e.g., what about different structs with same definition |
| 111 | + - across executions of the same program? |
| 112 | + - For example, [rkruppe |
| 113 | + writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247) |
| 114 | + that we might "want to guarantee (some subset of) newtype |
| 115 | + unpacking and relegate `#[repr(transparent)]` to being the way |
| 116 | + to guarantee to other crates that a type with private fields is |
| 117 | + and will remain a newtype?" |
| 118 | +- Tuples |
| 119 | + - Are these effectively anonymous structs? |
| 120 | +- Unions |
| 121 | + - Can we ever say anything about the initialized contents of a union? |
| 122 | + - Is `#[repr(C)]` meaningful on a union? |
| 123 | + - When (if ever) do we guarantee that all fields have the same address? |
| 124 | +- Fn pointers (`fn()`, `extern "C" fn()`) |
| 125 | + - When is transmuting from one `fn` type to another allowed? |
| 126 | + - Can you transmute from a `fn` to `usize` or raw pointer? |
| 127 | + - In theory this is platform dependent, and C certainly draws a |
| 128 | + distinction between `void*` and a function pointer, but are |
| 129 | + there any modern and/or realistic platforms where it is an |
| 130 | + issue? |
| 131 | + - Is `Option<extern "C" fn()>` guaranteed to be a pointer (possibly null)? |
| 132 | +- References `&T` and `&mut T` |
| 133 | + - Out of scope: aliasing rules |
| 134 | + - Always aligned, non-null |
| 135 | + - When using the C ABI, these map to the C pointer types, presumably |
| 136 | +- Raw pointers |
| 137 | + - Effectively same as integers? |
| 138 | + - Is `ptr::null` etc guaranteed to be equal in representation to `0_usize`? |
| 139 | + - C does guarantee that `0` when cast to a pointer is NULL |
| 140 | +- Representation knobs: |
| 141 | + - Custom alignment ([RFC 1358]) |
| 142 | + - Packed ([RFC 1240] talks about some safety issues) |
| 143 | + |
| 144 | +[#46156]: https://github.com/rust-lang/rust/pull/46156 |
| 145 | +[#46176]: https://github.com/rust-lang/rust/pull/46176 |
| 146 | +[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363 |
| 147 | +[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html |
| 148 | +[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html |
| 149 | +[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html |
| 150 | +[RFC 1758]: https://rust-lang.github.io/rfcs/1758-repr-transparent.html |
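
Several of the layout claims in the added text can be spot-checked with `std::mem::size_of` and `std::mem::align_of`. The following is a minimal sketch, not part of the patch: `Wrapper` and `same_size_and_align` are hypothetical names introduced only for illustration, and `Foo` mirrors the single-field struct from the ABI section. It observes what the current compiler does; it does not settle the open questions listed above, and matching size and alignment says nothing about how a type is passed at a function-call boundary.

```rust
use std::mem::{align_of, size_of};

// Mirrors the single-field struct from the ABI compatibility section.
#[allow(dead_code)]
#[repr(C)]
struct Foo {
    field: u32,
}

// Hypothetical newtype (assumption, not from the patch) using the
// `#[repr(transparent)]` annotation from RFC 1758.
#[allow(dead_code)]
#[repr(transparent)]
struct Wrapper(u32);

/// Returns true if `A` and `B` have the same size and alignment.
/// This compares memory layout only, not calling convention.
fn same_size_and_align<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

fn main() {
    // A single-field `#[repr(C)]` struct is laid out like its field.
    assert!(same_size_and_align::<Foo, u32>());
    assert!(same_size_and_align::<Wrapper, u32>());

    // `&T` is non-null, so `Option<&T>` can use null for `None`
    // and stays pointer sized.
    assert!(same_size_and_align::<Option<&u32>, &u32>());
    assert!(same_size_and_align::<Option<&u32>, *const u32>());

    // Observed behavior for function pointers; the text above still
    // lists this as a question to be answered.
    assert!(same_size_and_align::<Option<extern "C" fn()>, extern "C" fn()>());

    // `bool` is a single byte that is either 0 or 1.
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(false as u8, 0);
    assert_eq!(true as u8, 1);
}
```

Checks like these only compare size and alignment, so they cannot show the ABI-level difference between `Foo` and `u32` that motivates `#[repr(transparent)]`.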