Commit d241694

Merge pull request #5 from nikomatsakis/data-repr
Area proposal: Representation and validity invariants
2 parents 8fc9145 + 358feac commit d241694

1 file changed

+147
-11
lines changed

active_discussion/representation.md

@@ -1,14 +1,150 @@
1-
# Data structure representation
1+
# Data structure representation and validity requirements
22

3-
In general, Rust makes few guarantees about memory layout, unless you
4-
define your structs as `#[repr(rust)]`. But there are some things that
5-
we do guarantee. Let's write about them.
3+
## Introduction
64

7-
TODO:
5+
This discussion is meant to focus on the following things:
86

9-
- Find and link to the various RFCs
10-
- Enumerate things that we *might* in fact guarantee, even for non-C types:
11-
- e.g., `&T` and `Option<&T>` are both pointer sized
12-
- size of `extern fn` etc (at least on some platforms)?
13-
- For which `T` is `None` represented as a "null pointer" etc?
14-
- (Which "niche" optimizations can we rely on)
7+
- What guarantees does Rust make regarding the layout of data structures?
8+
- What guarantees does Rust make regarding ABI compatibility?
9+
10+
NB. Oftentimes, choices of layout will only be possible if we can
11+
guarantee various invariants -- this is particularly true when
12+
optimizing the layout of `Option` or other enums. However, designing
13+
those invariants is left for a future discussion -- here, we should
14+
document/describe what we currently do and/or aim to support.
15+
16+
### Layout of data structures
17+
18+
In general, Rust makes few guarantees about the memory layout of your
19+
structures. For example, by default, the compiler has the freedom to
20+
rearrange the field order of your structures for more efficiency (as
21+
of this writing, we try to minimize the overall size of your
22+
structure, but this is the sort of detail that can easily change). For
23+
safe code, of course, any rearrangements "just work" transparently.
24+
25+
If, however, you need to write unsafe code, you may wish to have a
26+
fixed data structure layout. In that case, there are ways to specify
27+
and control how an individual struct will be laid out -- notably with
28+
`#[repr]` annotations. One purpose of this section, then, is to lay out
29+
what sorts of guarantees we offer when it comes to layout, and also
30+
what effect the various `#[repr]` annotations have.
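
As a rough illustration of the difference a `#[repr]` annotation makes, here is a minimal sketch. The struct names `DefaultLayout` and `CLayout` and the helper `offset_of_b` are hypothetical, chosen only for this example. It assumes a target where `u32` has 4-byte alignment, which holds on mainstream platforms. Only the `#[repr(C)]` numbers are something to rely on; the size printed for the default representation is whatever the current compiler happens to choose.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct with the default representation: the compiler is free
// to reorder the fields, so its exact layout is unspecified.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields, but with a C-compatible layout: declaration order is kept and
// padding follows the C rules.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Byte offset of field `b` inside a `CLayout`, computed from addresses.
fn offset_of_b() -> usize {
    let v = CLayout { a: 0, b: 0, c: 0 };
    let base = &v as *const CLayout as usize;
    let field = &v.b as *const u32 as usize;
    field - base
}

fn main() {
    // Assuming 4-byte alignment for u32: a at 0, b at 4, c at 8, and the
    // total size is rounded up to a multiple of the alignment.
    println!(
        "CLayout: size {}, align {}, offset of b {}",
        size_of::<CLayout>(),
        align_of::<CLayout>(),
        offset_of_b()
    );
    // Unspecified: may differ between compiler versions.
    println!("DefaultLayout: size {}", size_of::<DefaultLayout>());
}
```

Unsafe code that depends on field offsets would use the `CLayout` form; the `DefaultLayout` form is only safe to treat as opaque.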
31+
32+
### ABI compatibility
33+
34+
When one either calls a foreign function or is called by one, extra
35+
care is needed to ensure that all the ABI details line up. ABI compatibility
36+
is related to data structure layout but -- in some cases -- can add another
37+
layer of complexity. For example, consider a struct with one field, like this one:
38+
39+
```rust
40+
#[repr(C)]
41+
struct Foo { field: u32 }
42+
```
43+
44+
The memory layout of `Foo` is identical to a `u32`. But in many ABIs,
45+
the struct type `Foo` is treated differently at the point of a
46+
function call than a `u32` would be. Eliminating these gaps is the
47+
goal of the `#[repr(transparent)]` annotation introduced in [RFC
48+
1758]. For built-in types, such as `&T` and so forth, it is important
49+
for us to specify how they are treated at the point of a function
50+
call.
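
The distinction between layout and call ABI can be sketched as follows. The names `Wrapper`, `double`, and `call_as_u32` are hypothetical and exist only for this example. The sketch assumes that `#[repr(transparent)]` gives the wrapper the same function-call ABI as its single field, which is the stated intent of RFC 1758; it makes no claim about how any particular platform passes the `#[repr(C)]` struct `Foo`.

```rust
use std::mem::{align_of, size_of, transmute};

// Same memory layout as u32, but the C ABI may pass it as a struct.
#[allow(dead_code)]
#[repr(C)]
struct Foo {
    field: u32,
}

// Hypothetical newtype: same layout *and* same call ABI as u32.
#[repr(transparent)]
struct Wrapper(u32);

// A function whose signature mentions the wrapper type.
extern "C" fn double(w: Wrapper) -> Wrapper {
    Wrapper(w.0.wrapping_mul(2))
}

/// Calls `double` through a function pointer typed in terms of plain u32.
/// Assumption: this relies on repr(transparent) making `Wrapper` and `u32`
/// ABI-compatible; the same trick is not justified for `Foo`.
fn call_as_u32(x: u32) -> u32 {
    let typed: extern "C" fn(Wrapper) -> Wrapper = double;
    let raw: extern "C" fn(u32) -> u32 = unsafe { transmute(typed) };
    raw(x)
}

fn main() {
    // Both types occupy memory exactly like a u32.
    assert_eq!(size_of::<Foo>(), size_of::<u32>());
    assert_eq!(size_of::<Wrapper>(), size_of::<u32>());
    assert_eq!(align_of::<Wrapper>(), align_of::<u32>());
    println!("{}", call_as_u32(21));
}
```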
51+
52+
## Goals
53+
54+
- Document current behavior of compiler.
55+
- Indicate which behavior is "permitted" for compiler and which
56+
aspects are things that unsafe code can rely upon.
57+
- Include the effect of `#[repr]` annotations.
58+
- Uncover the sorts of layout optimizations we may wish to do in the
59+
future.
60+
61+
## Some interesting examples and questions
62+
63+
- `&T` where `T: Sized`
64+
- This is **guaranteed** to be a non-null pointer
65+
- `Option<&T>` where `T: Sized`
66+
- This is **guaranteed** to be a nullable pointer
67+
- `Option<extern "C" fn()>`
68+
- Can this be assumed to be a nullable pointer (with `None` represented as null)?
69+
- `usize`
70+
- Platform dependent size, but guaranteed to be able to store a pointer?
71+
- Also an array length?
72+
- Uninitialized bits -- for which types are uninitialized bits valid?
73+
- If you have `struct A { .. }` and `struct B { .. }` with no
74+
`#[repr]` annotations, and they have the same field types, can we
75+
say that they will have the same layout?
76+
- or do we have the freedom to rearrange the fields of `A` but not
77+
`B`, e.g. based on PGO results
78+
- What about different instantiations of the same struct? (`Vec<A>`
79+
vs `Vec<B>`)
80+
- Rust currently says that no single value may be larger than `isize::MAX` bytes
81+
- is this good? can it be changed? does it matter *here* anyway?
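
Some of the questions above can be probed directly. The following is a minimal sketch; the helper names `none_is_null` and `some_keeps_address` are hypothetical. It assumes a mainstream target where `usize` and thin pointers have the same size. It shows what the current compiler does, which is not the same as a guarantee for every case listed (the `Option<extern "C" fn()>` check in particular only observes the size).

```rust
use std::mem::{size_of, transmute};

/// Returns true if `None::<&u32>` has the bit pattern of a null pointer.
fn none_is_null() -> bool {
    let none: Option<&u32> = None;
    // The transmute compiles only because both types have the same size.
    let raw: *const u32 = unsafe { transmute(none) };
    raw.is_null()
}

/// Returns true if `Some(r)` has the same bit pattern as the reference `r`.
fn some_keeps_address(r: &u32) -> bool {
    let raw: *const u32 = unsafe { transmute(Some(r)) };
    raw == r as *const u32
}

fn main() {
    // `&T` and `Option<&T>` for sized `T` are both pointer sized.
    println!("&u32: {} bytes", size_of::<&u32>());
    println!("Option<&u32>: {} bytes", size_of::<Option<&u32>>());
    // Observed size only; whether this is guaranteed is one of the questions.
    println!(
        "Option<extern \"C\" fn()>: {} bytes",
        size_of::<Option<extern "C" fn()>>()
    );
    println!("usize: {} bytes", size_of::<usize>());

    let x = 7_u32;
    println!("None is null: {}", none_is_null());
    println!("Some keeps address: {}", some_keeps_address(&x));
}
```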
82+
83+
## Active threads
84+
85+
To start, we will create threads for each major category of types
86+
(with a few suggested focus points):
87+
88+
- Integers and floating points
89+
- What about signaling NaN etc? ([Seems like a
90+
non-issue](https://github.com/rust-lang/rust/issues/40470#issuecomment-343803381),
91+
but it'd be good to resummarize the details).
92+
- is `usize` the native size of a pointer? [the max of various other considerations](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212702266)?
93+
what are edge cases here?
94+
- Rust currently states that the maximum size of any single value must fit in an `isize`
95+
- Can we say a bit more about why? (e.g., [ensuring that "pointer diff" is representable](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212703192))
96+
- Booleans
97+
- Prior discussions ([#46156][], [#46176][]) documented bool as a single
98+
byte that is either 0 or 1.
99+
- Enums
100+
- See dedicated thread about "niches" and `Option`-style layout optimization
101+
below.
102+
- Define: C-like enum
103+
- Can a C-like enum ever have an invalid discriminant? (Presumably not)
104+
- Empty enums and the `!` type
105+
- [RFC 2195][] defined the layout of `#[repr(C)]` enums with payloads.
106+
- [RFC 2363][] offers a proposal to permit specifying discriminants.
107+
- Structs
108+
- Do we ever say *anything* about how a `#[repr(Rust)]` struct is laid out
109+
(and/or treated by the ABI)?
110+
- e.g., what about different structs with same definition
111+
- across executions of the same program?
112+
- For example, [rkruppe
113+
writes](https://github.com/rust-rfcs/unsafe-code-guidelines/pull/5#discussion_r212776247)
114+
that we might "want to guarantee (some subset of) newtype
115+
unpacking and relegate `#[repr(transparent)]` to being the way
116+
to guarantee to other crates that a type with private fields is
117+
and will remain a newtype?"
118+
- Tuples
119+
- Are these effectively anonymous structs?
120+
- Unions
121+
- Can we ever say anything about the initialized contents of a union?
122+
- Is `#[repr(C)]` meaningful on a union?
123+
- When (if ever) do we guarantee that all fields have the same address?
124+
- Fn pointers (`fn()`, `extern "C" fn()`)
125+
- When is transmuting from one `fn` type to another allowed?
126+
- Can you transmute from a `fn` to `usize` or raw pointer?
127+
- In theory this is platform dependent, and C certainly draws a
128+
distinction between `void*` and a function pointer, but are
129+
there any modern and/or realistic platforms where it is an
130+
issue?
131+
- Is `Option<extern "C" fn()>` guaranteed to be a pointer (possibly null)?
132+
- References `&T` and `&mut T`
133+
- Out of scope: aliasing rules
134+
- Always aligned, non-null
135+
- When using the C ABI, these map to the C pointer types, presumably
136+
- Raw pointers
137+
- Effectively same as integers?
138+
- Is `ptr::null` etc guaranteed to be equal in representation to `0_usize`?
139+
- C does guarantee that `0` when cast to a pointer is NULL
140+
- Representation knobs:
141+
- Custom alignment ([RFC 1358])
142+
- Packed ([RFC 1240] talks about some safety issues)
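
A few of the focus points in the list above can be illustrated with a short sketch. All type and function names here (`Color`, `IntOrBytes`, `bool_as_bytes`, and so on) are hypothetical. The example only exercises cases that are believed to be settled: `bool` as a single byte holding 0 or 1, a C-like enum with an explicit `#[repr(u8)]`, a `#[repr(C)]` union whose fields overlap, and the null raw pointer having address 0. It does not answer the open questions themselves.

```rust
use std::mem::size_of;
use std::ptr;

// Hypothetical C-like enum: no payloads, explicit discriminants.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum Color {
    Red = 1,
    Green = 2,
    Blue = 4,
}

// Hypothetical union: with #[repr(C)], both fields start at offset 0.
#[repr(C)]
union IntOrBytes {
    int: u32,
    bytes: [u8; 4],
}

/// The byte values of `false` and `true`.
fn bool_as_bytes() -> (u8, u8) {
    (false as u8, true as u8)
}

/// The discriminant of a C-like enum value, as declared above.
fn discriminant_of(c: Color) -> u8 {
    c as u8
}

/// Writes `int`, reads `bytes`, and reassembles the integer.
fn roundtrip_through_union(x: u32) -> u32 {
    let u = IntOrBytes { int: x };
    // Reading `bytes` is fine here: every bit pattern is a valid [u8; 4].
    let bytes = unsafe { u.bytes };
    u32::from_ne_bytes(bytes)
}

/// The address of the null raw pointer.
fn null_address() -> usize {
    ptr::null::<u8>() as usize
}

fn main() {
    println!("bool bytes: {:?}, size {}", bool_as_bytes(), size_of::<bool>());
    println!("Blue = {}, size {}", discriminant_of(Color::Blue), size_of::<Color>());
    println!("union roundtrip: {:#x}", roundtrip_through_union(0xDEAD_BEEF));
    println!("null address: {}", null_address());
}
```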
143+
144+
[#46156]: https://github.com/rust-lang/rust/pull/46156
145+
[#46176]: https://github.com/rust-lang/rust/pull/46176
146+
[RFC 2363]: https://github.com/rust-lang/rfcs/pull/2363
147+
[RFC 2195]: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html
148+
[RFC 1358]: https://rust-lang.github.io/rfcs/1358-repr-align.html
149+
[RFC 1240]: https://rust-lang.github.io/rfcs/1240-repr-packed-unsafe-ref.html
150+
[RFC 1758]: https://rust-lang.github.io/rfcs/1758-repr-transparent.html
