-
Notifications
You must be signed in to change notification settings - Fork 59
add a stucts-and-tuples chapter #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
avadacatavra
merged 19 commits into
rust-lang:master
from
nikomatsakis:structs-and-tuples
Oct 25, 2018
Merged
Changes from 12 commits
Commits
Show all changes
19 commits
Select commit
Hold shift + click to select a range
c3a54f9
add a stucts-and-tuples chapter
nikomatsakis 08062bd
add nits from eddyb
nikomatsakis eb951c2
correct typo
nikomatsakis d5a144b
strengthen to "subject to change between compilations"
nikomatsakis 569338b
add note that packed adjusts alignment of the struct as a whole
nikomatsakis d96723f
`[T; 3]` and `(T, (T, T))` would have the same memory layout.
nikomatsakis 4974192
rename ABI compatibility to "function call ABI compatibility"
nikomatsakis 89a8a70
rework to call out details *not* relevant to layout
nikomatsakis c7d728e
make it clear that C17 etc are the "normative" spec for `#[repr(C)]`
nikomatsakis 65f65f9
extract controversial unresolved questions
nikomatsakis fbafc2d
clarify wording
nikomatsakis fbf35bf
assign issues
nikomatsakis 97622bc
generalize "input program" to "input to compiler"
nikomatsakis b0c86ea
be clearer about when reordering fields would be a breaking change
nikomatsakis f146fc4
rephrase "any" struct for clarity
nikomatsakis 72f5db3
rework "default layout" to guarantee nothing for now
nikomatsakis 17dab33
document zero-sized struct behavior
nikomatsakis a9223e2
say more about zero-sized things
nikomatsakis ece91f5
fix typo
nikomatsakis File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,349 @@ | ||
# Representation of structs and tuples | ||
|
||
**Disclaimer:** This chapter represents the consensus from issues | ||
[#11] and [#12]. The statements in here are not (yet) "guaranteed" | ||
not to change until an RFC ratifies them. | ||
|
||
[#11]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11 | ||
[#12]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12 | ||
|
||
## Tuple types | ||
|
||
In general, an anonymous tuple type `(T1..Tn)` of arity N is laid out | ||
"as if" there were a corresponding tuple struct declared in libcore: | ||
|
||
```rust | ||
#[repr(Rust)] | ||
struct TupleN<P1..Pn:?Sized>(P1..Pn); | ||
``` | ||
|
||
In this case, `(T1..Tn)` would be compatible with `TupleN<T1..Tn>`. | ||
As discussed below, this generally means that the compiler is **free | ||
to re-order field layout** as it wishes. Thus, if you would like a | ||
guaranteed layout from a tuple, you are generally advised to create a | ||
named struct with a `#[repr(C)]` annotation (see [the section on | ||
structs for more details](#structs)). | ||
|
||
Note that the final element of a tuple (`Pn`) is marked as `?Sized` to | ||
permit unsized tuple coercion -- this is implemented on nightly but is | ||
currently unstable ([tracking issue][#42877]). In the future, we may | ||
extend unsizing to other elements of tuples as well. | ||
|
||
[#42877]: https://github.com/rust-lang/rust/issues/42877 | ||
|
||
### Other notes on tuples | ||
|
||
Some related discussion: | ||
|
||
- [RFC #1582](https://github.com/rust-lang/rfcs/pull/1582) proposed | ||
that tuple structs should have a "nested representation", where | ||
e.g. `(T1, T2, T3)` would in fact be laid out as `(T1, (T2, | ||
T3))`. The purpose of this was to permit variadic matching and so | ||
forth against some suffix of the struct. This RFC was not accepted, | ||
however. This lay out requires extra padding and seems somewhat | ||
surprising: it means that the layout of tuples and tuple structs | ||
would diverge significantly from structs with named fields. | ||
|
||
<a name="structs"></a> | ||
|
||
## Struct types | ||
|
||
Structs come in two principle varieties: | ||
|
||
```rust | ||
// Structs with named fields | ||
struct Foo { f1: T1, .., fn: Tn } | ||
|
||
// Tuple structs | ||
struct Foo(T1, .., Tn); | ||
``` | ||
|
||
In terms of their layout, tuple structs can be understood as | ||
equivalent to a named struct with fields named `0..n-1`: | ||
|
||
```rust | ||
struct Foo { | ||
0: T1, | ||
... | ||
n-1: Tn | ||
} | ||
``` | ||
|
||
(In fact, one may use such field names in patterns or in accessor | ||
expressions like `foo.0`.) | ||
|
||
Structs can have various `#[repr]` flags that influence their layout: | ||
|
||
- `#[repr(Rust)]` -- the default. | ||
- `#[repr(C)]` -- request C compatibility | ||
- `#[repr(align(N))]` -- specify the alignment | ||
- `#[repr(packed)]` -- request packed layout where fields are not internally aligned | ||
- `#[repr(transparent)]` -- request that a "wrapper struct" be treated | ||
"as if" it were an instance of its field type when passed as an | ||
argument | ||
|
||
### Default layout ("repr rust") | ||
|
||
The default layout of structs is not specified. Effectively, the | ||
compiler provdes a deterministic function per struct definition that | ||
defines its layout. This function may as principle take as input the | ||
entire input program. Therefore: | ||
|
||
- the types of the struct's fields | ||
- the layout of other structs (including structs not included within this struct) | ||
- compiler settings | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
As of this writing, we have not reached a full consensus on what | ||
limitations should exist on possible field struct layouts. Therefore, | ||
the default layout of structs is considered undefined and subject to | ||
change between individual compilations. This implies that (among other | ||
things) two structs with the same field types may not be laid out in | ||
the same way (for example, the hypothetical struct representing tuples | ||
may be laid out differently from user-declared structs). | ||
|
||
Further, the layout of structs is always defined relative to the | ||
**struct definition** plus a substitution supplying values for each of | ||
the struct's generic types (in contrast to just considering the fully | ||
monomorphized field types). This is necessary because the presence or | ||
absence of generics can make a difference (e.g., `struct Foo { x: u16, | ||
y: u32 }` and `struct Foo<T> { x: u16, y: T }` where `T = u32` are not | ||
guaranteed to be identical), owing to the possibility of unsizing | ||
coercions. | ||
|
||
**Compiler's current behavior.** As of the time of this writing, the | ||
compiler will reorder struct fields to minimize the overall size of | ||
the struct (and in particular to eliminate padding due to alignment | ||
restrictions). The final field, however, is not reordered if an | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
unsizing coercion may be applied. | ||
|
||
#### Unresolved questions | ||
|
||
During the course of the discussion in [#11] and [#12], various | ||
suggestions arose to limit the compiler's flexibility. These questions | ||
are currently considering **unresolved** and -- for each of them -- an | ||
issue has been opened for further discussion on the repository. This | ||
section documents the questions and gives a few light details, but the | ||
reader is referred to the issues for further discussion. | ||
|
||
**Zero-sized structs ([#37]).** If you have a struct which -- | ||
transitively -- contains no data of non-zero size, then the size of | ||
that struct will be zero as well. These zero-sized structs appear | ||
frequently as exceptions in other layout considerations (e.g., | ||
single-field structs). An example of such a struct is | ||
`std::marker::PhantomData`. | ||
|
||
[#37]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/37 | ||
|
||
**Single-field structs ([#34]).** If you have a struct with single field | ||
(`struct Foo { x: T }`), should we guarantee that the memory layout of | ||
`Foo` is identical to the memory layout of `T` (note that ABI details | ||
around function calls may still draw a distinction, which is why | ||
`#[repr(transparent)]` is needed). What about zero-sized types like | ||
`PhantomData`? | ||
|
||
[#34]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/34 | ||
|
||
**Homogeneous structs ([#36]).** If you have homogeneous structs, where all | ||
the `N` fields are of a single type `T`, can we guarantee a mapping to | ||
the memory layout of `[T; N]`? How do we map between the field names | ||
and the indices? What about zero-sized types? | ||
|
||
[#36]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/36 | ||
|
||
**Deterministic layout ([#35]).** Can we say that layout is some deterministic | ||
function of a certain, fixed set of inputs? This would allow you to be | ||
sure that if you do not alter those inputs, your struct layout would | ||
not change, even if it meant that you can't predict precisely what it | ||
will be. For example, we might say that struct layout is a function of | ||
the struct's generic types and its substitutions, full stop -- this | ||
would imply that any two structs with the same definition are laid out | ||
the same. This might interfere with our ability to do profile-guided | ||
layout or to analyze how a struct is used and optimize based on | ||
that. Some would call that a feature. | ||
|
||
[#35]: https://github.com/rust-rfcs/unsafe-code-guidelines/issues/35 | ||
|
||
### C-compatible layout ("repr C") | ||
|
||
For structs tagged `#[repr(C)]`, the compiler will apply a C-like | ||
layout scheme. See section 6.7.2.1 of the [C17 specification][C17] for | ||
a detailed write-up of what such rules entail (as well as the relevant | ||
specs for your platform). For most platforms, however, this means the | ||
following: | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
[C17]: http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf | ||
|
||
- Field order is preserved. | ||
- The first field begins at offset 0. | ||
- Assuming the struct is not packed, each field's offset is aligned[^aligned] to | ||
the ABI-mandated alignment for that field's type, possibly creating | ||
unused padding bits. | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- The total size of the struct is rounded up to its overall alignment. | ||
|
||
[^aligned]: Aligning an offset O to an alignment A means to round up the offset O until it is a multiple of the alignment A. | ||
|
||
The intention is that if one has a set of C struct declarations and a | ||
corresponding set of Rust struct declarations, all of which are tagged | ||
with `#[repr(C)]`, then the layout of those structs will all be | ||
identical. Note that this setup implies that none of the structs in | ||
question can contain any `#[repr(Rust)]` structs (or Rust tuples), as | ||
those would have no corresponding C struct declaration -- as | ||
`#[repr(Rust)]` types have undefined layout, you cannot safely declare | ||
their layout in a C program. | ||
|
||
See also the notes on [ABI compatibility](#fnabi) under the section on `#[repr(transparent)]`. | ||
|
||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
### Fixed alignment | ||
|
||
The `#[repr(align(N))]` attribute may be used to raise the alignment | ||
of a struct, as described in [The Rust Reference][TRR-align]. | ||
|
||
[TRR-align]: (https://doc.rust-lang.org/stable/reference/type-layout.html#the-align-representation). | ||
|
||
### Packed layout | ||
|
||
The `#[repr(packed(N))]` attribute may be used to impose a maximum | ||
limit on the alignments for individual fields. It is most commonly | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
used with an alignment of 1, which makes the struct as small as | ||
possible. For example, in a `#[repr(packed(2))]` struct, a `u8` or | ||
`u16` would be aligned at 1- or 2-bytes respectively (as normal), but | ||
a `u32` would be aligned at only 2 bytes instead of 4. In the absence | ||
of an explicit `#[repr(align)]` directive, `#[repr(packed(N))]` also | ||
sets the alignment for the struct as a whole to N bytes. | ||
|
||
The resulting fields may not fall at properly aligned boundaries in | ||
memory. This makes it unsafe to create a Rust reference (`&T` or `&mut | ||
T`) to those fields, as the compiler requires that all reference | ||
values must always be aligned (so that it can use more efficient | ||
load/store instructions at runtime). See the [Rust reference for more | ||
details][TRR-packed]. | ||
|
||
[TRR-packed]: https://doc.rust-lang.org/stable/reference/type-layout.html#the-packed-representation | ||
|
||
<a name="fnabi"> </a> | ||
|
||
### Function call ABI compatibility | ||
|
||
In general, when invoking functions that use the C ABI, `#[repr(C)]` | ||
structs are guaranteed to be passed in the same way as their | ||
corresponding C counterpart (presuming one exists). `#[repr(Rust)]` | ||
structs have no such guarantee. This means that if you have an `extern | ||
"C"` function, you cannot pass a `#[repr(Rust)]` struct as one of its | ||
arguments. Instead, one would typically pass `#[repr(C)]` structs (or | ||
possibly pointers to Rust-structs, if those structs are opaque on the | ||
other side, or the callee is defined in Rust). | ||
|
||
However, there is a subtle point about C ABIs: in some C ABIs, passing | ||
a struct with one field of type `T` as an argument is **not** | ||
equivalent to just passing a value of type `T`. So e.g. if you have a | ||
C function that is defined to take a `uint32_t`: | ||
|
||
```C | ||
void some_function(uint32_t value) { .. } | ||
``` | ||
|
||
It is **incorrect** to pass in a struct as that value, even if that | ||
struct is `#[repr(C)`] and has only one field: | ||
|
||
```rust | ||
#[repr(C)] | ||
struct Foo { x: u32 } | ||
|
||
extern "C" some_function(Foo); | ||
|
||
some_function(Foo { x: 22 }); // Bad! | ||
``` | ||
|
||
Instead, you should declare the struct with `#[repr(transparent)]`, | ||
which specifies that `Foo` should use the ABI rules for its field | ||
type, `u32`. This is useful when using "wrapper structs" in Rust to | ||
give stronger typing guarantees. | ||
|
||
`#[repr(transparent)]` cannot be applied to *any* struct. It is | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
limited to structs with a single field whose type `T` has non-zero | ||
size, along with some number of other fields whose types are all | ||
zero-sized (typically `std::marker::PhantomData` fields). The struct | ||
then takes on the "ABI behavior" of the type `T` that has non-zero | ||
size. | ||
|
||
(Note further that the Rust ABI is undefined and theoretically may | ||
vary from compiler revision to compiler revision.) | ||
|
||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
## Unresolved question: Guaranteeing compatible layouts? | ||
|
||
One key unresolved question was whether we would want to guarantee | ||
that two `#[repr(Rust)]` structs whose fields have the same types are | ||
laid out in a "compatible" way, such that one could be transmuted to | ||
the other. @rkruppe laid out a [number of | ||
examples](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-419956939) | ||
where this might be a reasonable thing to expect. As currently | ||
written, and in an effort to be conservative, we make no such | ||
guarantee, though we do not firmly rule out doing such a thing in the future. | ||
|
||
It seems like it may well be desirable to -- at minimum -- guarantee | ||
that `#[repr(Rust)]` layout is "some deterministic function of the | ||
struct declaration and the monomorphized types of its fields". Note | ||
that it is not sufficient to consider the monomorphized type of a | ||
struct's fields: due to unsizing coercions, it matters whether the | ||
struct is declared in a generic way or not, since the "unsized" field | ||
must presently be [laid out last in the | ||
structure](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-417843595). (Note | ||
that tuples are always coercible (see [#42877] for more information), | ||
and are always declared as generics.) This implies that our | ||
"deterministic function" also takes as input the form in which the | ||
fields are declared in the struct. | ||
|
||
However, that rule is not true today. For example, the compiler | ||
includes an option (called "optimization fuel") that will enable us to | ||
alter the layout of only the "first N" structs declared in the | ||
source. When one is accidentally relying on the layout of a structure, | ||
this can be used to track down the struct that is causing the problem. | ||
|
||
[#42877]: https://github.com/rust-lang/rust/issues/42877 | ||
[pg-unsized-tuple]: https://play.rust-lang.org/?gist=46399bb68ac685f23beffefc014203ce&version=nightly&mode=debug&edition=2015 | ||
|
||
There are also benefits to having fewer guarantees. For example: | ||
|
||
- Code hardening tools can be used to randomize the layout of individual structs. | ||
- Profile-guided optimization might analyze how instances of a | ||
particular struct are used and tweak the layout (e.g., to insert | ||
padding and reduce false sharing). | ||
- However, there aren't many tools that do this sort of thing | ||
([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420650851), | ||
[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420681763)). Moreover, | ||
it would probably be better for the tools to merely recommend | ||
annotations that could be added | ||
([1](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105), | ||
[2](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420077105)), | ||
such that the knowledge of the improved layouts can be recorded in the | ||
source. | ||
|
||
As a more declarative alternative, @alercah [proposed a possible | ||
extension](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/12#issuecomment-420165155) | ||
that would permit one to declare that the layout of two structs or | ||
types are compatible (e.g., `#[repr(as(Foo))] struct Bar { .. }`), | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
thus permitting safe transmutes (and also ABI compatibility). One | ||
might also use some weaker form of `#[repr(C)]` to specify a "more | ||
deterministic" layout. These areas need future exploration. | ||
|
||
## Counteropinions and other notes | ||
|
||
@joshtripplet [argued against reordering struct | ||
fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-417953576), | ||
suggesting instead it would be better if users reordering fields | ||
themselves. However, there are a number of downsides to such a | ||
proposal (and -- further -- it does not match our existing behavior): | ||
|
||
- In a generic struct, the [best ordering of fields may not be known | ||
ahead of | ||
time](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420659840), | ||
so the user cannot do it manually. | ||
- If layout is defined, then it becomes part of your API, such taht | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Typo: s/taht/that/ |
||
reordering fields is a breaking change for your clients (if we | ||
consider unsafe code that may rely on the layout, then this applies | ||
[even to structs with named | ||
fields](https://github.com/rust-rfcs/unsafe-code-guidelines/issues/11#issuecomment-420117856). | ||
nikomatsakis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- Many people would prefer the name ordering to be chosen for | ||
"readability" and not optimal layout. | ||
|
||
## Footnotes |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this wording reserves the freedoms it want to reserve, and even if someone argued it does I would like it to be clarified. Since we did not get consensus to rule out profile-guided layout, the program source code is not all that informs layout. Not even if that includes build scripts, input data files, etc. that are used during the profiling run, since the program may not be deterministic w.r.t. these (e.g. it might have race conditions or depend on the system time or ...). So I don't think we can guarantee anything involving the words "deterministic function" and have to stick to something like "every time you invoke the compiler you may get a completely different layout".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have missed this, but did you managed to write your thoughts about this down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I agree with the gist of your comment @rkruppe but I think I don't quite agree with this part:
In particular, it seems like we would basically want to say that layout is determined by the program source code + other auxiliary inputs (e.g., compiler settings, output from PGO, etc).
I don't know that we need to give the freedom for rustc to choose arbitrary orderings on two consecutive runs where nothing at all changed (in particular, we've been shooting for deterministic compilation, and this would sort of contravene that). Of course it's ok to start there for now, since it doesn't say that we have to change layout...just seems looser than is needed.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: it is certainly possible to define PGO traces as part of the compiler input (beyond just sources and compiler flags), though it's a bit of a stretch IMO. That should be called out explicit in the text, though. What's currently written here could reasonably be read as saying the layout depends just on the source code and flags such as
-C opt-level
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, the updated version is good in that regard, thanks!