-
Notifications
You must be signed in to change notification settings - Fork 59
Adopt Minimum Union Validity Rules #494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@rustbot label +A-unions +A-validity +T-opsem |
I'm not sure whether I should also add S-pending-team-decision. |
I don't wanna pick on your wording too closely because it's a draft and all, but I think I know what you're trying to say, and I can barely tell what that means. |
Do you have a suggestion for better wording? |
I think what's giving me trouble is that it's all one sentence with too many little parts in the sentence. Also, the use of the past tense strikes me as a little odd, though maybe it fits in the larger context. Also, it doesn't matter why a byte can be uninit, just that it can be for whatever reason. Keep things simple by not trying to list particular reasons. Try this on for size:
|
The issue is that the way the rest of the chapter is I think makes it ill-defined to ask about the validity invariants of a padding byte. Validity checking is spec'd after value computation (and talks about the actual value), and padding bytes are completely ignored. |
what? Then change the rest of the chapter, I guess. If a spec can't tell me, very plainly, which bytes within a type must be initialized or not... well then something is very wrong with the organization of that spec. You could say "padding bytes are valid when initialized and when uninitialized" somewhere, and that seems sufficient explanation. |
The spec tells you when a given value is valid. Parts of the chapter (IE. validity for scalar types), say
Talking about validity in terms of individual bytes is actually ill-defined IMO, because validity is a compound property of a whole value ( |
I'm not talking about which specific initialized values each byte can be though. Needing to be initialized or not is an "earlier" property than needing to pick a specific value. All four bytes in |
I think that {initialized INCLUSIVE-OR byte can be uninitialized/padding in at least one variant} being sufficient validity should be fairly non-controversial1, and support an FCP canonicalizing such2. I don't know exactly how the draft spec is handling provenance, but the FCP should explicitly acknowledge it (we've got T-lang FCP accepted that provenance exists, IIRC) even if to only say that nothing is determined yet. Minor side notes:
Footnotes
|
The spec chapter is quite clear the repr(Rust) unions don't have defined offsets for each field. In general, I did my best to encapsilate only what guarantees currently exist while being complete. Unions were the only place I was unable to do that. |
|
Looks pretty good, but that does still sorta imply that some padding bytes aren't allowed to be uninit. EDIT: you may also want some wording that bytes past the end of a field count as being padding for this determination. |
Fixed to exclude bytes that aren't part of the field's storage, either before or after that storage. |
Focusing on "init vs uninit" is very strange to me. Validity can vary between "must be this particular byte" and "can be anything including uninit", why would we special case the init vs uninit distinction? If we require Also, validity should be defined as "this is the representation of some value of this type", so really we should be discussing what are the values of union type (in a domain like MiniRust's
What does it mean for a byte to be "valid when computed from an uninit value"? Validity is a property of sequences of bytes at a given type, you can't ask about just a single byte. And the "computed from" part is also entirely unclear to me. These things are much better expressed in logical notation than English, IMO. |
We had tons of prior discussion on this in #438 and the linked issues. Any attempt to come to a decision here should probably start by summarizing the main points we already discussed. |
The issue is that the rule gets wildly complex from there. The most simple counterexample is #[repr(C)]
pub union{
pub foo: NonZeroU16,
pub bar: (NonZeroU8, bool),
} Which, assuming a packed layout for the tuple can be in the state |
I think it's no more complicated than actually defining your "byte is valid when computed from an uninit value". But I agree it is too complicated, which is why I think unions should allow uninit throughout and leave the rest to explicit opt-in. |
I don't think this is entirely a fair characterization — the property that more fundamentally matters is whether a byte is potentially initialized as a padding byte. The bytes of So AIUI the spec is acknowledging three "kinds" of bytes — padding bytes (not preserved by value assignment) and value bytes (preserved by value assignment), with value bytes further classified as always initialized or as potentially uninitialized (due to that offset being padding in some enum/union variant). That's how "potentially uninit" can be an independently per-byte property while "valid value byte" isn't. Although, actually: defining "byte is valid when computed from an uninitialized value" per-byte does need to be careful w.r.t. enums, where whether a particular byte is valid to be uninitialized does depend on other value bytes (namely, the discriminant encoding). For clarity, I also think trivial validity is the semantics we do eventually want for unions; I conceptualize unions as essentially a set of typed places aliasing an untyped memory region. The benefit of the spec work using a stronger requirement is solely in not committing to trivial validity for unions. To that end, I think the intermediate spec language should use the maximal requirement (for each construction byte there exists a field where there exists a valid construction of that field with that byte at that offset). The awkward nested "exists" quantification is still required for the "valid when computed from an uninitialized value" construction, just over ADT enum discriminants instead of over value bytes. |
I don't think of padding bytes as a first-class concept that shows up in the opsem, it is merely a derived notion. I don't know what "initialized as padding" is supposed to mean.
Yes, it is exactly that that I am objecting to. We can add more classes of bytes, like "non-null" bytes or so. I don't think there is a need to have any such "class of bytes" show up in the definition of validity. They may be a useful way to compare certain validity invariants, but I think introducing them into the basic definitions is unnecessary complexity.
I am not convinced that it is very useful to commit to an intermediate spec. This is a spec hole. If we do for some reason have to commit to something, we could explicitly say: the validity invariant lies somewhere between the following two:
|
If by "somewhere between" you mean "here's a bunch of bounds, the validity invariant is unspecified between them", that's just the same thing as "Here's the strongest validity invariant in the range". It's not really meaningful to place a lower bound on what is considered undefined behaviour. If you mean literally that the spec says "It something between these two bounds, and Rust knows what it is but isn't going to say", then that's not really useful to anyone - users can't rely on it being anything other than the strictest constraint, and implementors can't rely on it being anything other than the weakest constraint. |
If this was just about defining UB, you'd be right. But people will inevitably refer to "the validity invariant of So I am arguing that saying "it is between these bounds" is not at all equivalent to saying "it is the strongest possible one". It is logically equivalent when only considering the AM, but not when considering the larger context and human factors. |
That's probably true of a lot more than just union validity, and the spec shouldn't have to worry about specifying too little. That's frankly a lot of excess typing for something that doesn't make a difference for the required observable behavior of a program. It may be worth a note, but certainly not normative text. |
(Constraining ill-formedness to between bounds makes more sense, because the upper bound of well-formedness is semantically meaningful - a valid implementation will not translate a program that exceeds that upper bound - in UB you can't exactly say "The compiler will definitely start doing whatever it wants past this point" because "what it wants" could be completely well-defined anyways) |
I'd like a T-opsem FCP on adopting at least a minimum set of Union Validity Rules. This would allow documenting all type validity invariants in the dynamic.layout spec chapter (spec#15).
Based on discussions on zulip (https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/Can.20we.20decide.20the.20validity.20invariant.20of.20unions.3F) the proposed rule is (in prose)
The term "value byte" is used in the above chapter to mean "Any byte that is not a padding byte". Notably, any byte that is padding in all fields is padding in the union, and thus always allowed to be uninit.
The minimum rules ensure we can widen the Validity Invariant in the future to something closer to (or exactly) a trivial validity invariant.
The text was updated successfully, but these errors were encountered: