can we in some cases have more limited forms of "undefined behavior"? #6
I agree with @rkruppe that in many cases UB is the safest choice, since it allows us to define the behavior later. I am not sure I fully understood @gereeter's argument (maybe they can elaborate), but it seems to be stating that there are other options beyond defining and not defining some program behavior. AFAICT, we can choose not to define the behavior of something, e.g., if a null pointer is dereferenced, the program is illegal (the guidelines do not speak about the behavior of illegal programs). Anything else we do makes it defined behavior:
Maybe @gereeter's point is that we should prefer defined behavior to undefined behavior? If so, I fully agree, but in many cases, like the null pointer dereference, we can't, and that's OK (e.g., those who want to avoid it can stick to safe Rust; those using raw pointers are explicitly opting into that for "reasons"). Also, because dereferencing a null pointer is undefined behavior, we could have a "fortified" build mode that
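To make the raw-pointer point concrete: in Rust, dereferencing a null raw pointer is UB, so code that wants to avoid it must check for null before the dereference, while safe references rule out null entirely. A minimal sketch (the function names are hypothetical, chosen for illustration):

```rust
/// Hypothetical helper: reads through a raw pointer only if it is non-null.
///
/// Dereferencing a null raw pointer is undefined behavior, so the null check
/// must happen before the dereference. `<*const T>::as_ref` performs exactly
/// that check and returns `None` for null.
///
/// # Safety
/// If `ptr` is non-null, it must point to a valid, initialized `i32`.
unsafe fn read_if_non_null(ptr: *const i32) -> Option<i32> {
    // SAFETY: the caller guarantees validity for non-null pointers;
    // `as_ref` handles the null case without dereferencing.
    unsafe { ptr.as_ref().copied() }
}

/// Safe-Rust counterpart: a reference can never be null, so no check is needed.
fn read_ref(r: &i32) -> i32 {
    *r
}
```

Only the raw-pointer version needs `unsafe`; the safe version gets the non-null guarantee from the reference type itself.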
From the other thread
It seems likely that LLVM will move to just one kind of "incorrect value", and that is poison. The "floating" undef has led to no end of problems: many optimizations are unsound, and some of them have caused real miscompilations. So, I think we should restrict ourselves to two possibilities when we want to declare an operation illegal:
LLVM basically uses the latter whenever it can get away with it, as that makes code-motion optimizations easier. However, when specifying a surface language, that is not our concern, and hence it is usually preferable to declare UB; the optimizer then still has the choice of actually "just" considering this as returning

@gereeter proposes another form that's somewhat "in the middle": weak UB taints the "future" of the execution, but cannot "travel back in time" like C's strong UB can. Adding another kind of UB complicates the specification (and constrains optimizations and codegen to LLVM, because LLVM does not have anything resembling weak UB), so I'd like to see strong reasons why the two kinds we have are not enough. Now, @gereeter provides some good arguments, but I think those goals are fairly unachievable. For example:
It is an open problem, as far as I know, how to specify a language that can be both optimized and provide reasonable guarantees for "cleaning" memory. Our work here is hard enough without adding more unsolved problems to it, so I am worried that we are putting too much on our plate. Also, this argument, like the others, only works out if all UB is weak. If we restrict ourselves to strong UB now, we can always offer a weak UB mode later, once we actually have codegen that can support it. That might be possible with a Cranelift codegen backend for debug builds, but as long as we use LLVM, I think that is a ways off.
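The difference between the two kinds of UB can be sketched as a toy model of which observable output a program is guaranteed to produce. This is a hypothetical illustration (the `Step`, `UbModel`, and `guaranteed_output` names are invented here), not a specification of Rust, C, or LLVM semantics: under "strong" UB a trace that reaches UB guarantees nothing, not even the output produced before it, whereas under "weak" UB the prefix before the UB is preserved.

```rust
/// A toy execution trace: observable output events, possibly reaching UB.
/// Hypothetical model for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    Output(&'static str),
    Ub,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum UbModel {
    /// Strong ("time-travelling") UB: once UB is reached, the whole
    /// execution, including earlier output, is unconstrained.
    Strong,
    /// Weak UB: output produced before the UB is still guaranteed;
    /// only the future is unconstrained.
    Weak,
}

/// Returns the output the program is *guaranteed* to produce under `model`.
fn guaranteed_output(trace: &[Step], model: UbModel) -> Vec<&'static str> {
    let ub_at = trace.iter().position(|s| *s == Step::Ub);
    let prefix = match (ub_at, model) {
        // No UB: every output is guaranteed under either model.
        (None, _) => trace,
        // Strong UB: nothing is guaranteed, not even the past.
        (Some(_), UbModel::Strong) => &trace[..0],
        // Weak UB: the steps before the UB are guaranteed.
        (Some(i), UbModel::Weak) => &trace[..i],
    };
    prefix
        .iter()
        .filter_map(|s| match s {
            Step::Output(text) => Some(*text),
            Step::Ub => None,
        })
        .collect()
}
```

In this model, a trace such as `[Output("a"), Output("b"), Ub, Output("c")]` guarantees `["a", "b"]` under `Weak` and nothing under `Strong`. It deliberately ignores the practical point that a compiler cannot undo output already performed by an opaque call.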
@RalfJung can you say a bit more about what "poison" means in LLVM (vs "floating undef" or whatever the alternatives are)?
"poison" is like the "floating undef" (the
@RalfJung thank you for writing this. When talking about
Could you maybe elaborate a bit more on how
Poison is a compile-time fiction: it is used in the language semantics to justify some code transformations, but it is not a distinct value at run time (unless one deliberately builds a VM like Miri that detects UB rather than exploiting it). I wouldn't go as far as saying the choice between poison, undef, and instant UB has no impact on the compiled machine code at all, but it certainly isn't a real value that the program could detect and handle specially.
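One way to picture poison as a compile-time fiction is a Miri-like interpreter that tracks it explicitly. The sketch below is a hypothetical toy (the `Val`, `Ub`, `add_nsw`, and `branch_is_positive` names are invented, and this is not how Miri or LLVM are implemented): signed overflow produces poison, loosely modelled on LLVM's `add nsw`; arithmetic propagates poison; and only branching on poison is reported as UB.

```rust
/// Toy value domain for an interpreter that *detects* UB instead of
/// exploiting it. Hypothetical model for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Val {
    Int(i32),
    Poison,
}

/// A detected UB event, with a short description.
#[derive(Debug, PartialEq)]
struct Ub(&'static str);

/// Signed addition where overflow yields poison instead of wrapping,
/// and any poison operand makes the result poison.
fn add_nsw(a: Val, b: Val) -> Val {
    match (a, b) {
        (Val::Int(x), Val::Int(y)) => x.checked_add(y).map_or(Val::Poison, Val::Int),
        _ => Val::Poison,
    }
}

/// Branching on poison is where the "incorrect value" becomes actual UB.
fn branch_is_positive(v: Val) -> Result<bool, Ub> {
    match v {
        Val::Int(x) => Ok(x > 0),
        Val::Poison => Err(Ub("branch on poison")),
    }
}
```

Computing a poison value and never using it is harmless here, which mirrors why poison (rather than immediate UB) lets an optimizer speculatively hoist an operation such as an addition out of a conditional.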
It is not immediately clear to me whether it is or should be instant UB if a value annotated as Attributes such as

Another angle is, if you handle poison values, you're already hovering one centimeter over the pit of lava that is UB, and the only reason you don't fall in is that you promised not to get too smart with the garbage value. If this clashes with wanting to assume the value is well-behaved in certain ways, well, maybe stop assuming that, or don't handle garbage values, or manually fix up the result of
Not sure what a "nonnull poison value" is. Attributes are attached to variables/function arguments, not to values, and poison is a value. The semantics of Wrt. attributes, I do not know whether it is UB to pass "poison" for an argument with the
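For comparison, Rust expresses non-nullness on a value's type rather than as an attribute on an argument: `std::ptr::NonNull<T>` has a validity invariant that excludes null, and creating a null `NonNull` (e.g. via the unsafe `NonNull::new_unchecked`) is UB. A minimal sketch, with a hypothetical helper name, showing that the safe constructor `NonNull::new` checks the invariant up front so an invalid value never comes into existence:

```rust
use std::ptr::NonNull;

/// Hypothetical helper: converts a raw pointer into a `NonNull`, refusing null.
/// `NonNull::new` performs the null check and returns `None` for null,
/// so no `NonNull` that violates its invariant is ever created.
fn to_non_null(ptr: *mut i32) -> Option<NonNull<i32>> {
    NonNull::new(ptr)
}
```

Because null is excluded by the type, `Option<NonNull<i32>>` can use null to represent `None` and is documented to have the same size as `*mut i32`.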
Discussed in backlog bonanza, closing. The answer to the question in the title is probably "yes," but there's no super clear compelling motivation. People are welcome to open new issues if there are specific problems they are facing.
Note that C is proposing it: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3128.pdf
We discussed that in the meeting (not the C proposal itself but the Rust equivalent); there seems to be some disagreement about the extent to which UB honors observable behaviors. (In particular, if one annotates a syscall or other external function call with a
That C proposal seems rather limiting, e.g., one can no longer move memory accesses up across For Rust it's probably less bad, since references give us stronger guarantees and earlier UB. But anyway, I agree with Mario: we shouldn't have different kinds of "UB", but we might want to have an issue tracking whether our UB can exhibit "time travel" or not.
Ah, it wasn't clear to me that there was actual disagreement here. My impression is that we can't actually consider anything other than time-travel UB because LLVM won't let us (although I'm having trouble getting it to do even obviously correct optimizations, so who knows).
Well, UB "time travel" is partly a result of a colloquial misinterpretation of the execution semantics. UB time-travels past unobservable computations because those computations are not temporally sequenced in the first place; they are more like nodes in a data dependence graph. The kind of time travel that is actually observable would be traveling past I/O, and I would say that for the most part we don't allow that, except in specific (TBD) enumerated circumstances. If you call an opaque function and then trigger UB, the function call will definitely happen, because it might just abort the program.
Right, but not all observable behaviour is entirely opaque. As discussed, if the thing being implemented is the abstract machine (and not an emulator of the abstract machine, so something closer to Miri), then it knows exactly when the program terminates or reaches a UB state (and the AM doesn't necessarily track the prior observable behaviour and may simply discard it when it halts with an error).
I opened an issue for time-traveling UB: #407
In #5, @gereeter raised the point that defining all manner of errors as yielding "undefined behavior" is an awfully strong statement. It theoretically permits the compiler to "change the past" and may not mesh well with the way users think. On the other hand, weaker definitions may not permit the kinds of optimizations we want and may not be supported by LLVM.
In a sense, this is a cross-cutting concern: we need to figure out what's allowed and not allowed, but separately, we should consider if there are cases where we can contain the repercussions.
I'm not sure of the best way to handle this, but I'm opening this up as its own potential discussion topic for the future.