Skip to content

SB: Allowing function argument references to dangle under some circumstances #252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Diggsey opened this issue Oct 14, 2020 · 11 comments
Open
Labels
A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows) A-dereferenceable Topic: when exactly does a reference need to point to regular dereferenceable memory?

Comments

@Diggsey
Copy link

Diggsey commented Oct 14, 2020

Continuing discussion from rust-lang/rust#55005 (comment)

Currently references in function arguments are special in that they must remain valid even after their last use. @RalfJung gave an example of a powerful optimization that relies on this behaviour.

Quoting the SB page:

Deallocating memory

Memory deallocation first acts like a write access through the pointer used for deallocation. After that is done, we additionally check all protectors remaining in the stack: if any of them is still active, we have undefined behavior.

What would be the impact of changing this to:

Deallocating memory

Memory deallocation first acts like a write access through the pointer used for deallocation. After that is done, we clear the stack,

It seems like the specific optimisation @RalfJung mentioned should still be valid, but perhaps there are other optimizations that would be prevented?

@CAD97
Copy link

CAD97 commented Oct 14, 2020

The key problem as I understand it is that SB is strictly operational. Using the previous example,

fn​ ​foo(x: ​&mut​ ​i32) {
  ​*​x ​=​ ​13;
  ​bar(); ​// some function we know nothing about, except that it is nounwind​
  ​*​x ​=​ ​27;
}

We cannot be guaranteed that foo exits, and thus that x is ever dropped, because bar could enter an infinite loop. If bar (even potentially) never returns, then any information we learn about x cannot backpropogate over the call to bar.

This is a secondhand remembered interpretation, don't give it more weight than that.

@digama0
Copy link

digama0 commented Oct 14, 2020

Here's a variation on the example in 100% safe code:

fn bar(y: &mut i32) -> ! {
    print!("{}", y);
    std::process::exit(0)
}
fn foo(y: &mut i32) {
    let x = &mut *y;
    *x = 13; // no way we can move this down
    bar(y);
    *x = 27;
}
fn main() { foo(&mut 0); }

Here y is a stand-in for "the environment", which x is borrowing from. It differs from Ralf's original example in that the compiler knows both that x is invalidated (at the call to bar(y)) and also that bar(y) doesn't return, which through the magic of NLL means that the later access to x is actually safe, because it is unreachable code. SB acts in essentially the same way as the borrow checker here, except it works when these properties can't be determined until run time.

@Diggsey
Copy link
Author

Diggsey commented Oct 15, 2020

@digama0 I don't really understand how that relates to this issue? *x = 13 cannot be moved down in any scenario, as that would change the behaviour of the function. Furthermore, the difference between SB and the borrow-checker should only matter when unsafe code is involved...

@RalfJung was giving an example where, with the right SB rules, it would be possible to move the assignment below the function call. The fact that this optimisation would be enabled was an argument for disallowing dangling references that were passed into the function.

@RalfJung RalfJung added the A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows) label Oct 15, 2020
@RalfJung RalfJung changed the title SB: Allowing dangling references SB: Allowing function argument references to dangle when unused Oct 15, 2020
@RalfJung
Copy link
Member

@Diggsey (in the other thread)

if I'm understanding correctly, this matters for function arguments and not local variables because for locals the compiler can determine whether they escape the function (and if not, the collusion you mentioned cannot happen), whereas for arguments that cannot be determined, and so dereferenceable must be part of the contract of the function?

I don't understand the question here. For local variables we simply cannot do this as we have no guarantee for how long they live. Basically what protectors do is they operationally codify the rule "references passed to a function outlive that function", a rule that is also present in the Rust's borrow checker (where lifetime parameters always outlive the function). There is no pendant of this for local variables, so we cannot do anything similar there.

Couldn't you have special treatment of a reference between its "last use" and when it is removed from the call-stack? After its last use it becomes a "zombie reference" which, by still existing in the stacked borrows "stack", disallows the collusion of bar with the caller, but does not actually require that the reference still be dereferenceable.

We don't know when a reference is "last used", so I don't understand this proposal. Also there are no references on the call stack. I am confused.^^

But it seems you moved on to another proposal, one that I actually understand:

Memory deallocation first acts like a write access through the pointer used for deallocation. After that is done, we clear the stack,

This is an interesting proposal. It removes an asymmetry between deallocation and writes that I added mostly based on a gut feeling that deallocation is "more destructive". I don't think this extra assumption played any role in our optimization proofs... for the protector-based proofs, we need to prevent the callee from writing (and sometimes even reading) the relevant location(s), and even with your new proposal "calle cannot write (without causing UB)" still implies "callee cannot deallocate (without causing UB)", so this should all still work. That would mean the optimizations we lose are only ones we have not considered yet. (To be fair, we have only considered the most obvious optimizations that first came to our mind.)

In fact, this is the key new property that unsafe code would get with your proposal: if you are allowed to use a pointer for writes to some memory, you are also allowed to deallocate that memory.

This would, however, still check protectors of other pointers that are invalidated by this write. We will have to carefully go over each of the relevant cases where protectors/dereferencable are a problem, and see if they are actually fixed by this change.
Even for the Arc case, this is far from obvious to me. For everything else, I think it does not help at all, as there is no deallocation involved there.

One thing that this definitely means is that we can no longer emit LLVM's dereferencable attribute, for any reference. There are programs that are allowed by this variant of SB that are UB with dereferencable.

On the other hand, this could mean that we can set protectors for Box, which is intriguing.


@digama0 the original example relied on not passing x (or any pointer derived from x) to bar. Of course if you violate a fundamental condition required for this optimization, the optimization no longer works.

@RalfJung RalfJung changed the title SB: Allowing function argument references to dangle when unused SB: Allowing function argument references to dangle under some circumstances Oct 15, 2020
@RalfJung
Copy link
Member

Here's an optimization that this proposal loses:

fn foo(x: &mut i32) -> i32 {
  let n = bar(&mut *x);
  let mut sum = 0;
  for i in 0..n { sum += *x; }
  sum
}

With your proposal we can no longer hoist the *x out of the loop, because bar might have deallocated x.

I am not saying that this is a particularly interesting optimization or that it would apply particularly often -- but this is what it means to have to use dereferencable_on_entry instead of dereferencable.

@Diggsey
Copy link
Author

Diggsey commented Oct 15, 2020

We don't know when a reference is "last used", so I don't understand this proposal. Also there are no references on the call stack. I am confused.^^

But it seems you moved on to another proposal, one that I actually understand:

I probably didn't explain very well: originally translated into SB, my suggestion would have been to retain the protectors but convert all the references in the stack to "Disabled".

Initially I thought that removing the protectors might somehow make some of the formal proofs more difficult as it would violate the rule that protectors exist until the stack frame is removed. However, I quickly realised that the behaviour would be identical to just clearing the stack anyway, so that's what I proposed here.

One thing that this definitely means is that we can no longer emit LLVM's dereferencable attribute, for any reference. There are programs that are allowed by this variant of SB that are UB with dereferencable.

I don't think that's true, as dereferenceable in LLVM was relaxed to only validate the reference on entry to the function. Wouldn't it just mean we couldn't emit "nofree" or "global_dereferenceable"?

Here's an optimization that this proposal loses:

Makes sense!

@RalfJung
Copy link
Member

RalfJung commented Oct 15, 2020

I don't think that's true, as dereferenceable in LLVM was relaxed to only validate the reference on entry to the function. Wouldn't it just mean we couldn't emit "nofree" or "global_dereferenceable"?

Oh, did that change already happen? I thought this was all still being discussed.

So to be clear, with Stacked Borrows unchanged we can emit dereferenceable_for_lifetime_of_function_call ("global" sounds way too strong, I recall arguing against such semantics in an LLVM review); with your proposal that is not possible any more, we can only emit dereferenceable_on_entry. Which of the two the dereferenceable maps to I am not sure.

@fogti
Copy link

fogti commented Oct 16, 2020

why don't we introduce an #[may_dangle_on_exit] attribute for function arguments to make that requirement ("references passed to the function must be valid at least until exit/return of the function") conditional / opt-out? (althrough I don't know how it would interact with traits (like Drop)).

@RustyYato
Copy link

@RalfJung couldn't that function be optimized to

fn foo(x: &mut i32) -> i32 {
  let n = bar(&mut *x);
  let mut sum = 0;
  if n != 0 {
    let x = *x;
    for i in 0..n { sum += x; }
  }
  sum
}

Thus lifting the dereference out of the loop, but only performing the dereference if the loop would have been run? A similar optimization could always work,

fn foo(x: &mut i32) -> i32 {
  before(x);
  while some_condition() { body(*x) }
  after()
}

Could "optimize" to

fn foo(x: &mut i32) -> i32 {
  before(x);
  if some_condition() {
    let x = *x;
    loop {
      body(x);
      if some_condition() { break }
    }
  }
  after()
}

@RalfJung
Copy link
Member

@RustyYato sure, we can always do loop hoisting if we are willing to split out the first iteration of the loop.

@RalfJung
Copy link
Member

While I like this model, it would mean degrading not only the optimizations we can do on &UnsafeCell, but also those on &mut -- which, when brought to the LLVM experts, raised plenty of concerns.

So my current thinking is we should instead make SharedReadWrite items not protected. This means even fewer optimizations on &UnsafeCell than this proposal, but that type is not expected to be heavily optimized anyway, so that seems acceptable.

@RalfJung RalfJung added the A-dereferenceable Topic: when exactly does a reference need to point to regular dereferenceable memory? label Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows) A-dereferenceable Topic: when exactly does a reference need to point to regular dereferenceable memory?
Projects
None yet
Development

No branches or pull requests

6 participants