
"Tootsie Pop" model #21

Open
nikomatsakis opened this issue Sep 3, 2016 · 7 comments

@nikomatsakis
Contributor

The Tootsie Pop model leverages unsafe declarations to simultaneously permit aggressive optimization in safe code while being very accepting of unsafe code patterns. The high-level summary is roughly:

  • identify lexically scoped unsafe abstractions;
  • within an unsafe abstraction, disable optimizations and stick to a very simple model.
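For concreteness, here is a minimal sketch (the names and module layout are mine, not from the model write-up) of what a lexically scoped unsafe abstraction might look like: a safe function whose body needs raw-pointer code the borrow checker cannot verify. If the containing module is taken as the abstraction's scope, the model would presumably disable aliasing-based optimizations throughout it.

```rust
// Illustrative only: a safe API backed by raw-pointer code. Under the
// Tootsie Pop model (taking the containing module as the abstraction),
// &mut-aliasing optimizations would presumably be disabled in this module.
mod my_abstraction {
    pub fn split_at_mut<T>(s: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
        assert!(mid <= s.len());
        let len = s.len();
        let ptr = s.as_mut_ptr();
        // The borrow checker cannot see that the two halves are disjoint.
        unsafe {
            (
                std::slice::from_raw_parts_mut(ptr, mid),
                std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
            )
        }
    }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    let (a, b) = my_abstraction::split_at_mut(&mut v, 2);
    a[0] = 10;
    b[0] = 30;
    assert_eq!(v, [10, 2, 30, 4]);
}
```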

This has the advantage of being very permissive -- if we pick a suitable scope for the unsafe abstraction, I suspect that most any unsafe code out there in the wild that is even sort of "remotely correct" will work out fine. But its Achilles' heel is that it can inhibit quite a lot of optimization. Somewhat annoyingly, it seems to interact poorly with both simple and complex cases of unsafe code:

  • The "unchecked get" use case is a good example where this can strike very innocent code (optimizing around unchecked-get #20).
  • But of course one can also imagine that sophisticated unsafe authors will want to supply detailed hints about performance -- for example, it would definitely make sense to ensure that we can wring detailed aliasing information out of the code in libstd.
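To illustrate the first bullet, a minimal sketch of the kind of code the unchecked-get case covers (the function itself is made up for illustration): the unsafe block is a single expression and the surrounding safe code already guarantees the bound, yet the enclosing abstraction would still pay the optimization penalty.

```rust
// Illustrative only: the loop condition guarantees the index is in bounds,
// so `get_unchecked` merely skips a redundant bounds check. The concern in
// #20 is that treating the enclosing code as an "unsafe abstraction" would
// turn off optimizations around an otherwise ordinary safe loop like this.
fn sum_every_other(data: &[u64]) -> u64 {
    let mut total = 0;
    let mut i = 0;
    while i < data.len() {
        // SAFETY: `i < data.len()` holds by the loop condition above.
        total += unsafe { *data.get_unchecked(i) };
        i += 2;
    }
    total
}

fn main() {
    assert_eq!(sum_every_other(&[1, 2, 3, 4, 5]), 9);
}
```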

Where the Tootsie Pop model does really well is the "middle" cases -- unsafe code that manipulates pointers and so forth, but where the author is not familiar with the unsafe code guidelines in depth. (The appeal of the Tootsie Pop model is thus dependent, to some extent, on how complex it is to understand and use the more advanced models.)

It's worth noting that even if we adopted the Tootsie Pop model, we'd likely still want to hammer out a more advanced model to cover the more advanced use cases.

@strega-nil

strega-nil commented Sep 3, 2016

The issue I have with the "tootsie pop" model is that you then need to understand two models, if not three (if you want fast code). I still like it just about as much as I did when you first proposed it.

Giving up optimization opportunities to make the model easier to understand is, imho, a Good Thing; we don't need insane optimizations: our users are already writing procedural code, and if they need it faster, they can write it faster. User optimization (when based on evidence) will always win out over compiler optimization (because user optimization has compiler optimization to back it up ;P), and hurting the user's ability to optimize makes me very uncomfortable (see: TBAA and signed integer overflow in C).

@mystor

mystor commented Sep 3, 2016

How about using the high-optimization environment by default, but when people want to do crazy things that they aren't sure are safe, they can put a #![unsafe_module] which opts into fewer optimizations but easier-to-understand UB semantics.

We then just teach people that "if you're writing unsafe code which interacts in potentially unsafe ways with safe code in the same module, use #![unsafe_module] to make sure that you don't trigger UB".

So the model would be:
a) Writing safe code: constraints are held, and stuff goes fast.
b) Something like unsafe indexing: unsafe code which doesn't break constraints; stuff still goes fast, with relatively low syntactic overhead.
c) Something like Ref<'a, T> or RefMut<'a, T> where the safe code has to obey some unwritten constraints to make sure not to break the unsafe code in the module, the module gets annotated with #![unsafe_module], and those modules are made safe.
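For case (c), here is a rough sketch in current Rust of the sort of module this seems to have in mind; the #![unsafe_module] attribute itself is hypothetical syntax, and MiniRefCell is a made-up miniature RefCell. The point is that the safe methods must keep the borrow-state counter honest, because the unsafe dereferences elsewhere in the module trust it.

```rust
// Illustrative only: a miniature RefCell. The counter updates below are
// "safe" code, but the unsafe code in this module relies on them.
use std::cell::{Cell, UnsafeCell};
use std::ops::{Deref, DerefMut};

pub struct MiniRefCell<T> {
    // >0: number of shared borrows; -1: one exclusive borrow; 0: free.
    state: Cell<isize>,
    value: UnsafeCell<T>,
}

pub struct Ref<'a, T> {
    cell: &'a MiniRefCell<T>,
}

pub struct RefMut<'a, T> {
    cell: &'a MiniRefCell<T>,
}

impl<T> MiniRefCell<T> {
    pub fn new(value: T) -> Self {
        MiniRefCell { state: Cell::new(0), value: UnsafeCell::new(value) }
    }

    pub fn borrow(&self) -> Ref<'_, T> {
        assert!(self.state.get() >= 0, "already mutably borrowed");
        // Safe code, but forgetting this update would make the unsafe
        // dereferences below unsound: an "unwritten constraint" on the
        // safe code of the module.
        self.state.set(self.state.get() + 1);
        Ref { cell: self }
    }

    pub fn borrow_mut(&self) -> RefMut<'_, T> {
        assert!(self.state.get() == 0, "already borrowed");
        self.state.set(-1);
        RefMut { cell: self }
    }
}

impl<T> Deref for Ref<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: state > 0, so no exclusive borrow exists (module invariant).
        unsafe { &*self.cell.value.get() }
    }
}

impl<T> Drop for Ref<'_, T> {
    fn drop(&mut self) {
        self.cell.state.set(self.cell.state.get() - 1);
    }
}

impl<T> Deref for RefMut<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        unsafe { &*self.cell.value.get() }
    }
}

impl<T> DerefMut for RefMut<'_, T> {
    fn deref_mut(&mut self) -> &mut T {
        // SAFETY: state == -1, so this is the only borrow (module invariant).
        unsafe { &mut *self.cell.value.get() }
    }
}

impl<T> Drop for RefMut<'_, T> {
    fn drop(&mut self) {
        self.cell.state.set(0);
    }
}

fn main() {
    let cell = MiniRefCell::new(5);
    {
        let r = cell.borrow();
        assert_eq!(*r, 5);
    } // shared borrow released here
    *cell.borrow_mut() += 1;
    assert_eq!(*cell.borrow(), 6);
}
```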

The unsafe_module attribute could also take arguments which describe what types of optimizations to inhibit.

I don't super like this idea, because it could be confusing, but it does make the lexical unsafety boundary explicitly visible, and give people power to move it.

@eternaleye

@mystor

c) Something like Ref<'a, T> or RefMut<'a, T> where the safe code has to obey some unwritten constraints to make sure not to break the unsafe code in the module, the module gets annotated with #![unsafe_module], and those modules are made safe.

This phrasing, to me, illustrates what I think is one of the biggest problems. Namely, it's incorrect about what circumstances one would need to use #![unsafe_module] in, and it's incorrect in a way I suspect will be very common.

In particular, even in the "unchecked get case", the safe code needs to "obey some unwritten constraints" in order for it to be valid - it needs to not make the slice too short, etc.

The real difference is that #![unsafe_module] is needed when violations of the type system are capable of escaping the unsafe { } block. #![unsafe_module], then, says that those violations may escape as far as the containing module, and no farther.

With that framing, restricting #![unsafe_module] to modules is overly inflexible: as an attribute, it could potentially be applied to any construct, indicating the boundary beyond which type-safety violations may not occur. This boundary could be narrower (individual impl blocks), or narrower still (the function, though IMO any such case should just be unsafe fn), or broader (a parent module), or the absolute broadest (#![unsafe_boundary] at the crate root).

Of course, unsafe mod is unused syntax, and could cover the Tootsie Pop model with any module boundary.

@nikomatsakis
Contributor Author

This phrasing, to me, illustrates what I think is one of the biggest problems. Namely, it's incorrect about what circumstances one would need to use #![unsafe_module] in, and it's incorrect in a way I suspect will be very common.

This is a concern of mine as well. Shortly after the original TPM post, I was going to write a follow-up basically describing this scheme, covering the notion of narrowing the "unsafe abstraction" region.

But in writing up that post I realized that I myself had two distinct notions of what the unsafe abstraction region ought to mean and I hadn't even fully realized it. One of them is the "logical" abstraction region, which aligns with privacy. And the other is the "type trusting" region -- just as you describe.

This gave me pause and made me feel that perhaps this is indeed barking up the wrong tree. Perhaps there is a simpler way to frame things that winds up feeling less subtle.

I've since reconsidered and am now in the middle of the road again. =) I very much want to pursue other avenues, but I think that maybe talking explicitly about being able to designate the boundary where:

  • Rust can assume all types are valid upon entry
  • Rust can assume all types are valid upon exit

might work out ok, but I still hope to find an alternative.

@asajeffrey

A couple of questions about the tootsie pop...

When you exit an unsafe boundary, are you required to restore the Rust memory safety invariants for all memory, or are you allowed to have memory that is only reachable via your module (e.g. via a private field)?

When an unsafe module calls a safe module, does that count as crossing a safety boundary, so the memory safety invariants need to be restored? If yes, then how does unsafe code do anything (e.g. use a logger)? If no, then do we need to compile every module twice, once as a safe module, and once as an unsafe one?

@RalfJung
Member

When an unsafe module calls a safe module, does that count as crossing a safety boundary, so the memory safety invariants need to be restored? If yes, then how does unsafe code do anything (e.g. use a logger)? If no, then do we need to compile every module twice, once as a safe module, and once as an unsafe one?

Just my 2 cents: I would argue that the safety invariants of the part of memory reachable by the safe module need to be restored. Essentially, that's all global variables and all arguments, and everything transitively reachable from them. However, things that are private to the unsafe module should be allowed to stay "tainted".
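A rough sketch of that distinction, with made-up names: at every return from a public method, everything the caller can reach (arguments, the return value, and anything transitively reachable from them) satisfies the safety invariants again, while the privately reachable spare storage is allowed to stay "tainted" (here, uninitialized).

```rust
// Illustrative only: the private spare slots stay uninitialized across
// calls, which is fine because only this module can reach them.
use std::mem::MaybeUninit;

pub struct SmallBuf {
    // Private: storage[len..] may be uninitialized ("tainted").
    storage: [MaybeUninit<u32>; 4],
    // Invariant restored at every public exit point: len <= 4 and
    // storage[..len] is initialized.
    len: usize,
}

impl SmallBuf {
    pub fn new() -> Self {
        SmallBuf { storage: [MaybeUninit::uninit(); 4], len: 0 }
    }

    pub fn push(&mut self, x: u32) -> bool {
        if self.len == 4 {
            return false;
        }
        self.storage[self.len] = MaybeUninit::new(x);
        self.len += 1;
        // Exiting to the caller: everything the caller can reach (the
        // return value, `self` as seen through the public API) is valid;
        // the remaining uninitialized slots are unreachable from safe
        // code outside this module.
        true
    }

    pub fn get(&self, i: usize) -> Option<u32> {
        if i < self.len {
            // SAFETY: storage[..len] is initialized (module invariant).
            Some(unsafe { self.storage[i].assume_init() })
        } else {
            None
        }
    }
}

fn main() {
    let mut buf = SmallBuf::new();
    assert!(buf.push(7));
    assert_eq!(buf.get(0), Some(7));
    assert_eq!(buf.get(1), None);
}
```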

@asajeffrey

@RalfJung yes, I'd been thinking of something in terms of safe reachability. We could try something like saying that the safe roots of a module are the values that escape from it, either by being returned or by being passed as a callback argument. Then the safely reachable heap is the subset of the heap that includes the safe roots and is closed under dereferencing public &T pointers. Ditto for the safely mutable heap. Each module is responsible for ensuring that the safely reachable heap maintains the Rust memory invariants.

Something like this would answer both of my questions. It would also address some of the concurrency issues, since we could ask for unsafe code to always maintain safety of the safely reachable heap, not just at function call/return boundaries.
