-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix init soundness #13472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix init soundness #13472
Conversation
liufengyun
commented
Sep 4, 2021
- Fix init soundness and align with the theory
- Code refactoring
99fed5a
to
ccde63f
Compare
72e50ee
to
efe7a9b
Compare
I have some high-level comments on the design that we should discuss in our next meeting. |
The heap is still an obstacle to reasoning about the correctness (soundness) of the analysis. The heap appears to conflate multiple different things. Some of those things are essential to the theory of the analysis and other things are just caches. It is difficult to see that those that are just caches really are used only as caches. Specifically, as far as I could tell, the heap contains, for each object, three things:
For number 2., do we even need to cache it in the heap? We should already be caching it in the normal cache when we There seems to also be some significant difference between heap objects for ThisRef (Hot) and for Warm refs. If there is, we should consider separating those as well. A separate thing that I don't understand, and specifically I don't understand how it affects soundness, is the |
Based on an idea from the online meeting, it seems there is a simple way to reason about the correctness of the heap. Suppose that in each iteration, we always use an empty heap. This always works (might be a little slower for some tests). The heap does not play a role, as it's not an input to the iteration function. It's easy to justify that this approach is correct as discussed. Now, when we reach a fixed point after an iteration, the values in the heap must all be fixed points. Then why throw them away, instead of treating them as stable cache? In case if the iteration does not reach a fixed point, we simply use the previous fixed-point heap for the next iteration (reversion). The implementation performs exactly this optimization. |
The problem is that the computation of what we place in the heap (presumably) depends on the current state of the |
Perhaps it's worthwhile also to think separately about the two outputs of Of course in the implementation we probably need to compute both functions together, but maybe even there there is some way to refactor parts of the code to make it clearer that the |
If I remember correctly from the meeing,
I don't see why it matters if the things it depends on are fixed points already and are then committed to global cache. Otherwise, global cache is problematic as well. |
I've tried several attempts to simplify this so we can better reason about its correctness, but I keep getting stuck. One thing that would help is if it were possible to separate the "populating parameters" part of checking an object from checking the class body. I see in some cases we already populate parameters without checking the class body. I also see that both rely on determining the linearization order. But (correct me if I'm wrong) populating parameters doesn't depend on reading non-parameter fields of the same object, so cannot read an uninitialized field. If that's the case, then it's not strictly necessary to interleave populating parameters and checking the class body of superclasses of a given class; we could populate parameters, and then, separately, check the class body. Separating them would require splitting out the code that determines the linearization order because that's needed for both. If we could do this, then we could have a "populate parameters" function whose result would be part of the abstract domain just like the tree->value map currently is. That is, the abstract domain would be two maps, tree->value and ref->parameters/outers (currently Objekt). Then it would be easier to show correctness since we would find a fixed point for this pair of maps. This is important because populating parameters depends on eval and eval depends on populating parameters. |
Populating the outers and class parameters may involve evaluating any code, as constructor arguments can be method calls, which in turn may contain any code.
Here it seems there is an important different between our analysis and the Abstract Definitional Interpreters paper: there, due to the nature of the whole program analysis, it seems that In our case, local reasoning enables modular checking. The iterative function of the modular check is Now, if |
if cache.containsObject(this) then this | ||
else this.ensureFresh().populateParams() | ||
|
||
def ensureObjectFreshAndPopulated(): Contextual[this.type] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused, consider removing
* | ||
* Invariant: fields are immutable and only set once | ||
*/ | ||
def updateField(field: Symbol, value: Value): Contextual[Unit] = log("set field " + field + " of " + ref + " to " + value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def updateField(field: Symbol, value: Value): Contextual[Unit] = log("set field " + field + " of " + ref + " to " + value) { | |
def updateField(field: Symbol, value: Value): Contextual[Unit] = log("set field " + field + " of " + ref + " to " + value) { | |
assert(!cache.stableHeapContains(ref)) |
This helps ensure that the stable heap really remains stable. During an iteration, we should not update fields of objects that are already in the stable heap and therefore supposed to be stable.
I've checked that the tests pass with this.
Note that this requires stableHeapContains
method to be added to Cache
because heapStable
is private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very good invariant to check. It seems your suggested change above already adds the same assertion in Cache.updateObject
, maybe we can skip this duplicate assertion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not intended to be a duplicate. stableHeap
is a two-level map: from heap with Ref
to Objekt
, then from Objekt
with field (Symbol
) to Value
. The two assertions are for the two levels of the map, to ensure that both levels stay stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The objects are immutable data structures, this method will call cache.updateObject(ref, obj2)
, thus reach the invariant there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I missed that Objekt
was immutable. I mistakenly thought fields
and outers
were mutable maps in an Objekt
.
case class Warm(klass: ClassSymbol, outer: Value, ctor: Symbol, args: List[Value]) extends Addr | ||
case class Warm(klass: ClassSymbol, outer: Value, ctor: Symbol, args: List[Value]) extends Ref { | ||
|
||
/** If a warm value is in the process of populating parameters, class bodies are not executed. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider renaming populatingParams
to something like dontCheckBodies
.
It's not clear what "populating parameters" refers to since we always populate parameters of every class we abstractly interpret. As far as I can tell this is used only when we widen an existing ref to one with Cold outer and need to recompute parameters but don't want to check the bodies because we've already checked them. A comment would help here to explain why/when we do this (or something different if my understanding here is incorrect).
Having a flag like this seems fragile compared to passing a parameter to init
, callConstructor
, etc., or having one in the implicit context, but both of those are messy. I guess the flag is OK. Is there an issue with recursion? What if during computing a parameter of a Warm we instantiate the same Warm, is it possible that the flag will be incorrect for that other Warm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider having two flags or parameters or implicit parameters: populateParams
and checkBodies
. In init
, we go through constructors and do rather separate things in each one we encounter. Having them separated under flags would make it clearer what each part of the init
method is doing: some of it is common plumbing/iteration to identify superclasses in the right order, then other parts are specific to one or the other task.
Having two flags could help us avoid wasted work in recomputing parameters in the case explained in your comment in updateField
: in that case, we could run init
with checkBodies
true but populateParams
false (since we've already run populateParams
before but without checking bodies, and have now discovered that we need to check bodies as well.)
Also consider splitting these two tasks (populate params and check bodies) in init
to methods that could be called from init
. At the moment, init
is large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider having two flags or parameters or implicit parameters: populateParams and checkBodies.
As you commented, that will become messy because they have to thread through every place. Also, if we have multiple warm values in a context, it'll be even more confusing and will be a source of errors.
Is there an issue with recursion? What if during computing a parameter of a Warm we instantiate the same Warm, is it possible that the flag will be incorrect for that other Warm?
It could be --- in those cases, the outers and parameters might be set twice, that's one reason we need to weaken the invariant that the fields and outers are set once.
It's not clear what "populating parameters" refers to since we always populate parameters of every class we abstractly interpret. As far as I can tell this is used only when we widen an existing ref to one with Cold outer and need to recompute parameters but don't want to check the bodies because we've already checked them.
There are other cases: we first use a Warm
value from previous cache, which requires populating the parameters, and later we create an instance of the warm value, which requires initialization checking, thus resetting outers and parameters. The two might be interleaved, as mentioned in the Q/A above.
Also consider splitting these two tasks (populate params and check bodies) in init to methods that could be called from init.
init
follows the semantics of initialization, I don't see an easy way to make it simpler.
* 3. Revert heap if instable. | ||
* | ||
*/ | ||
def prepareForNextIteration(isStable: Boolean)(using State, Context) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be clearer with two methods rather than one method with a flag. That is, have one method commitToStable
and a second method prepareForNextIteration
, and then either call both of them in sequence or only prepareForNextIteration
.
final def work()(using State, Context): Unit = | ||
pendingTasks match | ||
case task :: rest => | ||
checkedTasks = checkedTasks + task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key iteration loop. It would make the overall algorithm more understandable if the code showed more explicitly what an iteration does. The structure I have in mind is something like:
// code to prepare cache and heap for analyzing a new class
def iterate = {
// code to prepare cache and heap for next iteration
doTask(task)
if(hasChanged)
iterate
else
// code to commit cache/heap to stable
}
iterate
pendingTasks = rest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to clarify whether a "task" means doing all the iterations for a class or doing just one. Currently it's just one iteration but that doesn't fit with the idea of a worklist since it wouldn't be correct to interleave individual iterations for different classes. So I think a "task" should be all of the analysis of a single class.
We don't really need the worklist anymore but I think we should keep it in case we change the design to need it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we may need it in the future. Currently, we can't remove the worklist, because we only want to traverse the trees once.
|
||
var fieldsChanged = true | ||
|
||
// class body | ||
tpl.body.foreach { | ||
if !thisV.isWarm || !thisV.asInstanceOf[Warm].isPopulatingParams then tpl.body.foreach { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding isThisRef
method to avoid negated !isWarm
. See also below on line 1407.
This makes it closer to the Scala Symposium 21 paper.
The problem is that all warm objects are populated with outers and class parameters in Heap.prepare(). If we call `init` on those objects, the only-set-once invariant will be violated and crash.
Two motivations: First, it's close to the essence of its functionality. Second, it's easier to update the cache state in fixed-pointed computation.
The following test explodes: tests/init/pos/local-warm5.scala. Ideally we should use an actual set in RefSet, which we don't do for both performance and determinism. The performance concern might be a premature optimization.
Use correct klass for evaluating rhs of local variables.
Co-authored-by: Ondřej Lhoták <[email protected]> Co-authored-by: Natsu Kagami <[email protected]>
a266132
to
7ee58a3
Compare
This performance is checked in #13715 and can be seen here: https://dotty-bench.epfl.ch/13715/ The last two points correspond to the performance of |