Skip to content

Adding fix point evaluation in initialization checker #13379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

EnzeXing
Copy link
Contributor

This PR adds the fix point evaluation in the initialization checker to ensure soundness using a two-cache approach. It also includes tests with have initialization error that are not detected in the current checker.

@EnzeXing
Copy link
Contributor Author

@liufengyun @olhotak I have created a draft proposal, but this proposal would introduce other unsoundness issue: soundness2.scala and soundness6.scala would pass. It's seems we can't simply take the value from the outputCache if it includes the expression. The evalCache should behave like the global cache. I'll try to propose a fix tomorrow.

@@ -124,7 +124,7 @@ class Semantic {
*/
def updateField(field: Symbol, value: Value): Contextual[Unit] =
val fields = heap(ref).fields
assert(!fields.contains(field), field.show + " already init, new = " + value + ", ref =" + ref)
// assert(!fields.contains(field), field.show + " already init, new = " + value + ", ref =" + ref)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix-point evaluation needs to update fields that are already initialized, but this line doesn't allow to do that. Could we remove this assertion, or shall we find other ways to empty the heap before re-evaluation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an issue that we will have to consider in the overall design, in the theory. The assumption that the heap does not change once set was dependent on a single pass of evaluation. Now that we are doing a fixed-point computation, I don't think this assumption holds anymore.

So we might need to include the heap in the fixed-point evaluation, i.e. in the cache.

More generally, I think this points to a need to discuss the overall design more first before doing the implementation.

Still, the implementation was valuable in that it uncovered this theoretical issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts on the overall design:

One design that we know is correct (but probably inefficient) is to do the fixed-point computation at the very outermost level. That is, we do one pass of the whole analysis, for all classes. We then clear the cache, clear the heap, clear everything, and do the whole analysis again, except this time, instead of using Hot as a starting point for all unseen expressions, we use the result from the first phase. Repeat until a fixed point.

Maybe we should just implement this correct-but-inefficient strategy first and then look at ways to improve efficiency.

If we do not track any dependences, it is quite easy to see that at least two passes over every expression are necessary: the first eval of any expression could depend on something that turns out to be non-hot (we can't know for sure since we don't track dependences), so to be safe we need to analyze the expression at least once more.

If we decide that two passes over everything are too expensive, then we must look at ways to track dependences. Then, I think our target should be to find cheap, i.e. coarse-grained ways to track dependences. For example, if eval of some expression depends on some field of the heap, it may be cheaper to just record that it depends on the heap in general and redo the eval of that expression than to precisely record the exact set of fields of the heap that it depends on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can compute fixed points for field initializers and then store the result in the heap --- which follows the paper. Actually, that's one of the main motivations of computing fixed points for all values stored in the heap. We can view the heap as a global cache.

else
thisV.emptyField()
fixPointEval(expr, thisV, klass, outputCache, new evalCache)
}
Copy link
Contributor

@liufengyun liufengyun Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed something, we mentioned it's preferable to compute fixed points for field initializers & when accessing a missing key of warm objects. Otherwise, it will not integrate well with the heap-level immutable caches.

/** Eliminate the field is necessary when computing fix-point
*
*/
def emptyField(): Contextual[Unit] = heap(ref).fields.clear()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need this. Breaking invariants in code is not good.

@@ -268,7 +273,7 @@ class Semantic {
}

/** The state that threads through the interpreter */
type Contextual[T] = (Env, Context, Trace, Promoted) ?=> T
type Contextual[T] = (Env, (evalCache, evalCache), Context, Trace, Promoted) ?=> T
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Scala, types need to be capitalized. Meanwhile, opaque types can help here to make the interface nicer.


type evalCache = EqHashMap[Tree, Result]
def inputCache(using caches: (evalCache, evalCache)) = caches._1
def outputCache(using caches: (evalCache, evalCache)) = caches._2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type definition needs to be capitalized. I think opaque types will make the API nicer.

BTW, let's move domain definitions to the domain section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache key is incorrect: from the paper, it says Expr -> Value -> Value instead of Expr -> Result.

else
// need to compute fix-point for soundness
if !inputCache.contains(expr) then inputCache(expr) = Result(Hot, Errors.empty)
outputCache(expr) = inputCache(expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's simpler to write the code as follows:

outputCache(expr) = inputCache.getOrElse(expr, Hot)

if !res.errors.isEmpty then res
else if inputCache.equal(outputCache) then
inputCache.commitEvalCache(thisV)
res
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the meeting, @olhotak mentioned it's preferable to use a flag to remember whether something has changed, which should be faster than testing equality of caches.

@liufengyun
Copy link
Contributor

This is now superseded by #13471. Thank you @EnzeXing for trying and the tests.

@liufengyun liufengyun closed this Sep 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants