
Performance issues with match type reduction #12267


Closed · odersky opened this issue Apr 29, 2021 · 11 comments

odersky (Contributor) commented Apr 29, 2021

#12153 introduced a performance regression for the stdlib test. I tracked it down to attempted match type reductions that seem very slow.

I saw that overall there were 566 attempts at match type reduction, presumably when compiling Tuple.scala (and maybe also Tuples.scala). Without the change to translucentSuperType there were none. But the change is necessary, and 566 does not seem like an excessive number. Nevertheless, these reduction attempts (all of which fail in the end) seem responsible for the significant slowdown of stdlib compilation as a whole.

We should get to the bottom of this. As a first step, we should try to get a flamegraph or other profile just for compiling Tuple.scala and Tuples.scala. Once we know where the slowdown happens we should see what we can do to optimize.
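For reference, the reductions in question presumably involve the match types declared in Tuple.scala. Here is a minimal sketch (simplified, not the exact stdlib source) of such a definition, and of a use that forces the compiler to attempt a reduction:

```scala
// Simplified sketch of the kind of match type declared in Tuple.scala.
// Reducing Concat[X, Y] makes the compiler match X against the cases below,
// recursively; this is the work behind the reduction attempts counted above.
type Concat[X <: Tuple, Y <: Tuple] <: Tuple = X match
  case EmptyTuple => Y
  case h *: t     => h *: Concat[t, Y]

// Type-checking this value forces the reduction
// Concat[(Int, String), Boolean *: EmptyTuple]  ==  (Int, String, Boolean).
val abc: Concat[(Int, String), Boolean *: EmptyTuple] = (1, "a", true)
```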

liufengyun (Contributor) commented Apr 30, 2021

WIP update:

- Flamegraph for compiling stdlib
- Compile times after warmup (on my local machine):

| compiled sources           |    before |     after |
|----------------------------|-----------|-----------|
| stdLib213                  | 11804.010 | 18015.115 |
| Tuple.scala + Tuples.scala |   478.229 |   492.295 |
| Tuples.scala               |   378.240 |   424.442 |
| Tuple.scala                |   171.700 |   170.791 |

odersky (Contributor, Author) commented Apr 30, 2021

So it's clearly not Tuple.scala where the tryMatchAlias calls are made. Questions to ask: for which files are they made, and why do they take so much time? I instrumented everything else; it's really just those ~600 calls to tryMatchAlias that make all the difference. Nothing else is invoked at all.

liufengyun (Contributor) commented Apr 30, 2021

From the data we have, it seems that the slowdown comes from the exhaustivity check (mostly the redundancy check). If we disable the exhaustivity check, the numbers go back to what they were before.

The current flamegraph does not show enough stack depth to see where it reaches MatchType.tryNormalize. I'm collecting new data with a higher sampling frequency to see if we can get more information about the call stack.

odersky (Contributor, Author) commented Apr 30, 2021

What I am seeing is:

- the added tryMatchAlias calls are all made when compiling the individual Tuple classes Tuple1 to Tuple22.
- at least some of the calls arise from checkAllOverrides, and I suspect that the class being investigated there is Tuple (in Tuple.scala), which defines a lot of match types and functions taking match types.

This seems to be a very specific case. The only classes inheriting from Tuple are Tuple1 to Tuple22, and a slowdown for them is not indicative of a slowdown elsewhere. On the other hand, we should verify whether there is some bottleneck in match type reduction. Such a big slowdown for ~600 calls looks excessive.
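To make the checkAllOverrides connection concrete, here is a minimal, self-contained sketch of the pattern (the names are made up, not the actual Tuple sources): a parent declares a member whose signature mentions a match type, so the override check for every subclass has to compare member types that can only be related by attempting match type reduction.

```scala
// Hypothetical reduction of the pattern (not scala.Tuple itself): the parent
// exposes a member typed with a match type, so RefChecks' checkAllOverrides
// runs frozen subtype checks involving that match type for every subclass.
trait TupleLike:
  type Head[X <: Tuple] = X match
    case h *: t => h
  def head[X <: NonEmptyTuple](x: X): Head[X]

// Each concrete subclass (think Tuple1 .. Tuple22) triggers an override check
// against TupleLike, and with it the match type reduction attempts.
class Pair extends TupleLike:
  def head[X <: NonEmptyTuple](x: X): Head[X] = ??? // body irrelevant here
```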

odersky (Contributor, Author) commented Apr 30, 2021

I verified that all calls to tryMatchAlias originate from RefChecks, with a stack trace like this:

at dotty.tools.dotc.core.TypeComparer$.isSubTypeWhenFrozen(TypeComparer.scala:2619)
at dotty.tools.dotc.core.Types$Type.frozen_$less$colon$less(Types.scala:1026)
at dotty.tools.dotc.core.Types$Type.overrides(Types.scala:1067)
at dotty.tools.dotc.typer.RefChecks$.checkMemberTypesOK$4$$anonfun$3$$anonfun$3(RefChecks.scala:775)
at scala.runtime.function.JProcedure1.apply(JProcedure1.java:15)
at scala.runtime.function.JProcedure1.apply(JProcedure1.java:10)
at scala.collection.immutable.List.foreach(List.scala:333)
at dotty.tools.dotc.typer.RefChecks$.checkMemberTypesOK$5$$anonfun$4(RefChecks.scala:781)
at scala.runtime.function.JProcedure1.apply(JProcedure1.java:15)
at scala.runtime.function.JProcedure1.apply(JProcedure1.java:10)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
at dotty.tools.dotc.util.HashSet$EntryIterator.foreach(HashSet.scala:162)
at dotty.tools.dotc.util.ReadOnlySet.foreach(ReadOnlySet.scala:16)
at dotty.tools.dotc.typer.RefChecks$.checkMemberTypesOK$1(RefChecks.scala:781)
at dotty.tools.dotc.typer.RefChecks$.dotty$tools$dotc$typer$RefChecks$$$checkAllOverrides(RefChecks.scala:801)
at dotty.tools.dotc.typer.RefChecks.transformTemplate(RefChecks.scala:1225)

So I don't think exhaustivity was to blame for the original slowdown (but I noticed a further slowdown on top of the first one, which might well be due to it).

liufengyun (Contributor) commented Apr 30, 2021

> So I don't think exhaustivity was to blame for the original slowdown (but I noticed a further slowdown on top of the first one, which might well be due to it).

I agree. The exhaustivity check becomes much more expensive after #12153. In the flamegraph, if you search for patmat., a large part is purple; in the flamegraph before the change, it's hardly visible.

odersky (Contributor, Author) commented Apr 30, 2021

#12300 gives some more data. @liufengyun can you try to dig deeper on this?

liufengyun (Contributor) commented:

> #12300 gives some more data. @liufengyun can you try to dig deeper on this?

Thanks for the tips, I'll conduct a deeper analysis of the problem.

liufengyun (Contributor) commented:

[WIP Update]

Before #12153, maxConstraint.size is 11. After #12153, it is 288:

--- a/compiler/src/dotty/tools/dotc/core/ConstraintRunInfo.scala
+++ b/compiler/src/dotty/tools/dotc/core/ConstraintRunInfo.scala
@@ -8,6 +8,7 @@ trait ConstraintRunInfo { self: Run =>
   private var maxSize = 0
   private var maxConstraint: Constraint = _
   def recordConstraintSize(c: Constraint, size: Int): Unit =
+    if size > 100 then println("Constraint.size = " + size)
     if (size > maxSize) {
       maxSize = size
       maxConstraint = c

And we have 4,638,553 recorded constraints with size > 100; most of them are around size 288. That also explains the different behavior when searching for OrderingConstraint in the flamegraphs.
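As a side note, here is a sketch of how that count could be collected with the same hook shown in the diff above; the oversizedCount counter is an assumption of mine, not part of the actual instrumentation:

```scala
// Sketch only: ConstraintRunInfo as shown in the diff above, extended with a
// counter for constraints above the threshold. oversizedCount is hypothetical;
// the surrounding trait lives in dotty.tools.dotc.core.
trait ConstraintRunInfo { self: Run =>
  private var maxSize = 0
  private var maxConstraint: Constraint = _
  private var oversizedCount = 0                  // hypothetical counter
  def recordConstraintSize(c: Constraint, size: Int): Unit =
    if size > 100 then oversizedCount += 1        // ~4.6M hits when compiling stdlib
    if (size > maxSize) {
      maxSize = size
      maxConstraint = c
    }
}
```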

liufengyun (Contributor) commented May 3, 2021

The root cause of the performance regression is that refchecks produces a large constraint which is then propagated to all later phases. A fix is proposed in a7ff067.

contraint.size = 0 before phase typer
contraint.size = 1 after phase typer
contraint.size = 1 before phase inlinedPositions
contraint.size = 1 after phase inlinedPositions
contraint.size = 1 before phase posttyper
contraint.size = 1 after phase posttyper
contraint.size = 1 before phase pickler
contraint.size = 1 after phase pickler
contraint.size = 1 before phase inlining
contraint.size = 1 after phase inlining
contraint.size = 1 before phase postInlining
contraint.size = 1 after phase postInlining
contraint.size = 1 before phase staging
contraint.size = 1 after phase staging
contraint.size = 1 before phase pickleQuotes
contraint.size = 1 after phase pickleQuotes
contraint.size = 1 before phase MegaPhase{firstTransform, checkReentrant, elimPackagePrefixes, cookComments, checkStatic, betaReduce, inlineVals, expandSAMs}
contraint.size = 1 after phase MegaPhase{firstTransform, checkReentrant, elimPackagePrefixes, cookComments, checkStatic, betaReduce, inlineVals, expandSAMs}
contraint.size = 1 before phase MegaPhase{elimRepeated, protectedAccessors, extmethods, uncacheGivenAliases, byNameClosures, hoistSuperArgs, specializeApplyMethods, refchecks}
contraint.size = 287 after phase MegaPhase{elimRepeated, protectedAccessors, extmethods, uncacheGivenAliases, byNameClosures, hoistSuperArgs, specializeApplyMethods, refchecks}
contraint.size = 287 before phase MegaPhase{elimOpaque, tryCatchPatterns, patternMatcher, explicitOuter, explicitSelf, elimByName, stringInterpolatorOpt}
contraint.size = 287 after phase MegaPhase{elimOpaque, tryCatchPatterns, patternMatcher, explicitOuter, explicitSelf, elimByName, stringInterpolatorOpt}
contraint.size = 287 before phase MegaPhase{pruneErasedDefs, uninitializedDefs, inlinePatterns, vcInlineMethods, seqLiterals, intercepted, getters, specializeFunctions, liftTry, collectNullableFields, elimOuterSelect, resolveSuper, functionXXLForwarders, paramForwarding, genericTuples, letOverApply, arrayConstructors}
contraint.size = 287 after phase MegaPhase{pruneErasedDefs, uninitializedDefs, inlinePatterns, vcInlineMethods, seqLiterals, intercepted, getters, specializeFunctions, liftTry, collectNullableFields, elimOuterSelect, resolveSuper, functionXXLForwarders, paramForwarding, genericTuples, letOverApply, arrayConstructors}
contraint.size = 287 before phase erasure
contraint.size = 287 after phase erasure
contraint.size = 287 before phase MegaPhase{elimErasedValueType, pureStats, vcElideAllocations, arrayApply, elimPolyFunction, tailrec, completeJavaEnums, mixin, lazyVals, memoize, nonLocalReturns, capturedVars}
contraint.size = 287 after phase MegaPhase{elimErasedValueType, pureStats, vcElideAllocations, arrayApply, elimPolyFunction, tailrec, completeJavaEnums, mixin, lazyVals, memoize, nonLocalReturns, capturedVars}
contraint.size = 287 before phase constructors
contraint.size = 287 after phase constructors
contraint.size = 287 before phase MegaPhase{lambdaLift, elimStaticThis, countOuterAccesses}
contraint.size = 287 after phase MegaPhase{lambdaLift, elimStaticThis, countOuterAccesses}
contraint.size = 287 before phase MegaPhase{dropOuterAccessors, flatten, renameLifted, transformWildcards, moveStatic, expandPrivate, restoreScopes, selectStatic, collectSuperCalls, repeatableAnnotations}
contraint.size = 287 after phase MegaPhase{dropOuterAccessors, flatten, renameLifted, transformWildcards, moveStatic, expandPrivate, restoreScopes, selectStatic, collectSuperCalls, repeatableAnnotations}
contraint.size = 287 before phase genBCode
contraint.size = 287 after phase genBCode

liufengyun added a commit to dotty-staging/dotty that referenced this issue (May 3, 2021): "To enforce the invariant, we need to ensure that tvars are instantiated at end of phases."

liufengyun (Contributor) commented:

Fixed in #12333.
