
WIP: GADT-aware patmat exhaustivity checking #3454

Closed
wants to merge 6 commits into from

Conversation

abgruszecki
Contributor

@abgruszecki abgruszecki commented Nov 10, 2017

This PR reworks how patmat exhaustivity checking is performed so that exhaustivity warnings take into account GADT type constraints.

Note that this is still a work in progress - although the code here works and passes all unit tests, some finishing touches are still needed for it to become fully functional. However, since I've been raising a lot of issues discovered while implementing this algorithm, it's high time for me to finally track the work here.

As a simple example of code for which better warnings are emitted:

  sealed trait Expr[T]
  case class BExpr(bool: Boolean) extends Expr[Boolean]
  case class IExpr(int: Int) extends Expr[Int]

  def join[T](e1: Expr[T], e2: Expr[T]): Expr[T] = (e1, e2) match {
    case (IExpr(i1), IExpr(i2)) => IExpr(i1 + i2)
    case (BExpr(b1), BExpr(b2)) => BExpr(b1 & b2)
  }

join is exhaustive, since it's not possible to give it "mismatched" arguments. However, the current algorithm emits a warning that there are no patterns for (IExpr(_), BExpr(_)) and vice versa. With the algorithm in this PR, no warnings are (correctly) emitted.

The algorithm itself is closely based on the Haskell GADT algorithm of Karachalias et al. [1] - my work essentially unifies the Haskell algorithm with Liu's Dotty algorithm, taking the capability for constraint reasoning from the former and the representation of Scala values from the latter.

The core adjustment which enables this algorithm to work is that spaces now keep track of the constraints associated with them. For instance, when decomposing Expr[T] into IExpr and BExpr, the type equality constraints T ~ Int and T ~ Boolean become associated with their respective spaces. After spaces are subtracted, spaces with contradictory constraints are removed from consideration. For example, the (IExpr(_), BExpr(_)) space would have the constraints T ~ Int and T ~ Boolean associated with it. As T cannot possibly be equal to both Int and Boolean, the constraints are contradictory and the space is not considered when emitting exhaustivity warnings. My Master's thesis contains a far more in-depth description - if you want to read it, please send me an email or message me on Gitter and I'll send it to you.
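As a rough, self-contained sketch of this idea (all names below are invented for illustration and are not the PR's actual Space types):

```scala
// Sketch only: hypothetical stand-ins for the PR's constraint-carrying
// spaces. The constraint "T ~ Int" is modelled as Constraint("T", "Int").
case class Constraint(tparam: String, tpe: String)
case class Space(desc: String, constraints: List[Constraint])

// A constraint set is contradictory if some type parameter is required
// to equal two different types at once, e.g. T ~ Int and T ~ Boolean.
def contradictory(cs: List[Constraint]): Boolean =
  cs.groupBy(_.tparam).values.exists(_.map(_.tpe).distinct.size > 1)

// Spaces left over after subtraction; contradictory ones are dropped
// before exhaustivity warnings are emitted.
def pruneUncovered(spaces: List[Space]): List[Space] =
  spaces.filterNot(s => contradictory(s.constraints))

val uncovered = List(
  Space("(IExpr(_), BExpr(_))", List(Constraint("T", "Int"), Constraint("T", "Boolean"))),
  Space("(BExpr(_), IExpr(_))", List(Constraint("T", "Boolean"), Constraint("T", "Int")))
)
// Both spaces require T ~ Int and T ~ Boolean simultaneously, so both
// are pruned and join is reported exhaustive.
val warnings = pruneUncovered(uncovered)
```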

Remaining work

The most important remaining work is to reuse Scala's typechecking for constraint verification. All other points are essentially nice-to-haves.

Making constraint verification serious

The constraint verification checker is currently extremely naive and suffices only for simple cases. Haskell's algorithm reuses its type inference machinery for constraint checking, and something similar should be implemented here.

From what I gather, it seems that Dotty contains no code which explicitly solves type constraints. Instead, as far as I understand, constraints are kept in Context and can be adjusted by performing comparisons between types. Integrating this with the algorithm is currently the highest priority, as it will enable the constraint verifier to be as powerful as Scala's typechecking.

Note that even with the current naive verifier, a lot of code can be checked. For examples, see gadt-basic.scala, gadt-nontrivial.scala and gadt-nontrivial2.scala tests - I'll be mentioning them in code comments.

Use space intersection for finding useless patterns

The current way of finding useless patterns only locates redundant ones. With GADTs, it's easy to create a useless but non-redundant case - for instance, (IExpr(_), BExpr(_)) in join above would be just such a pattern. The code already contains an implementation that uses space intersection to find useless patterns, but there's still a number of issues in surrounding code preventing it from working correctly: #3333 #3455. There are other issues related to subtyping checks when comparing a type param with any other type, but they should be cleared up once typechecker integration is done.

Find useless subpatterns

With the current code, it shouldn't be too difficult to find not only useless patterns, but useless subpatterns as well. Consider:

def m(p: (Int, Int)): Unit = p match {
  case (1, 2) =>
  case (1 | 2, 2 | 3) =>
}

In the second case, the (1, 2) combination is useless and probably a programmer error, so a warning should be emitted.
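A toy sketch of the detection (invented names; patterns reduced to integer alternatives): expand the or-patterns of the second case into their combinations and flag those already covered by the first case.

```scala
// Expand a tuple of or-patterns, given as a list of alternatives per
// position, into the flat list of combinations it matches.
def expand(alternatives: List[List[Int]]): List[List[Int]] =
  alternatives.foldRight(List(List.empty[Int])) { (alts, acc) =>
    for (a <- alts; rest <- acc) yield a :: rest
  }

// case (1, 2) covers exactly one combination...
val covered = Set(List(1, 2))
// ...while case (1|2, 2|3) expands into four.
val combinations = expand(List(List(1, 2), List(2, 3)))

// The (1, 2) combination of the second case is already covered by the
// first case: a useless subpattern worth warning about.
val useless = combinations.filter(covered.contains)
```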

While this is completely unrelated to GADTs, it's a nice-to-have.

Changes to guard handling

The Haskell algorithm essentially considers patterns with guards to never match, unless the guard can be constant-folded to true. I was actually surprised to find that this isn't how scalac behaves - guards which can be constant-folded are (correctly) treated as always/never matching, but all other guards are considered to always match for exhaustivity checks and to never match for redundancy checks. I mimicked this behaviour (which changed the output of some tests), but I find it surprising, since I believed that scalac behaved like Haskell. Is there any specific reason why scalac behaves like that? If possible, I'd change it to behave like Haskell (and possibly revert to the old behaviour when Scala2 compat mode is on). The current behaviour also makes it impossible to run intersection-based useless pattern checks in-line with exhaustivity checks (as is done in Haskell), which means the code would be (or at least should be) less efficient overall.
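To illustrate why assuming a non-constant guard always matches is unsound for exhaustivity (a standalone sketch, not compiler code):

```scala
// Looks exhaustive if guards are assumed to always match, but the
// guards rule out n == 0, so the match has a hole at runtime.
def sign(n: Int): String = n match {
  case x if x > 0 => "positive"
  case x if x < 0 => "negative"
}

// sign(0) falls through both guarded cases and throws a MatchError.
val hole =
  try { sign(0); false }
  catch { case _: MatchError => true }
```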

Changes to test output

I'll be mentioning changes to test output in code comments, if there's something notable.

[1] - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/gadtpm-acm.pdf

Member

@dottybot dottybot left a comment


Hello, and thank you for opening this PR! 🎉

All contributors have signed the CLA, thank you! ❤️

Have an awesome day! ☀️

@abgruszecki
Contributor Author

@liufengyun - this is the source of all the issues I've been raising! I would've kept the work public from the beginning, but as this is my Master's thesis, I wanted to avoid any accusations that this wasn't my individual work.

}).reduce((a, b) => Or(List(a, b)))
val uncovered = simplify(minus(Typ(selTyp, true), patternSpace), aggressive = true)
for { (space, pos) <- projectedPats } {
if (!UseOldRedundancyCheck) {
Contributor Author

The code here finds useless patterns in-line with exhaustivity checks.

@@ -3,4 +3,4 @@
32: Pattern Match Exhaustivity: List(_, _: _*)
39: Pattern Match Exhaustivity: Bar3
44: Pattern Match Exhaustivity: (Bar2, Bar2)
53: Pattern Match Exhaustivity: (Bar2, Bar2), (Bar2, Bar1), (Bar1, Bar3), (Bar1, Bar2)
Contributor Author

The change here is a straight improvement - here we can see that this pattern matching is done on a GADT and is quite similar to the join example.

case class BExpr(bool: Boolean) extends Expr[Boolean]
case class IExpr(int: Int) extends Expr[Int]

def join[T](e1: Expr[T], e2: Expr[T]): Expr[T] = (e1, e2) match {
Contributor Author

This test shows that the motivating example and a slightly more complicated GADT with two parameters are both correctly checked. Currently, GADT checks are limited to invariant parameters, which I hope to fix by integrating Scala's typechecking with constraint verification.

case (IntExpr(i1), IntExpr(i2)) => IntExpr(i1 + i2)
case (AddExpr(ei1, ei2), ie) => join(join(ei1, ei2), ie)
}
}
Contributor Author

This test shows that correct warnings are emitted based on the type constraints, with a slightly more complicated GADT.

case object NEmpty extends NVec[Zero, Nothing]
case class NCons[N <: Nat, +A](head: A, tail: NVec[N, A]) extends NVec[Succ[N], A]

def nzip[N <: Nat, A, B](v1: NVec[N, A], v2: NVec[N, B]): NVec[N, (A, B)] =
Contributor Author

This test shows that invariant params can be checked if a non-invariant param appears in the same list.

@@ -2,6 +2,7 @@
11: Pattern Match Exhaustivity: Bar(_)
23: Pattern Match Exhaustivity: (Qult(), Qult()), (Kult(_), Kult(_))
49: Pattern Match Exhaustivity: _: Gp
53: Pattern Match Exhaustivity: _
Contributor Author

This is the result of mirroring scalac's behaviour more closely: the guard on this line is constant-folded to false.

@@ -1 +1 @@
18: Pattern Match Exhaustivity: Baz(), Bar(_)
18: Pattern Match Exhaustivity: EatsExhaustiveWarning(_), Baz(), Bar(_)
Contributor Author

I'm not too sure why no warning was emitted for EatsExhaustiveWarning here. In the test code, Reference is pattern-matched with a possibly-failing unapply method (it returns Option and not Some), so it seems that a warning should be emitted. @liufengyun - do you recall why the original code behaves as it does?

Contributor

This is because there are no known children of Reference, thus it's considered exhaustive.

@@ -18,4 +18,4 @@ object Test extends App {
(true: Boolean) match { case true => case false => case _ => } // exhaustive, last case is unreachable
(true: Boolean) match { case true => case false => case _: Boolean => } // exhaustive, last case is unreachable
(true: Boolean) match { case true => case false => case _: Any => } // exhaustive, last case is unreachable
Contributor Author

Note - last pattern here is unreachable, but not redundant, so no warning is emitted for it in the current code. With intersection-based useless pattern checking, a warning is correctly emitted here.

@abgruszecki
Contributor Author

@smarter - I'm not sure if you're the right person to ask, but I've been unable to successfully run the entire test suite locally and I have no idea why it fails. Even on master, after a fresh checkout and submodule init, the tests fail, and differently each time - sometimes the compiler crashes during the tests, sometimes it never finishes, and on some occasions it manages to run out of open file handles. I'd try to fix the test error here myself, but with the tests failing randomly that's going to be a tad difficult. Could turning off concurrency during the tests fix this? Is there a simple way to do that?

(please note that patmat unit tests actually succeed! 😛 )

@liufengyun
Contributor

@AleksanderBG Congrats, this is a very nice piece of work!

@abgruszecki
Contributor Author

@liufengyun - I'd be very grateful if you could take a look at how I want to approach integrating Scala's typechecking w/ constraint verification and see if there isn't something obvious to you that I'm missing. In particular, I'll be reusing SpaceEngine#instantiate and I'm not exactly sure if this is the correct way to do it:

  1. Each space will contain a Context specific to it
  2. All params in types will be replaced with TypeVars with the code currently in SpaceEngine#instantiate (specifically, with typeParamMap)
  3. When decomposing a type, fresh Contexts will be forked off from parent space Context, and in each space subtyping will be recorded. If subtyping fails, this space will be discarded
  4. When comparing for subtypes, the Context of the LHS of current operation will be used, and all constraints on variables of the RHS type from the RHS type environment will be entered into the LHS environment (I'm not too sure if this is exactly what should happen, but based on the Haskell algorithm it seems as though this is going to be precisely enough)

I'm not too sure when exactly thisTypeMap is needed in instantiate - @liufengyun, do you think things will work if I use only typeParamMap? If thisTypeMap is not enough in instantiate, typeParamMap is used instead, which makes me think I won't really need the first one (since I always want/need to replace type parameters).

@abgruszecki
Contributor Author

Also, thank you @liufengyun! 😄

@liufengyun
Contributor

@AleksanderBG I'll go through the PR thoroughly in the coming days. BTW, to help review, if the file Space.scala has changed too much, it would also be good to create a new file like Constraint.scala for the new algorithm and rename the old one to Space.scala.old.

Here are some high-level comments about Dotty constraint solver. I see two places where you'll need that:

  • child instantiation: this is currently what we use, hopefully it's possible to reuse it without change.
  • constraint solving: this is a new requirement in your new algorithm

Note that both thisTypeMap and typeParamMap are specific to the child instantiation problem. thisTypeMap is intended to infer the path-dependent prefix of child, which is of no use for the second task.

The general shape of using the Dotty inference infrastructure is quite simple: you create some TVars in the type, then do a subtype T1 <:< T2 which will generate constraints in the context, then you can call isFullyDefined or other methods in Inferencing.scala to check if the constraint can be satisfied.

BTW, according to your report in #3333, we need one further refinement to child instantiation: replace unconstrained TVars with Wildcard types.
dotty-staging@9166db3

@abgruszecki
Contributor Author

abgruszecki commented Nov 14, 2017

Rebased on current master & removed some dead and/or misleading code which shouldn't have been here. General tests still fail at pickling tests, but now with the same puzzling exception that CI previously got - I'll see what I can do about that.

Most importantly, I've uploaded a working POC of typechecker integration here. It passes all tests, but I'm pretty sure I've gotten some things wrong with it and it needs more work. What's more, since "existential" types as described here aren't currently supported, the gain from finishing the POC is limited. Despite my hopes, this approach doesn't automagically solve #3333. Since now type variables can be kept outside of instantiate, I had hoped that it would be enough to simply not instantiate N in FiniteNumberImpl[N] - however, any simple approach to picking which variables to not instatiate (including the approach here dotty-staging/dotty@9166db3) left too many params uninstantiated, including B in ::[B]. :: is suprisingly difficult to handle correctly, since it's invariant as opposed to List being covariant. This produces an awkward situation where ::[Animal] and ::[Dog] are both subtypes of List[Animal], but neither is a subtype of its counterpart - this in particular interferes with subtracting two product spaces.

I'll now focus on splitting the initial commit into smaller pieces. That should also make it easier to compare what is actually improved by using the typechecker. I should be able to get something done by Thursday. @liufengyun - I should have mentioned this when I sent my thesis to you, but: you'll likely gain the most insight into how the algorithm works from Chapter 3, particularly the example subtraction. Section 3.1 can be safely skipped. The weirdest part of the algorithm is the way it recurses when subtracting two constructor spaces - the algorithm makes much more sense if one imagines it's possible to pattern match on multiple values, like in Haskell. Chapter 2 may be useful, but at the same time it's aimed at someone with little knowledge about pattern matching.

@abgruszecki
Contributor Author

@liufengyun - I've reworked git history. It turned out that to remove some code which I now know is wrong I had to base on #3475.

dc19e6a contains both space operation implementations side-by-side. The new implementations are similar in some regards to the old ones, but, again, they exhibit the "weird" recursion I talked about earlier. It might make sense to skim the Haskell algorithm paper as well - in particular, its informal explanation of how the algorithm functions is excellent.

Another notable change is that debug messages are now indented - since the operations now recurse much more often, the debug output was unreadable to me without this. I don't know if the code should stay like this, since it duplicates the Showable-based indenting present elsewhere in the codebase, but I believe that some indenting is absolutely necessary.

Test should still fail because of #3479 - hopefully it'll get fixed sooner rather than later.

@liufengyun
Contributor

Well done @AleksanderBG !

When you get the CI to pass, we can compare your algorithm with another approach (implemented in OCaml, briefly mentioned in the space paper): simply exclude counterexamples that are not satisfiable.

The second approach passed all GADT tests, and it has better performance because it only enables the checking for invariant GADTs. It's just a few lines of code, with no changes needed to the core algorithm.

But we do very much appreciate your work, and want to evaluate the two approaches when they are done. It's also possible we'll keep both algorithms so that it's possible to switch between them as things evolve.

@abgruszecki
Contributor Author

@liufengyun - picking the OCaml approach over the Haskell one is absolutely understandable - part of the reason why I didn't base my work on the OCaml approach is that it seemed like it would make for a very short thesis :)

However, I should stress that the implementation in this PR is severely wing-clipped by how absurdly naive the constraint checker is (7c79edb). The TVar-based PoC (AleksanderBG/dotty@19e719e0) is the serious implementation. The best comparison can be made once the approach described here by @smarter is in place.

Aleksander Boruch-Gruszecki added 6 commits November 22, 2017 20:58
Adapt tests to new output:
- exhausting: reflect improved GADT handling
- patmatexhaust: result of mirroring `scalac` guard handling
- t8511: this seems like the correct output for this test

All other changes are merely equivalent, but different, warnings.
@abgruszecki
Contributor Author

Rebased on master and applied a workaround for #3479.

I've hit a wall with the TVar constraint verifier PoC (AleksanderBG/dotty@2ab594dd47bd8e72d1982a2b2f82983d44f9165a): for the gadt-nontrivial2 test case, a warning that (NCons(_, NCons(_, _)), _) is unmatched is emitted. This happens because the inferred type for the NCons pattern is NCons[Nat, A], which makes the exhaustivity checker think NCons(_, NCons(_, _)) cannot possibly match it. Unfortunately I cannot see a clean workaround, so the output for that case will be incorrect until something is done about the existential types inferred in patterns.

I've hit other difficulties with the TVar approach (some of which can be seen at gitter), so I believe now would be a good time to sum up my view on the OCaml approach before I go off to tinker in a corner for a while:

If it's possible to make the constraint solver implementation serious, the approach in this PR would be clearly superior to the OCaml one in terms of functionality - there would be no restrictions on GADT param variance, and users would be warned about useless GADT patterns. As in the initial message, useless subpattern warnings as described by Maranget would also be natural with this approach. All patterns need to be normalized to only have or-patterns at the top anyway, so all that's left to do is actually detect if any variants are useless and emit an appropriate warning.

I would not worry too much at this point about performance. In the proper implementation, intersect will be used to find useless patterns in-line with exhaustivity checks. This will make it possible to drop separate redundancy checks, which currently subtract same patterns over and over, which would offset additional complexity of space operations. While the PoC spaces carry entire Contexts around, with some code adjustments it should be possible to carry only Constraints, which would reduce additional memory requirements. Finally, if it turns out that too many constraints cause a slowdown, it is sound to simply skip adding them past some limit. In fact, this was necessary in the GHC implementation. Overall the performance would quite likely still take a small hit, but I believe the additional functionality would be more than worth it.

However, if the TVar approach doesn't work out, then the only advantage this PR has over the OCaml approach is ease of implementing useless subpattern warnings, which is quite a bad tradeoff when compared to the amount of additional code required.

As my time off from my job ended last week, I will now have much less time to spend on this PR. I will continue working on it, although I certainly can't make any promises about finishing it on short notice.

@liufengyun
Contributor

Thanks for your comment @AleksanderBG , I somehow missed it.

the approach in this PR would be clearly superior to the OCaml one in terms of functionality - there would be no restrictions on GADT param variance, and users would be warned about useless GADT patterns

PR #3510 only checks invariant type params as an optimisation, because for covariant or contravariant types the existing algorithm works well without change, thanks to union types.

As GADTs are usually covariant or contravariant in Scala, it seems appropriate to handle invariant GADTs using a simple algorithm.

In terms of maintainability, I think #3510 is better than the Haskell algorithm: the core algorithm doesn't need any changes, just a few lines to handle invariant GADTs.

Thanks again for your work on exhaustivity check.

@abgruszecki
Contributor Author

@liufengyun - I definitely agree that the Haskell approach is the more complex one. I still feel that even if all it has over the OCaml approach are the additional types of warnings, it may be worth it.

I'll update the PR here as soon as I have the constraint checker in reasonable order and a clear comparison can be made, both of performance and of what warnings are emitted where. As I said though, I'm limited by being employed full-time, so it might take me a while.

@abgruszecki
Contributor Author

@liufengyun - I think I now understand why I was wrong regarding the variance "limitations". I thought that for code like this:

  sealed trait Expr[+T]
  case class IExpr(x: Int) extends Expr[Int]
  case class BExpr(b: Boolean) extends Expr[Boolean]

  def foo[T](x: Expr[T], y: Expr[T]) = (x, y) match {
    case (IExpr(_), IExpr(_)) => true
    case (BExpr(_), BExpr(_)) => false
  }

no warning should be emitted. However, that's only the case in Scala2: in Dotty, T can be Int | Boolean instead of something explicitly specified in one of Expr's children. This means that no additional constraint checks are needed here, and warnings about (BExpr(_), IExpr(_)) and (IExpr(_), BExpr(_)) should be emitted, as is the case in the OCaml approach.
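For instance (a standalone sketch of the snippet above), with covariance a mismatched pair typechecks because T is inferred as a common supertype of Int and Boolean, so the match can genuinely fail at runtime and the warning is justified:

```scala
sealed trait Expr[+T]
case class IExpr(x: Int) extends Expr[Int]
case class BExpr(b: Boolean) extends Expr[Boolean]

def foo[T](x: Expr[T], y: Expr[T]): Boolean = (x, y) match {
  case (IExpr(_), IExpr(_)) => true
  case (BExpr(_), BExpr(_)) => false
}

// T is inferred as a supertype of both Int and Boolean (Int | Boolean
// in Dotty), so this call compiles -- and then fails to match:
val mismatch =
  try { foo(IExpr(1), BExpr(true)); false }
  catch { case _: MatchError => true }
```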

@liufengyun
Contributor

Yes, that's exactly what I meant. However, I need to update #3510 to only ignore covariant type params, as we still need the check for contravariant cases -- or just disable the optimisation.

@liufengyun
Contributor

@AleksanderBG Thanks again for your contribution. If you have something new, please open a new PR.

@liufengyun liufengyun closed this Jan 19, 2018