
WIP: GADT-aware patmat exhaustivity checking #3454

Closed
wants to merge 6 commits into from

Conversation

abgruszecki
Contributor

@abgruszecki abgruszecki commented Nov 10, 2017

This PR reworks how patmat exhaustivity checking is performed so that exhaustivity warnings take into account GADT type constraints.

Note that this is still a work in progress - although the code here works and passes all unit tests, some finishing touches are still needed for it to become fully functional. However, since I've been raising a lot of issues discovered while implementing this algorithm, it's high time for me to finally track the work here.

As a simple example of code for which better warnings are emitted:

  sealed trait Expr[T]
  case class BExpr(bool: Boolean) extends Expr[Boolean]
  case class IExpr(int: Int) extends Expr[Int]

  def join[T](e1: Expr[T], e2: Expr[T]): Expr[T] = (e1, e2) match {
    case (IExpr(i1), IExpr(i2)) => IExpr(i1 + i2)
    case (BExpr(b1), BExpr(b2)) => BExpr(b1 & b2)
  }

join is exhaustive, since it's not possible to give it "mismatched" arguments. However, the current algorithm emits a warning that there are no patterns for (IExpr(_), BExpr(_)) and vice versa. With the algorithm in this PR, no warnings are (correctly) emitted.

The algorithm itself is closely based on the Haskell GADT algorithm of Karachalias et al. [1] - my work essentially unifies the Haskell algorithm with Liu's Dotty algorithm, taking the capability for constraint reasoning from the former and the representation of Scala values from the latter.

The core adjustment which enables this algorithm to work is that spaces now keep track of the constraints associated with them. For instance, when decomposing Expr[T] into IExpr and BExpr, the type equality constraints T ~ Int and T ~ Boolean become associated with their respective spaces. After spaces are subtracted, spaces with contradictory constraints are removed from consideration. For example, the (IExpr(_), BExpr(_)) space would have the constraints T ~ Int and T ~ Boolean associated with it. As T cannot possibly be equal to both Int and Boolean, the constraints are contradictory and the space is not considered when emitting exhaustivity warnings. My Master's thesis contains a far more in-depth description - if you want to read it, please send me an email or message me on Gitter and I'll send it to you.
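As a rough, self-contained sketch of this idea (all names below are invented for illustration and are not the PR's actual Space types):

```scala
// Sketch only: hypothetical stand-ins for the PR's constraint-carrying
// spaces. The constraint "T ~ Int" is modelled as Constraint("T", "Int").
case class Constraint(tparam: String, tpe: String)
case class Space(desc: String, constraints: List[Constraint])

// A constraint set is contradictory if some type parameter is required
// to equal two different types at once, e.g. T ~ Int and T ~ Boolean.
def contradictory(cs: List[Constraint]): Boolean =
  cs.groupBy(_.tparam).values.exists(_.map(_.tpe).distinct.size > 1)

// Spaces left over after subtraction; contradictory ones are dropped
// before exhaustivity warnings are emitted.
def pruneUncovered(spaces: List[Space]): List[Space] =
  spaces.filterNot(s => contradictory(s.constraints))

val uncovered = List(
  Space("(IExpr(_), BExpr(_))", List(Constraint("T", "Int"), Constraint("T", "Boolean"))),
  Space("(BExpr(_), IExpr(_))", List(Constraint("T", "Boolean"), Constraint("T", "Int")))
)
// Both spaces require T ~ Int and T ~ Boolean simultaneously, so both
// are pruned and join is reported exhaustive.
val warnings = pruneUncovered(uncovered)
```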

Remaining work

The most important remaining work is to reuse Scala's typechecking for constraint verification. All other points are essentially nice-to-haves.

Making constraint verification serious

The constraint verification checker is currently extremely naive and suffices only for simple cases. Haskell's algorithm reuses its type inference machinery for constraint checking, and something similar should be implemented here.

From what I gather, it seems that Dotty contains no code which explicitly solves type constraints. Instead, as far as I understand, constraints are kept in Context and can be adjusted by performing comparisons between types. Integrating this with the algorithm is currently the highest priority, as it will enable the constraint verifier to be as powerful as Scala's typechecking.

Note that even with the current naive verifier, a lot of code can be checked. For examples, see gadt-basic.scala, gadt-nontrivial.scala and gadt-nontrivial2.scala tests - I'll be mentioning them in code comments.

Use space intersection for finding useless patterns

The current way of finding useless patterns only locates redundant ones. With GADTs, it's easy to create a useless but non-redundant case - for instance, (IExpr(_), BExpr(_)) in join above would be just such a pattern. The code already contains an implementation that uses space intersection to find useless patterns, but there's still a number of issues in surrounding code preventing it from working correctly: #3333 #3455. There are other issues related to subtyping checks when comparing a type param with any other type, but they should be cleared up once typechecker integration is done.

Find useless subpatterns

With the current code, it shouldn't be too difficult to find not only useless patterns, but useless subpatterns as well. Consider:

def m(p: (Int, Int)): Unit = p match {
  case (1, 2) =>
  case (1 | 2, 2 | 3) =>
}

In the second case, the (1, 2) combination is useless and probably a programmer error, so a warning should be emitted.
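A toy sketch of the detection (invented names; patterns reduced to integer alternatives): expand the or-patterns of the second case into their combinations and flag those already covered by the first case.

```scala
// Expand a tuple of or-patterns, given as a list of alternatives per
// position, into the flat list of combinations it matches.
def expand(alternatives: List[List[Int]]): List[List[Int]] =
  alternatives.foldRight(List(List.empty[Int])) { (alts, acc) =>
    for (a <- alts; rest <- acc) yield a :: rest
  }

// case (1, 2) covers exactly one combination...
val covered = Set(List(1, 2))
// ...while case (1|2, 2|3) expands into four.
val combinations = expand(List(List(1, 2), List(2, 3)))

// The (1, 2) combination of the second case is already covered by the
// first case: a useless subpattern worth warning about.
val useless = combinations.filter(covered.contains)
```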

While this is completely unrelated to GADTs, it's a nice-to-have.

Changes to guard handling

The Haskell algorithm essentially considers patterns with guards to never match, unless the guard can be constant-folded to true. I was actually surprised to find that this isn't how scalac behaves - guards which can be constant-folded are (correctly) treated as always/never matching, but all other guards are considered to always match for exhaustivity checks and to never match for redundancy checks. I mimicked this behaviour (which changed the output of some tests), but I find it surprising, since I believed that scalac behaved like Haskell. Is there any specific reason why scalac behaves like that? If possible, I'd change it to behave like Haskell (and possibly revert to the old behaviour when Scala2 compat mode is on). The current behaviour also makes it impossible to run intersection-based useless pattern checks in-line with exhaustivity checks (as is done in Haskell), which means the code would be (or at least should be) less efficient overall.
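To illustrate why assuming a non-constant guard always matches is unsound for exhaustivity (a standalone sketch, not compiler code):

```scala
// Looks exhaustive if guards are assumed to always match, but the
// guards rule out n == 0, so the match has a hole at runtime.
def sign(n: Int): String = n match {
  case x if x > 0 => "positive"
  case x if x < 0 => "negative"
}

// sign(0) falls through both guarded cases and throws a MatchError.
val hole =
  try { sign(0); false }
  catch { case _: MatchError => true }
```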

Changes to test output

I'll be mentioning changes to test output in code comments, if there's something notable.

[1] - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/gadtpm-acm.pdf

Member

@dottybot dottybot left a comment


Hello, and thank you for opening this PR! 🎉

All contributors have signed the CLA, thank you! ❤️

Have an awesome day! ☀️

@abgruszecki
Contributor Author

@liufengyun - this is the source of all the issues I've been raising! I would've kept the work public from the beginning, but as this is my Master's thesis, I wanted to avoid any accusations that this wasn't my individual work.

}).reduce((a, b) => Or(List(a, b)))
val uncovered = simplify(minus(Typ(selTyp, true), patternSpace), aggressive = true)
for { (space, pos) <- projectedPats } {
if (!UseOldRedundancyCheck) {
Contributor Author

The code here finds useless patterns in-line with exhaustivity checks.

@@ -3,4 +3,4 @@
32: Pattern Match Exhaustivity: List(_, _: _*)
39: Pattern Match Exhaustivity: Bar3
44: Pattern Match Exhaustivity: (Bar2, Bar2)
53: Pattern Match Exhaustivity: (Bar2, Bar2), (Bar2, Bar1), (Bar1, Bar3), (Bar1, Bar2)
Contributor Author

The change here is a straight improvement - here we can see that this pattern matching is done on a GADT and is quite similar to the join example.

case class BExpr(bool: Boolean) extends Expr[Boolean]
case class IExpr(int: Int) extends Expr[Int]

def join[T](e1: Expr[T], e2: Expr[T]): Expr[T] = (e1, e2) match {
Contributor Author

This test shows that the motivating example and a slightly more complicated GADT with two parameters are both correctly checked. Currently, GADT checks are limited to invariant parameters, which I hope to fix by integrating Scala's typechecking with constraint verification.

case (IntExpr(i1), IntExpr(i2)) => IntExpr(i1 + i2)
case (AddExpr(ei1, ei2), ie) => join(join(ei1, ei2), ie)
}
}
Contributor Author

This test shows that correct warnings are emitted based on the type constraints, with a slightly more complicated GADT.

case object NEmpty extends NVec[Zero, Nothing]
case class NCons[N <: Nat, +A](head: A, tail: NVec[N, A]) extends NVec[Succ[N], A]

def nzip[N <: Nat, A, B](v1: NVec[N, A], v2: NVec[N, B]): NVec[N, (A, B)] =
Contributor Author

This test shows that invariant params can be checked if a non-invariant param appears in the same list.

@@ -2,6 +2,7 @@
11: Pattern Match Exhaustivity: Bar(_)
23: Pattern Match Exhaustivity: (Qult(), Qult()), (Kult(_), Kult(_))
49: Pattern Match Exhaustivity: _: Gp
53: Pattern Match Exhaustivity: _
Contributor Author

This is the result of mirroring scalac's behaviour more closely: the guard on this line is constant-folded to false.

@@ -1 +1 @@
18: Pattern Match Exhaustivity: Baz(), Bar(_)
18: Pattern Match Exhaustivity: EatsExhaustiveWarning(_), Baz(), Bar(_)
Contributor Author

I'm not too sure why no warning was emitted for EatsExhaustiveWarning here. In the test code, Reference is pattern-matched with a possibly-failing unapply method (it returns Option and not Some), so it seems that a warning should be emitted. @liufengyun - do you recall why the original code behaves as it does?

Contributor

This is because there are no known children of Reference, thus it's considered exhaustive.

@@ -18,4 +18,4 @@ object Test extends App {
(true: Boolean) match { case true => case false => case _ => } // exhaustive, last case is unreachable
(true: Boolean) match { case true => case false => case _: Boolean => } // exhaustive, last case is unreachable
(true: Boolean) match { case true => case false => case _: Any => } // exhaustive, last case is unreachable
Contributor Author

Note - last pattern here is unreachable, but not redundant, so no warning is emitted for it in the current code. With intersection-based useless pattern checking, a warning is correctly emitted here.

@abgruszecki
Contributor Author

@smarter - I'm not sure if you're the right person to ask, but I've been unable to successfully run the entire test suite locally and I have no idea why it fails. Even on master, after a fresh checkout and submodule init, the tests fail, and differently each time - sometimes the compiler crashes during the tests, sometimes it never finishes, and on some occasions it manages to run out of open file handles. I'd try to fix the test error here myself, but with the tests failing randomly that's going to be a tad difficult. Could turning off concurrency during the tests fix this? Is there a simple way to do that?

(please note that patmat unit tests actually succeed! 😛 )

@liufengyun
Contributor

@AleksanderBG Congrats, this is a very nice piece of work!

@abgruszecki
Contributor Author

@liufengyun - I'd be very grateful if you could take a look at how I want to approach integrating Scala's typechecking w/ constraint verification and see if there isn't something obvious to you that I'm missing. In particular, I'll be reusing SpaceEngine#instantiate and I'm not exactly sure if this is the correct way to do it:

  1. Each space will contain a Context specific to it
  2. All params in types will be replaced with TypeVars with the code currently in SpaceEngine#instantiate (specifically, with typeParamMap)
  3. When decomposing a type, fresh Contexts will be forked off from parent space Context, and in each space subtyping will be recorded. If subtyping fails, this space will be discarded
  4. When comparing for subtypes, the Context of the LHS of current operation will be used, and all constraints on variables of the RHS type from the RHS type environment will be entered into the LHS environment (I'm not too sure if this is exactly what should happen, but based on the Haskell algorithm it seems as though this is going to be precisely enough)

I'm not too sure when exactly thisTypeMap is needed in instantiate - @liufengyun, do you think things will work if I use only typeParamMap? If thisTypeMap is not enough in instantiate, typeParamMap is used instead, which makes me think I won't really need the first one (since I always want/need to replace type parameters).

@abgruszecki
Contributor Author

Also, thank you @liufengyun! 😄

@liufengyun
Contributor

@AleksanderBG I'll go through the PR thoroughly in the coming days. BTW, to help review, if the file Space.scala has changed too much, it would also be good to create a new file like Constraint.scala for the new algorithm and rename the old one to Space.scala.old.

Here are some high-level comments about Dotty constraint solver. I see two places where you'll need that:

  • child instantiation: this is currently what we use, hopefully it's possible to reuse it without change.
  • constraint solving: this is a new requirement in your new algorithm

Note that both thisTypeMap and typeParamMap are specific to the child instantiation problem. thisTypeMap is intended to infer the path-dependent prefix of child, which is of no use for the second task.

The general shape of using the Dotty inference infrastructure is quite simple: you create some TVars in the type, then do a subtype T1 <:< T2 which will generate constraints in the context, then you can call isFullyDefined or other methods in Inferencing.scala to check if the constraint can be satisfied.

BTW, according to your report in #3333, we need one further refinement to child instantiation: replace unconstrained TVars with Wildcard types.
dotty-staging@9166db3

@abgruszecki
Contributor Author

abgruszecki commented Nov 14, 2017

Rebased on current master & removed some dead and/or misleading code which shouldn't have been here. General tests still fail at pickling tests, but now with the same puzzling exception that CI previously got - I'll see what I can do about that.

Most importantly, I've uploaded a working POC of typechecker integration here. It passes all tests, but I'm pretty sure I've gotten some things wrong with it and it needs more work. What's more, since "existential" types as described here aren't currently supported, the gain from finishing the POC is limited. Despite my hopes, this approach doesn't automagically solve #3333. Since now type variables can be kept outside of instantiate, I had hoped that it would be enough to simply not instantiate N in FiniteNumberImpl[N] - however, any simple approach to picking which variables to not instatiate (including the approach here dotty-staging/dotty@9166db3) left too many params uninstantiated, including B in ::[B]. :: is suprisingly difficult to handle correctly, since it's invariant as opposed to List being covariant. This produces an awkward situation where ::[Animal] and ::[Dog] are both subtypes of List[Animal], but neither is a subtype of its counterpart - this in particular interferes with subtracting two product spaces.

I'll now focus on splitting the initial commit into smaller pieces. That should also make it easier to compare what is actually improved by using the typechecker. I should be able to get something done by Thursday. @liufengyun - I should have mentioned this when I sent my thesis to you, but: you'll likely gain the most insight into how the algorithm works from Chapter 3, particularly the example subtraction. Section 3.1 can be safely skipped. The weirdest part of the algorithm is the way it recurses when subtracting two constructor spaces - the algorithm makes much more sense if one imagines it's possible to pattern match on multiple values, like in Haskell. Chapter 2 may be useful, but at the same time it's aimed at someone with little knowledge about pattern matching.

@abgruszecki
Contributor Author

@liufengyun - I've reworked git history. It turned out that to remove some code which I now know is wrong I had to base on #3475.

dc19e6a contains both space operation implementations side-by-side. The new implementations are similar in some regards to the old ones, but, again, they exhibit the "weird" recursion I talked about earlier. It might make sense to skim the Haskell algorithm paper as well - in particular, its informal explanation of how the algorithm functions is excellent.

Another notable change is that debug messages are now indented - since the operations now recurse much more often, the debug output was unreadable to me without this. I don't know if the code should stay like this, since it duplicates the Showable-based indenting present elsewhere in the codebase, but I believe that some indenting is absolutely necessary.

Test should still fail because of #3479 - hopefully it'll get fixed sooner rather than later.

@liufengyun
Contributor

Well done @AleksanderBG !

When you get the CI to pass, we can compare your algorithm with another approach (implemented in OCaml, briefly mentioned in the space paper): simply exclude counterexamples that are not satisfiable.

The second approach passed all GADT tests, and it has better performance because it only enables the checking for invariant GADTs. It's just a few lines of code, with no changes needed to the core algorithm.

But we do very much appreciate your work, and want to evaluate the two approaches when they are done. It's also possible we'll keep both algorithms so that it's possible to switch between them as things evolve.

@abgruszecki
Contributor Author

@liufengyun - picking the OCaml approach over the Haskell one is absolutely understandable - part of the reason why I didn't base my work on the OCaml approach is that it seemed like it would make for a very short thesis :)

However, I should stress that the implementation in this PR is severely wing-clipped by how absurdly naive the constraint checker is (7c79edb). The TVar-based PoC (AleksanderBG/dotty@19e719e0) is the serious implementation. The best comparison can be made once the approach described here by @smarter is in place.

Aleksander Boruch-Gruszecki added 6 commits November 22, 2017 20:58
Adapt tests to new output:
- exhausting: reflect improved GADT handling
- patmatexhaust: result of mirroring `scalac` guard handling
- t8511: this seems like the correct output for this test

All other changes are merely equivalent, but different, warnings.
@abgruszecki
Contributor Author

Rebased on master and applied a workaround for #3479.

I've hit a wall with the TVar constraint verifier PoC (AleksanderBG/dotty@2ab594dd47bd8e72d1982a2b2f82983d44f9165a): for the gadt-nontrivial2 test case, a warning that (NCons(_, NCons(_, _)), _) is unmatched is emitted. This happens because the inferred type for the NCons pattern is NCons[Nat, A], which makes the exhaustivity checker think NCons(_, NCons(_, _)) cannot possibly match it. Unfortunately I cannot see a clean workaround, so the output for that case will be incorrect until something is done about the existential types inferred in patterns.

I've hit other difficulties with the TVar approach (some of which can be seen at gitter), so I believe now would be a good time to sum up my view on the OCaml approach before I go off to tinker in a corner for a while:

If it's possible to make the constraint solver implementation serious, the approach in this PR would be clearly superior to the OCaml one in terms of functionality - there would be no restrictions on GADT param variance, and users would be warned about useless GADT patterns. As in the initial message, useless subpattern warnings as described by Maranget would also be natural with this approach. All patterns need to be normalized to only have or-patterns at the top anyway, so all that's left to do is actually detect if any variants are useless and emit an appropriate warning.

I would not worry too much at this point about performance. In the proper implementation, intersect will be used to find useless patterns in-line with exhaustivity checks. This will make it possible to drop separate redundancy checks, which currently subtract same patterns over and over, which would offset additional complexity of space operations. While the PoC spaces carry entire Contexts around, with some code adjustments it should be possible to carry only Constraints, which would reduce additional memory requirements. Finally, if it turns out that too many constraints cause a slowdown, it is sound to simply skip adding them past some limit. In fact, this was necessary in the GHC implementation. Overall the performance would quite likely still take a small hit, but I believe the additional functionality would be more than worth it.

However, if the TVar approach doesn't work out, then the only advantage this PR has over the OCaml approach is ease of implementing useless subpattern warnings, which is quite a bad tradeoff when compared to the amount of additional code required.

As my time off from my job ended last week, I will now have much less time to spend on this PR. I will continue working on it, although I certainly can't make any promises about finishing it on short notice.

@liufengyun
Contributor

Thanks for your comment @AleksanderBG , I somehow missed it.

the approach in this PR would be clearly superior to the OCaml one in terms of functionality - there would be no restrictions on GADT param variance, and users would be warned about useless GADT patterns

PR #3510 only checks invariant type params as an optimisation, because for covariant or contravariant types the existing algorithm works well without change, thanks to union types.

As GADTs are usually covariant or contravariant in Scala, it seems appropriate to handle invariant GADTs using a simple algorithm.

In terms of maintainability, I think #3510 is better than the Haskell algorithm: the core algorithm doesn't need any changes, just a few lines to handle invariant GADTs.

Thanks again for your work on exhaustivity check.

@abgruszecki
Contributor Author

@liufengyun - I definitely agree that the Haskell approach is the more complex one. I still feel that even if all it has over the OCaml approach are the additional types of warnings, it may be worth it.

I'll update the PR here as soon as I have the constraint checker in reasonable order and a clear comparison can be made, both of performance and of what warnings are emitted where. As I said though, I'm limited by being employed full-time, so it might take me a while.

@abgruszecki
Contributor Author

@liufengyun - I think I now understand why I was wrong regarding the variance "limitations". I thought that for code like this:

  sealed trait Expr[+T]
  case class IExpr(x: Int) extends Expr[Int]
  case class BExpr(b: Boolean) extends Expr[Boolean]

  def foo[T](x: Expr[T], y: Expr[T]) = (x, y) match {
    case (IExpr(_), IExpr(_)) => true
    case (BExpr(_), BExpr(_)) => false
  }

no warning should be emitted. However, that's only the case in Scala2: in Dotty, T can be Int | Boolean instead of something explicitly specified in one of Expr's children. This means that no additional constraint checks are needed here, and warnings about (BExpr(_), IExpr(_)) and (IExpr(_), BExpr(_)) should be emitted, as is the case in the OCaml approach.
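For instance (a standalone sketch of the snippet above), with covariance a mismatched pair typechecks because T is inferred as a common supertype of Int and Boolean, so the match can genuinely fail at runtime and the warning is justified:

```scala
sealed trait Expr[+T]
case class IExpr(x: Int) extends Expr[Int]
case class BExpr(b: Boolean) extends Expr[Boolean]

def foo[T](x: Expr[T], y: Expr[T]): Boolean = (x, y) match {
  case (IExpr(_), IExpr(_)) => true
  case (BExpr(_), BExpr(_)) => false
}

// T is inferred as a supertype of both Int and Boolean (Int | Boolean
// in Dotty), so this call compiles -- and then fails to match:
val mismatch =
  try { foo(IExpr(1), BExpr(true)); false }
  catch { case _: MatchError => true }
```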

@liufengyun
Contributor

Yes, that's exactly what I meant. However, I need to update #3510 to only ignore covariant type params, as we still need the check for contravariant cases -- or just disable the optimisation.

@liufengyun
Contributor

@AleksanderBG Thanks again for your contribution. If you have something new, please open a new PR.

@liufengyun liufengyun closed this Jan 19, 2018