-
Notifications
You must be signed in to change notification settings - Fork 1.1k
WIP: GADT-aware patmat exhaustivity checking #3454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: GADT-aware patmat exhaustivity checking #3454
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello, and thank you for opening this PR! 🎉
All contributors have signed the CLA, thank you! ❤️
Have an awesome day! ☀️
@liufengyun - this is the source of all the issues I've been raising! I would've kept the work public from the beginning, but as this is my Master's thesis, I wanted to avoid any accusations that this wasn't my individual work. |
}).reduce((a, b) => Or(List(a, b))) | ||
val uncovered = simplify(minus(Typ(selTyp, true), patternSpace), aggressive = true) | ||
for { (space, pos) <- projectedPats } { | ||
if (!UseOldRedundancyCheck) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code here finds useless patterns in-line with exhaustivity checks.
@@ -3,4 +3,4 @@ | |||
32: Pattern Match Exhaustivity: List(_, _: _*) | |||
39: Pattern Match Exhaustivity: Bar3 | |||
44: Pattern Match Exhaustivity: (Bar2, Bar2) | |||
53: Pattern Match Exhaustivity: (Bar2, Bar2), (Bar2, Bar1), (Bar1, Bar3), (Bar1, Bar2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change here is a straight improvement - here we can see that this pattern matching is done on a GADT and is quite similar to the join
example.
case class BExpr(bool: Boolean) extends Expr[Boolean] | ||
case class IExpr(int: Int) extends Expr[Int] | ||
|
||
def join[T](e1: Expr[T], e2: Expr[T]): Expr[T] = (e1, e2) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test shows that the motivating example and a slightly more complicated GADT with two parameters are both correctly checked. Currently, GADT checks are limited to invariant parameters, which I hope to fix by integrating Scala's typechecking with constraint verification.
case (IntExpr(i1), IntExpr(i2)) => IntExpr(i1 + i2) | ||
case (AddExpr(ei1, ei2), ie) => join(join(ei1, ei2), ie) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test shows that correct warnings are emitted based on the type constraints, with a slightly more complicated GADT.
tests/patmat/gadt-nontrivial2.scala
Outdated
case object NEmpty extends NVec[Zero, Nothing] | ||
case class NCons[N <: Nat, +A](head: A, tail: NVec[N, A]) extends NVec[Succ[N], A] | ||
|
||
def nzip[N <: Nat, A, B](v1: NVec[N, A], v2: NVec[N, B]): NVec[N, (A, B)] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test shows that invariant params can be checked if a non-invariant param appears in the same list.
@@ -2,6 +2,7 @@ | |||
11: Pattern Match Exhaustivity: Bar(_) | |||
23: Pattern Match Exhaustivity: (Qult(), Qult()), (Kult(_), Kult(_)) | |||
49: Pattern Match Exhaustivity: _: Gp | |||
53: Pattern Match Exhaustivity: _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the resulting of mirroring scalac
behaviour more closely - guard on this line is constant-folded to false
.
@@ -1 +1 @@ | |||
18: Pattern Match Exhaustivity: Baz(), Bar(_) | |||
18: Pattern Match Exhaustivity: EatsExhaustiveWarning(_), Baz(), Bar(_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too sure why no warning was emitted for EatsExhaustiveWarning
here. In the test code, Reference
is pattern-matched with a possibly-failing unapply
method (it returns Option
and not Some
), so it seems that a warning should be emitted. @liufengyun - do you recall why the original code behaves as it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because there's no known children of Reference
, thus it's exhaustive.
@@ -18,4 +18,4 @@ object Test extends App { | |||
(true: Boolean) match { case true => case false => case _ => } // exhaustive, last case is unreachable | |||
(true: Boolean) match { case true => case false => case _: Boolean => } // exhaustive, last case is unreachable | |||
(true: Boolean) match { case true => case false => case _: Any => } // exhaustive, last case is unreachable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note - last pattern here is unreachable, but not redundant, so no warning is emitted for it in the current code. With intersection-based useless pattern checking, a warning is correctly emitted here.
@smarter - I'm not sure if you're the right person to ask, but I've been unable to successfully run the entire test suite locally and I have no idea why it fails. Even on master after a fresh checkout and submodule init, (please note that patmat unit tests actually succeed! 😛 ) |
@AleksanderBG Congrats, this is a very nice piece of work! |
@liufengyun - I'd be very grateful if you could take a look at how I want to approach integrating Scala's typechecking w/ constraint verification and see if there isn't something obvious to you that I'm missing. In particular, I'll be reusing
I'm not too sure when exactly |
Also, thank you @liufengyun! 😄 |
@AleksanderBG I'll go though the PR thoroughly in these days. BTW, to help review, if the file Here are some high-level comments about Dotty constraint solver. I see two places where you'll need that:
Note that both The general shape of using the Dotty inference infrastructure is quite simple: you create some TVars in the type, then do a subtype BTW, according to your report in #3333, we need one further refinement to child instantiation: replace unconstrained TVars with Wildcard types. |
Rebased on current master & removed some dead and/or misleading code which shouldn't have been here. General tests still fail at pickling tests, but now with the same puzzling exception that CI previously got - I'll see what I can do about that. Most importantly, I've uploaded a working POC of typechecker integration here. It passes all tests, but I'm pretty sure I've gotten some things wrong with it and it needs more work. What's more, since "existential" types as described here aren't currently supported, the gain from finishing the POC is limited. Despite my hopes, this approach doesn't automagically solve #3333. Since now type variables can be kept outside of I'll be now focusing now on splitting the initial commit into smaller pieces. It should also make it easier to compare what is actually improved by using the typechecker. I should be able to get something done for Thursday. @liufengyun - I should have mentioned this when I sent my thesis to you, but: you'll likely gain the most insight into how the algorithm works from Chapter 3, particularly the example subtraction. Section 3.1 can be safely skipped. The weirdest part of the algorithm is the way it recurses when subtracting two constructor spaces - the algorithm makes much more sense if one imagines it's possible to pattern match on multiple values, like in Haskell. Chapter 2 may be useful, but at the same time it's aimed at someone with little knowledge about pattern matching. |
@liufengyun - I've reworked git history. It turned out that to remove some code which I now know is wrong I had to base on #3475. dc19e6a contains both space operation implementations side-by-side. The new implementations are similar in some regards to the old ones, but, again, they exhibit the "weird" recursion I've talked about earlier. It might make sense to skim Haskell algorithm paper as well - in particular, the informal explanation to how the algorithm functions is excellent. Other notable change is that debug messages are now indented - since the operations now recurse much more often, debug output was unreadable to me without this. IDK if the code should stay like this, since it duplicates Test should still fail because of #3479 - hopefully it'll get fixed sooner rather than later. |
Well done @AleksanderBG ! When you get the CI passed, we can compare your algorithm with another approach (implemented in OCaml, briefly mentioned in the The 2nd approach passed all GADT tests, and it has better performance by only enabling the checking for But we do appreciate much your work, and want to evaluate the two approaches when they are done. And it's possible we keep both algorithms so that it's possible to switch between them in the evolution. |
@liufengyun - picking OCaml approach over Haskell is absolutely understandable - part of the reason why I didn't base my work on OCaml approach is that it seemed like it would make for a very short thesis :) However, I should stress that the implementation in this PR is severely wing-clipped by how absurdly naive the constraint checker is (7c79edb). The TVar-based PoC (AleksanderBG/dotty@19e719e0) is the serious implementation. The best comparison could be made after the approach described here by @smarter is in-place. |
Adapt tests to new output: - exhausting: reflect improved GADT handling - patmatexhaust: result of mirroring `scalac` guard handling - t8511: this seems like the correct output for this test All other changes are merely equivalent, but different, warnings.
Rebased on master and applied a workaround for #3479. I've hit a wall with the TVar constraint verifier PoC (AleksanderBG/dotty@2ab594dd47bd8e72d1982a2b2f82983d44f9165a): for gadt-nontrivial2 test case, a warning that I've hit other difficulties with the TVar approach (some of which can be seen at gitter), so I believe now would be a good time to sum up my view on the OCaml approach before I go off to tinker in a corner for a while: If it's possible to make the constraint solver implementation serious, the approach in this PR would be clearly superior to the OCaml one in terms of functionality - there would be no restrictions on GADT param variance, and users would be warned about useless GADT patterns. As in the initial message, useless subpattern warnings as described by Maranget would also be natural with this approach. All patterns need to be normalized to only have or-patterns at the top anyway, so all that's left to do is actually detect if any variants are useless and emit an appropriate warning. I would not worry too much at this point about performance. In the proper implementation, However, if the TVar approach doesn't work out, then the only advantage this PR has over the OCaml approach is ease of implementing useless subpattern warnings, which is quite a bad tradeoff when compared to the amount of additional code required. As my time off from my job has ended last week, I will now have much less time to spend on this PR. I will continue working on it, although I certainly can't make any promises about finishing it on short notice. |
Thanks for your comment @AleksanderBG , I somehow missed it.
The PR #3510 only checks invariant type param for optimisation, because for co-variant or contra-variant types, the existing algorithm works well without change due to union types. As GADTs are usually co-variant or contra-variant in Scala, it seems appropriate to handle invariant GADTs using a simple algorithm. In terms of maintainability, I think #3510 is better than the Haskell algorithm: the core algorithm doesn't need any change, while just a few lines to handle invariant GADTs. Thanks again for your work on exhaustivity check. |
@liufengyun - I definitely agree that the Haskell approach is the more complex one. I still feel that even if all it has over the OCaml approach are the additional types of warnings, it may be worth it. I'll update the PR here as soon as I have the constraint checker in reasonable order and a clear comparison can be made, both of performance and of what warnings are emitted where. As I said though, I'm limited by being employed full-time, so it might take me a while. |
@liufengyun - I think I now understand why I was wrong regarding the variance "limitations". I thought that for code like this: sealed trait Expr[+T]
case class IExpr(x: Int) extends Expr[Int]
case class BExpr(b: Boolean) extends Expr[Boolean]
def foo[T](x: Expr[T], y: Expr[T]) = (x, y) match {
case (IExpr(_), IExpr(_)) => true
case (BExpr(_), BExpr(_)) => false
} no warning should be emitted. However, that's only the case for Scala2, since in Dotty the |
Yes, that's exactly what I meant. However, I need to update #3510 to only ignore |
@AleksanderBG Thanks again for your contribution. If you have something new, please open a new PR. |
This PR reworks how patmat exhaustivity checking is performed so that exhaustivity warnings take into account GADT type constraints.
Note that this is still work in progress - although the code here works and passes all unit tests, some finishing touches still need to be applied for it to become fully functional. However, since I've been raising a lot of issues discovered while implementing this algorithm, it's high time for me to finally keep track of it here.
As a simple example of code for which better warnings are emitted:
join
is exhaustive, since it's not possible to give it "mismatched" arguments. However, current algorithm would emit a warning that there are no patterns for(IExpr(_), BExpr(_))
and vice-versa. With the algorithm in this PR, no warnings are emitted (correctly).The algorithm itself is closely based on Karachalias' et al. Haskell GADT algorithm [1] - my work was to essentially unify Haskell algorithm with Liu's Dotty algorithm, taking the capability for constraint reasoning from the first and the representation of Scala values from the second.
The core adjustment which enables this algorithm to work is that spaces now keep track of constraints associated with them. For instance, when decomposing
Expr[T]
intoIExpr
andBExpr
, type equality constraintsT ~ Int
andT ~ Boolean
become associated with their respective spaces. After spaces are subtracted, spaces with contradictory constraints are removed from being considered. For example,(IExpr(_), BExpr(_))
space would have constraintsT ~ Int
andT ~ Boolean
associated with it. AsT
cannot possibly be equal to bothInt
andBoolean
, the constraints are contradictory and the space is not considered when emitting exhaustivity warnings. My Master's thesis contains a far more in-depth description - if you want to read it, please send me an email or message me on Gitter and I'll send it to you.Remaining work
The most important remaining work is to reuse Scala's typechecking for constraint verification. All other points are essentially nice-to-haves.
Making constraint verification serious
The constraint verification checker is currently extremely naive. It suffices for simple cases only. Haskell's algorithm reuses its type inference algorithm for constraint checking, and something similar should be implemented here.
From what I gather, it seems that Dotty contains no code which explicitly solves type constraints. Instead, as far as I understand, constraints are kept in
Context
and can be adjusted by performing comparisons between types. Integrating this with the algorithm is currently the highest priority, as it will enable the constraint verifier to be as powerful as Scala's typechecking.Note that even with the current naive verifier, a lot of code can be checked. For examples, see
gadt-basic.scala
,gadt-nontrivial.scala
andgadt-nontrivial2.scala
tests - I'll be mentioning them in code comments.Use space intersection for finding useless patterns
Current way of finding useless patterns only locates redundant patterns. With GADTs, it's easy to create a useless, but non-redundant case - for instance,
(IExpr(_), BExpr(_))
injoin
above would be just such a pattern. The code already contains an implementation of using space intersection for finding useless patterns, but there's still a number of issues in surrounding code preventing it from working correctly. These are: #3333 #3455 . There are other issues related to subtyping checks when comparing a type param with any other type, but they should be cleared up once typechecker integration is done.Find useless subpatterns
With the current code, it shouldn't be too difficult to find not only useless patterns, but useless subpatterns as well. Consider:
In the second pattern,
(1, 2)
is useless and probably a programmer error, so a warning should be emitted.While this is completely unrelated to GADTs, it's a nice-to-have.
Changes to guard handling
Haskell algorithm essentially considers patterns with guards to never match, unless they can be constant-folded to
true
. I was actually surprised to find that this wasn't howscalac
behaved - guards which can be constant-folded are treated (correctly) as if they always/never match, but other guards are consider to always match for exhaustivity checks and never match for redundancy checks. I mimicked this behaviour (which changed the output of some tests), but I find it surprising, since I believed thatscalac
behaved like Haskell. Is there any specific reason whyscalac
behaves like that? If possible, I'd change it to behave like Haskell (and possibly revert to old behaviour when Scala2 compat mode is on). The current behaviour also makes it impossible to run intersection-based useless patern checks in-line with exhaustivity checks (as is the case in Haskell), which means the code would be (or at least, should be) less efficient overall.Changes to test output
I'll be mentioning changes to test output in code comments, if there's something notable.
[1] - https://www.microsoft.com/en-us/research/wp-content/uploads/2016/08/gadtpm-acm.pdf