|
| 1 | +--- |
| 2 | +layout: blog-detail |
| 3 | +post-type: blog |
| 4 | +by: Sébastien Doeraene |
| 5 | +title: Scala 3.3.2 post-mortem analysis |
| 6 | +description: Post-mortem analysis of the Scala 3.3.2 release abort |
| 7 | +--- |
| 8 | + |
| 9 | +Last week, we announced [the release of Scala 3.4.0 and 3.3.3]({{ site.baseurl }}/blog/2024/02/29/scala-3.4.0-and-3.3.3-released.html). |
| 10 | +The announcement mentioned that we had to skip 3.3.2 due to a bug breaking our forward binary [compatibility guarantees](https://docs.scala-lang.org/overviews/core/binary-compatibility-of-scala-releases.html) in a patch release. |
| 11 | + |
| 12 | +In this post, we take a closer look at what happened, why it slipped through our existing testing processes, and what steps we are taking so that it does not happen again in the future. |
| 13 | + |
| 14 | +## Symptoms of the bug |
| 15 | + |
| 16 | +First, let us look at the scenario in which the bug surfaces: |
| 17 | + |
| 18 | +1. Compile a library A with Scala 3.3.2 |
| 19 | +2. Compile a project B, depending on A, with Scala 3.3.1 (or 3.3.0) |
| 20 | +3. The older compiler crashes with the following message: |
| 21 | + |
| 22 | + ``` |
| 23 | + [error] error while loading SomeClass, |
| 24 | + [error] class file .../SomeClass.class is broken, reading aborted with class java.util.NoSuchElementException |
| 25 | + [error] contextual$ |
| 26 | + [error] one error found |
| 27 | + ``` |
| 28 | + |
| 29 | +The error does not surface in all situations. |
| 30 | +It only happens when `SomeClass`, coming from library A, is involved in the expansion of a macro or inline def. |
| 31 | +Nevertheless, it is not rare. |
| 32 | + |
| 33 | +Since the above scenario is supposed to be accepted according to our forward compatibility guarantees, we had to abort the release of Scala 3.3.2. |
| 34 | +However, the artifacts had already been published to Maven Central, which is an immutable repository. |
| 35 | +We had no choice but to abandon the 3.3.2 version number, and publish 3.3.3 instead. |
| 36 | + |
| 37 | +## The bug in detail |
| 38 | + |
| 39 | +Let us dig a bit deeper on the bug. |
| 40 | +It appears while the compiler is *reading* the TASTy files of a dependency. |
| 41 | +In particular, it cannot read the *name* of a method parameter. |
| 42 | + |
| 43 | +Names in TASTy and in the compiler are not simple Strings. |
| 44 | +They have some amount of structure. |
| 45 | +For example, the name of the class behind an `object Foo` definition has a structured name `ModuleClass("Foo")`, instead of the string `"Foo$"`. |
| 46 | + |
| 47 | +Likewise, generated names for anonymous entites, such as `implicit` parameters, have the structured name `UniqueName("evidence$", 1)`. |
| 48 | +The string `"evidence$"` used in the equality definition of names, so that `UniqueName("evidence$", 1)` is not the same name as `UniqueName("foo$", 1)`, but it does not otherwise have any semantic meaning in TASTy. |
| 49 | + |
| 50 | +However, for code readability and performance reasons, the compiler maintains a fixed list of the prefixes it generates, and uses a single instance of the associated `NameKind`, one of its internal data structures. |
| 51 | +When it reads the string `"evidence$"`, it reuses the instance of `NameKind` that it uses for that particular string. |
| 52 | + |
| 53 | +This leads us to the cause of our bug. |
| 54 | +Between 3.3.1 and 3.3.2, we merged [PR #18280](https://github.com/lampepfl/dotty/pull/18280). |
| 55 | +That PR starts with what appeared to be a *refactoring* commit that renamed some `NameKind`s and their associated strings. |
| 56 | +Unfortunately, that means the 3.3.2 compiler started emitting `UniqueName`s with the string `"contextual$"`. |
| 57 | +When the 3.3.1 compiler reads that structured name, it tries to find the cached instance of `NameKind` that it should use for `"constextual$"`, and does not find any, because it did not use to generate that string. |
| 58 | + |
| 59 | +In order to fix this in 3.3.3, we reverted to generating `UniqueName("evidence$", x)` where 3.3.2 would generate `UniqueName("contextual$", x)`. |
| 60 | + |
| 61 | +## Why the bug passed our review and testing processing |
| 62 | + |
| 63 | +The root cause is that the compiler assumed that only a fixed set of known strings would ever appear in TASTy `UniqueName`s, despite the fact that the TASTy format says that any string is valid in that position. |
| 64 | +From that root cause, an unfortunate combination of events happened. |
| 65 | + |
| 66 | +First, when we reviewed [PR #18280](https://github.com/lampepfl/dotty/pull/18280), taking the TASTy format into account, we assumed that using new strings would be a perfectly compatible change. |
| 67 | +We therefore did not mark the PR as "needs-minor-release" as we merged it into the `main` branch for release in 3.4.0. |
| 68 | + |
| 69 | +Second, the PR was backported to the 3.3.x LTS branch. |
| 70 | +Normally, such a refactoring would not have met the bar for *backporting* on its own. |
| 71 | +We backported the commit because the PR that contained it was improving tooling, and in particular performance of incremental compilation. |
| 72 | +This is one of the areas for which backports are highly desirable. |
| 73 | + |
| 74 | +Third and last, the backport failed to trigger any Continuous Integration test: |
| 75 | + |
| 76 | +* the initial set of unit tests for the compiler never mix different compiler versions, and |
| 77 | +* the community builds compile projects using the *new* compiler with dependencies compiled by the *old* compiler. |
| 78 | + |
| 79 | +However, our issue only manifests when the *old* compiler is used with dependencies compiled by the *new* compiler. |
| 80 | + |
| 81 | +Concretely, we did not have adequate testing scenarios for *forward* compatibility. |
| 82 | + |
| 83 | +The RC cycles failed to surface the issue as well, for a similar reason. |
| 84 | +When trying out an RC, users rarely publish the product, even locally, then use it with an older compiler. |
| 85 | + |
| 86 | +Fortunately, as a last line of defense, the issue was caught by the Play! framework, in-between the release of the 3.3.2 artifacts on Maven Central and the official announcement of the release. |
| 87 | + |
| 88 | +## Preventing this kind of bugs in the future |
| 89 | + |
| 90 | +As we mentioned in the previous section, the main point of failure was a lack of tests for forward compatibility scenarios. |
| 91 | +Our most important fix to the process will therefore be to add adequate forward compatibility tests in the compiler's test suite. |
| 92 | + |
| 93 | +In addition, this particular bug highlights the importance of accurately following the specificiation of the TASTy format. |
| 94 | +Making additional assumptions on the *origin* of TASTy file---namely that only our own compiler can produce them---leads to problems like this issue. |
| 95 | +We will therefore check whether there are other similar assumptions lurking in the compiler's unpickler, and address them. |
| 96 | + |
| 97 | +## Conclusion |
| 98 | + |
| 99 | +In this post, we examined in detail the bug and events that led to the abandon of the Scala 3.3.2 release. |
| 100 | +We first dug into the bug's root cause, then looked into the series of events that allowed it to go unnoticed, and finally mentioned the steps we are going to take to prevent similar bugs from happening in the future. |
0 commit comments