|
| 1 | +--- |
| 2 | +layout: blog |
| 3 | +post-type: blog |
| 4 | +by: Julien Richard-Foy |
| 5 | +title: Tribulations of CanBuildFrom |
| 6 | +--- |
| 7 | + |
| 8 | +[`CanBuildFrom`](/api/2.12.2/scala/collection/generic/CanBuildFrom.html) is probably the most |
| 9 | +infamous abstraction of the current collections. It is mainly criticised for making scary type |
| 10 | +signatures. |
| 11 | + |
| 12 | +Our ongoing [collections redesign](https://github.com/scala/collection-strawman) is an opportunity |
| 13 | +to try alternative designs. This blog post explains the (many!) problems solved by `CanBuildFrom` |
| 14 | +and the alternative solutions implemented in the new collections. |
| 15 | + |
| 16 | +## Transforming the elements of a collection |
| 17 | + |
| 18 | +It’s useful to think of `String` as a collection of `Char` elements: you can then use |
| 19 | +the common collection operations like `++`, `find`, etc. on `String` values. |
| 20 | + |
| 21 | +However, the `map` method is challenging because it |
| 22 | +transforms the `Char` elements into something that might or might not be a `Char`. |
| 23 | +Then, what should be the return type of the `map` method on `String` values? Ideally, |
| 24 | +we want to get back a `String` if we transform each `Char` into another `Char`, but we |
| 25 | +want to get some `Seq[B]` if we transform each `Char` into a different type `B`. And this |
| 26 | +is the way it currently works: |
| 27 | + |
| 28 | +~~~ |
| 29 | +Welcome to Scala 2.12.2 (OpenJDK 64-Bit Server VM, Java 1.8.0_131). |
| 30 | +Type in expressions for evaluation. Or try :help. |
| 31 | +
|
| 32 | +scala> "foo".map(c => c.toInt) |
| 33 | +res1: scala.collection.immutable.IndexedSeq[Int] = Vector(102, 111, 111) |
| 34 | +
|
| 35 | +scala> "foo".map(c => c.toUpper) |
| 36 | +res2: String = FOO |
| 37 | +~~~ |
| 38 | + |
| 39 | +This feature is not limited to the `map` method: `flatMap`, `collect`, `concat` and a few |
| 40 | +others work the same way. Moreover, `String` is not the only |
| 41 | +collection type that needs this feature: [`BitSet`](/api/2.12.2/index.html?search=bitset) |
| 42 | +and [`Map`](/api/2.12.2/index.html?search=map) are other examples. |
| 43 | + |
| 44 | +The current collections rely on `CanBuildFrom` to implement this feature. The `map` |
| 45 | +method is defined as follows: |
| 46 | + |
| 47 | +~~~ scala |
| 48 | +def map[B, That](f: Char => B)(implicit bf: CanBuildFrom[String, B, That]): That |
| 49 | +~~~ |
| 50 | + |
| 51 | +When the implicit `CanBuildFrom` parameter is resolved it fixes the return type `That`. |
| 52 | +The resolution is driven by the actual `B` type: if `B` is `Char` then `That` is fixed |
| 53 | +to `String`, otherwise it is `immutable.IndexedSeq[B]`. |
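The way implicit resolution fixes `That` can be illustrated with a toy stand-in for `CanBuildFrom`. This is only a sketch: `MiniCBF`, `LowPriorityMiniCBF`, `mapString` and `MiniCBFDemo` are hypothetical names invented here, and the real `CanBuildFrom` machinery is more involved.

```scala
// Toy stand-in for CanBuildFrom; every name in this block is hypothetical.
trait MiniCBF[Elem, To] {
  def build(elems: List[Elem]): To
}

// The fallback instance lives in a parent trait, which gives it lower priority.
trait LowPriorityMiniCBF {
  implicit def toIndexedSeq[B]: MiniCBF[B, IndexedSeq[B]] =
    new MiniCBF[B, IndexedSeq[B]] {
      def build(elems: List[B]): IndexedSeq[B] = elems.toVector
    }
}

object MiniCBF extends LowPriorityMiniCBF {
  // More specific instance: Char elements are rebuilt into a String.
  implicit val toStr: MiniCBF[Char, String] =
    new MiniCBF[Char, String] {
      def build(elems: List[Char]): String = elems.mkString
    }
}

object MiniCBFDemo {
  // Same shape as the `map` signature above: the resolved instance decides `That`.
  def mapString[B, That](s: String)(f: Char => B)(implicit bf: MiniCBF[B, That]): That =
    bf.build(s.toList.map(f))

  val upper: String = mapString("foo")(c => c.toUpper)        // B = Char
  val codes: IndexedSeq[Int] = mapString("foo")(c => c.toInt) // B = Int
}
```

The cost is visible in the sketch as well: the result type of `mapString` cannot be read from its signature alone, only from the instances in scope.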
| 54 | + |
| 55 | +The drawback of this solution is that the type signature of the `map` method looks cryptic. |
| 56 | + |
| 57 | +In the new design we solve this problem by defining two overloads of the `map` |
| 58 | +method: one that handles `Char` to `Char` transformations, and one that handles other |
| 59 | +transformations. The type signatures of these `map` methods are straightforward: |
| 60 | + |
| 61 | +~~~ scala |
| 62 | +def map(f: Char => Char): String |
| 63 | +def map[B](f: Char => B): Seq[B] |
| 64 | +~~~ |
| 65 | + |
| 66 | +Then, if you call `map` with a function that returns a `Char`, the first overload is |
| 67 | +selected and you get a `String`. Otherwise, the second overload is selected and you |
| 68 | +get a `Seq[B]`. Before Scala 2.12 such a solution would not have worked well: users |
| 69 | +would have been required to explicitly write the type of the argument of the supplied |
| 70 | +`f` function. In Scala 2.12 type inference has been improved so that this is no |
| 71 | +longer necessary. |
| 72 | + |
| 73 | +Thus, we got rid of the cryptic method signatures while still supporting the feature |
| 74 | +of returning a different type of result according to the type of the transformation function. |
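A minimal sketch of the two-overload approach, assuming Scala 2.12 or later. `CharsOps` and `OverloadDemo` are hypothetical names used for illustration; this is not the code of the new collections.

```scala
// Hypothetical wrapper around String, for illustration only.
final class CharsOps(repr: String) {
  // Selected when the function returns a Char: the result stays a String.
  def map(f: Char => Char): String = {
    val sb = new StringBuilder
    repr.foreach(c => sb.append(f(c)))
    sb.toString
  }

  // Selected for any other result type B.
  def map[B](f: Char => B): Seq[B] = {
    val b = Vector.newBuilder[B]
    repr.foreach(c => b += f(c))
    b.result()
  }
}

object OverloadDemo {
  val s = new CharsOps("foo")
  val upper: String = s.map(c => c.toUpper)  // first overload
  val codes: Seq[Int] = s.map(c => c.toInt)  // second overload
}
```

Neither call needs a type annotation on `c`, because both overloads agree that the function parameter is a `Char`.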
| 75 | + |
| 76 | +## Collections’ type constructors with different arities |
| 77 | + |
| 78 | +The collections are hierarchically organized. Essentially, the most generic collection |
| 79 | +is `Iterable[A]`, and then we have three main kinds of collections: `Seq[A]`, `Set[A]` |
| 80 | +and `Map[K, V]`. |
| 81 | + |
| 82 | + |
| 83 | + |
| 84 | +It is worth noting that `Map[K, V]` takes two type parameters (`K` and `V`) whereas the |
| 85 | +other collection types take only one type parameter. This makes it difficult to |
| 86 | +generically define, at the level of `Iterable[A]`, operations that will |
| 87 | +return a `Map[K, V]` when specialized. |
| 88 | + |
| 89 | +For instance, consider again the case of the `map` method. We want to generically define |
| 90 | +it on `Iterable[A]`, but which return type should we use? When this method is |
| 91 | +inherited by `List[A]` we want its return type to be `List[B]`, but when |
| 92 | +it is inherited by `HashMap[K, V]`, we want its return type to be `HashMap[L, W]`. |
| 93 | +It is clear that we want to abstract over the type constructor of the concrete collections, |
| 94 | +but the difficulty is that they don’t always take the same number of type parameters. |
| 95 | + |
| 96 | +That’s a second problem solved by `CanBuildFrom` in the current collections. |
| 97 | +Look again at the type signature of the (generic) `map` method on `Iterable[A]`: |
| 98 | + |
| 99 | +~~~ scala |
| 100 | +def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That |
| 101 | +~~~ |
| 102 | + |
| 103 | +The return type `That` is inferred from the resolved `CanBuildFrom` instance at the call site. |
| 104 | +Both the `Repr` and `B` types actually drive the implicit resolution: when `Repr` is `List[_]` |
| 105 | +the parameter `That` is fixed to `List[B]`, and when `Repr` is `HashMap[_, _]` and `B` is a |
| 106 | +tuple `(K, V)` then `That` is fixed to `HashMap[K, V]`. |
| 107 | + |
| 108 | +In the new design we solve this problem by defining two “branches” in the hierarchy: |
| 109 | + |
| 110 | +- `IterableOps` for collections whose type constructor takes one parameter, |
| 111 | +- `MapOps` for collections whose type constructor takes two parameters. |
| 112 | + |
| 113 | +Here is a simplified version of `IterableOps`: |
| 114 | + |
| 115 | +~~~ scala |
| 116 | +trait IterableOps[A, CC[_]] { |
| 117 | + def map[B](f: A => B): CC[B] |
| 118 | +} |
| 119 | +~~~ |
| 120 | + |
| 121 | +The `CC` type parameter stands for *C*ollection type *C*onstructor. Then, the `List[A]` |
| 122 | +concrete collection extends `IterableOps[A, List]` to set its correct self-type constructor. |
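To make the role of `CC` concrete, here is a small self-contained sketch with one unary collection. `Box`, `elems`, `fromList` and `CCDemo` are hypothetical names introduced for this example; the real `IterableOps` is richer than this.

```scala
import scala.language.higherKinds

trait IterableOps[A, CC[_]] {
  def elems: List[A]                             // hypothetical accessor
  protected def fromList[B](xs: List[B]): CC[B]  // hypothetical factory hook

  // Defined once, generically; the result type follows CC.
  def map[B](f: A => B): CC[B] = fromList(elems.map(f))
}

// A concrete collection plugs itself in as its own type constructor.
final case class Box[A](elems: List[A]) extends IterableOps[A, Box] {
  protected def fromList[B](xs: List[B]): Box[B] = Box(xs)
}

object CCDemo {
  // The inherited map returns Box[Int], not a generic Iterable[Int].
  val doubled: Box[Int] = Box(List(1, 2, 3)).map(_ * 2)
}
```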
| 123 | + |
| 124 | +Similarly, here is a simplified version of `MapOps`: |
| 125 | + |
| 126 | +~~~ scala |
| 127 | +trait MapOps[K, V, CC[_, _]] extends IterableOps[(K, V), Iterable] { |
| 128 | + def map[L, W](f: ((K, V)) => (L, W)): CC[L, W] |
| 129 | +} |
| 130 | +~~~ |
| 131 | + |
| 132 | +And then the `HashMap[K, V]` concrete collection extends `MapOps[K, V, HashMap]` to set |
| 133 | +its correct self-type constructor. Note that `MapOps` extends `IterableOps`: consequently it |
| 134 | +inherits the `map` method of `IterableOps`, which is selected when the transformation function |
| 135 | +passed to `map` does not return a tuple. |
| 136 | + |
| 137 | +## Sorted collections |
| 138 | + |
| 139 | +The third challenge is about sorted collections (like `TreeSet` and `TreeMap`, for instance). |
| 140 | +These collections define their order of iteration according to an ordering relationship for the |
| 141 | +type of their elements. |
| 142 | + |
| 143 | +As a consequence, when you transform the type of the elements (e.g. by using the -- now familiar! -- |
| 144 | +`map` method), an implicit ordering instance for the new type of elements has to be available. |
| 145 | + |
| 146 | +With `CanBuildFrom`, the solution relies (again) on the implicit resolution mechanism: |
| 147 | +the implicit `CanBuildFrom[TreeSet[_], X, TreeSet[X]]` instance is available for some |
| 148 | +type `X` only if an implicit `Ordering[X]` instance is also available. |
| 149 | + |
| 150 | +In the new design we solve this problem by introducing a new branch in the hierarchy. |
| 151 | +This one defines transformation operations that require an ordering instance for the element |
| 152 | +type of the resulting collection: |
| 153 | + |
| 154 | +~~~ scala |
| 155 | +trait SortedIterableOps[A, CC[_]] { |
| 156 | + def map[B : Ordering](f: A => B): CC[B] |
| 157 | +} |
| 158 | +~~~ |
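As an illustration of the `Ordering` requirement, the trait above can be implemented by a toy collection backed by the standard `TreeSet`. `SortedBox`, `underlying` and `SortedDemo` are hypothetical names; this is a sketch, not the new collections' implementation.

```scala
import scala.collection.immutable.TreeSet
import scala.language.higherKinds

trait SortedIterableOps[A, CC[_]] {
  def map[B: Ordering](f: A => B): CC[B]
}

// Hypothetical sorted collection backed by a TreeSet.
final class SortedBox[A](val underlying: TreeSet[A])
    extends SortedIterableOps[A, SortedBox] {
  def map[B: Ordering](f: A => B): SortedBox[B] = {
    // The builder needs the Ordering[B] supplied by the context bound.
    val b = TreeSet.newBuilder[B]
    underlying.foreach(a => b += f(a))
    new SortedBox(b.result())
  }
}

object SortedDemo {
  // Ordering[Int] is available, so this compiles and the result is re-sorted.
  val negated: SortedBox[Int] = new SortedBox(TreeSet(3, 1, 2)).map(x => -x)
}
```

Mapping to an element type with no `Ordering` instance in scope would be rejected at compile time, and the requirement is visible in the signature itself.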
| 159 | + |
| 160 | +However, as mentioned in the previous section, we also need to abstract over the kind of the |
| 161 | +type constructor of the concrete collections. Consequently we have in total four branches: |
| 162 | + |
| 163 | +kind | not sorted | sorted |
| 164 | +------------|-------------|------------------- |
| 165 | +`CC[_]` |`IterableOps`|`SortedIterableOps` |
| 166 | +`CC[_, _]` |`MapOps` |`SortedMapOps` |
| 167 | + |
| 168 | +In summary, instead of having one `map` method that supports all the use cases described in |
| 169 | +this section and the previous ones, we specialized the hierarchy to have overloads of |
| 170 | +the `map` method, each one supporting a specific use case. The benefit is that the type |
| 171 | +signatures immediately tell you the story: you don’t have to look at the actual |
| 172 | +implicit resolution to know the result you will get from calling `map`. |
| 173 | + |
| 174 | +## Implicit builders |
| 175 | + |
| 176 | +In the current collections, the fact that `CanBuildFrom` instances are available in the |
| 177 | +implicit scope is useful to implement, separately from the collections, generic operations |
| 178 | +that work with any collection type. |
| 179 | + |
| 180 | +Examples of use cases are: |
| 181 | + |
| 182 | +- [`Future.traverse`](https://github.com/scala/scala/blob/92ffe04070f25452b8d48ee7fbced587ddafbf6d/src/library/scala/concurrent/Future.scala#L822-L840) |
| 183 | +- type-driven builders (e.g. in [play-json](https://github.com/playframework/play-json/blob/8642c485c79e32263b7bef5f991abb486523b3ef/play-json/shared/src/main/scala/Reads.scala#L144-L170), or [slick](https://github.com/slick/slick/blob/51e14f2756ed29b8c92a24b0ae24f2acd0b85c6f/slick/src/main/scala/slick/jdbc/PositionedResult.scala#L150-L154)) |
| 184 | +- extension methods (e.g. in [scala-extensions](https://github.com/cvogt/scala-extensions/blob/master/src/main/scala/collection.scala#L14-L28)) |
| 185 | + |
| 186 | +In the new design we are still experimenting with solutions to support these features. So far |
| 187 | +the decision is to not put implicit builders in the collections implementation. We might |
| 188 | +provide them as an optional dependency instead, but it seems that most of these use cases |
| 189 | +could be supported even without implicit builders: you could just use an existing collection |
| 190 | +instance and navigate through its companion object (providing the builder), or you could just |
| 191 | +use the companion object directly to get a builder. |
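One possible reading of "use the companion object directly to get a builder" is sketched below: a generic operation takes the builder as an ordinary argument instead of an implicit one. `squares` and `BuilderDemo` are hypothetical names; `newBuilder` on the companion objects and `scala.collection.mutable.Builder` are part of the standard library.

```scala
import scala.collection.mutable.Builder

object BuilderDemo {
  // Hypothetical generic operation: the caller passes the builder explicitly.
  def squares[C](xs: Iterable[Int], builder: Builder[Int, C]): C = {
    xs.foreach(x => builder += x * x)
    builder.result()
  }

  // The target collection is chosen through its companion object.
  val asVector: Vector[Int] = squares(List(1, 2, 3), Vector.newBuilder[Int])
  val asSet: Set[Int] = squares(List(1, -1, 2), Set.newBuilder[Int])
}
```

The trade-off is that the target collection has to be named at the call site rather than being inferred from an expected type.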
| 192 | + |
| 193 | +## `breakOut` escape hatch |
| 194 | + |
| 195 | +As we have previously seen, in the current collections when we want to transform some |
| 196 | +collection into a new collection, we rely on an available implicit `CanBuildFrom` |
| 197 | +instance to get a builder for the target collection. The implicit search is |
| 198 | +driven by the type of the initial collection and the type of elements of the target |
| 199 | +collection. The available implicit instances have been designed to make sense in the most |
| 200 | +common cases. |
| 201 | + |
| 202 | +However, sometimes this default behavior is not what you want. For instance, consider the |
| 203 | +following program: |
| 204 | + |
| 205 | +~~~ scala |
| 206 | +val xs: List[Int] = 1 :: 2 :: 3 :: Nil |
| 207 | +val xsWithSquares: Map[Int, Int] = |
| 208 | + xs.map(x => (x, x * x)) |
| 209 | +~~~ |
| 210 | + |
| 211 | +If you try to compile it you will get a compile error because the implicitly |
| 212 | +resolved builder produces a `List[(Int, Int)]` instead of the desired `Map[Int, Int]`. |
| 213 | +We could convert this `List[(Int, Int)]` into a `Map[Int, Int]` but that |
| 214 | +would be inefficient for large collections. |
| 215 | + |
| 216 | +We can fix this issue by using the `breakOut` escape hatch: |
| 217 | + |
| 218 | +~~~ scala |
| 219 | +val xs: List[Int] = 1 :: 2 :: 3 :: Nil |
| 220 | +val xsWithSquares: Map[Int, Int] = |
| 221 | + xs.map(x => (x, x * x))(collection.breakOut) |
| 222 | +~~~ |
| 223 | + |
| 224 | +`breakOut` selects a `CanBuildFrom` instance irrespective of the initial collection type. |
| 225 | +This requires the target type to be known, in this case via an explicit type ascription. |
| 226 | + |
| 227 | +In the new design we have no direct equivalent of `breakOut`. The solution to the |
| 228 | +above example is to use a `View`, which avoids the construction of an |
| 229 | +intermediate collection: |
| 230 | + |
| 231 | +~~~ scala |
| 232 | +val xs: List[Int] = 1 :: 2 :: 3 :: Nil |
| 233 | +val xsWithSquares: Map[Int, Int] = |
| 234 | + xs.view.map(x => (x, x * x)).to(Map) |
| 235 | +~~~ |
| 236 | + |
| 237 | +In practice, we expect that most usages of `breakOut` could be adapted to the new design by using |
| 238 | +a `View` followed by an explicit `to` call. However, this is an area that remains to be explored. |
| 239 | + |
| 240 | +## Summary |
| 241 | + |
| 242 | +In this article we have reviewed the features built on top of `CanBuildFrom` and explained |
| 243 | +the design decisions we made for the new collections to support most of these features |
| 244 | +without `CanBuildFrom`. |