Commit d59a8b2

Merge pull request #651 from scala/tribulations-canbuildfrom
Add blogpost: Tribulations of CanBuildFrom
2 parents 3496e9d + 6df2a5b commit d59a8b2

2 files changed

+266
-0
lines changed

Lines changed: 244 additions & 0 deletions
@@ -0,0 +1,244 @@
---
layout: blog
post-type: blog
by: Julien Richard-Foy
title: Tribulations of CanBuildFrom
---

[`CanBuildFrom`](/api/2.12.2/scala/collection/generic/CanBuildFrom.html) is probably the most
infamous abstraction of the current collections. It is mainly criticised for producing scary type
signatures.

Our ongoing [collections redesign](https://github.com/scala/collection-strawman) is an opportunity
to try alternative designs. This blog post explains the (many!) problems solved by `CanBuildFrom`
and the alternative solutions implemented in the new collections.

## Transforming the elements of a collection

It’s useful to think of `String` as a collection of `Char` elements: you can then use
the common collection operations like `++`, `find`, etc. on `String` values.

However, the `map` method is challenging because it
transforms the `Char` elements into something that might or might not be a `Char`.
What, then, should the return type of the `map` method on `String` values be? Ideally,
we want to get back a `String` if we transform each `Char` into another `Char`, but we
want to get some `Seq[B]` if we transform each `Char` into a different type `B`. And this
is the way it currently works:

~~~
Welcome to Scala 2.12.2 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> "foo".map(c => c.toInt)
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(102, 111, 111)

scala> "foo".map(c => c.toUpper)
res2: String = FOO
~~~

This feature is not limited to the `map` method: `flatMap`, `collect`, `concat` and a few
others work the same way. Moreover, `String` is not the only
collection type that needs this feature: [`BitSet`](/api/2.12.2/index.html?search=bitset)
and [`Map`](/api/2.12.2/index.html?search=map) are other examples.

The current collections rely on `CanBuildFrom` to implement this feature. The `map`
method is defined as follows:

~~~ scala
def map[B, That](f: Char => B)(implicit bf: CanBuildFrom[String, B, That]): That
~~~

When the implicit `CanBuildFrom` parameter is resolved, it fixes the return type `That`.
The resolution is driven by the actual `B` type: if `B` is `Char` then `That` is fixed
to `String`, otherwise it is fixed to `immutable.IndexedSeq[B]`.

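How an implicit parameter can fix a result type may be easier to see on a toy model. The following is a minimal sketch, not the library's code: `ToyCanBuild`, `ToyCanBuildLowPriority` and `ToyString` are hypothetical names introduced for illustration, and the real `CanBuildFrom` additionally tracks the source collection type and hands out a builder.

```scala
// Toy stand-in for CanBuildFrom (hypothetical, for illustration only):
// it knows how to turn elements of type Elem into a result of type To.
trait ToyCanBuild[Elem, To] {
  def fromSeq(elems: Seq[Elem]): To
}

// Fallback instance, kept in a parent trait so that it has lower priority.
trait ToyCanBuildLowPriority {
  implicit def anyToVector[B]: ToyCanBuild[B, Vector[B]] =
    new ToyCanBuild[B, Vector[B]] {
      def fromSeq(elems: Seq[B]): Vector[B] = elems.toVector
    }
}

object ToyCanBuild extends ToyCanBuildLowPriority {
  // More specific instance: Char elements can be rebuilt into a String.
  implicit val charToString: ToyCanBuild[Char, String] =
    new ToyCanBuild[Char, String] {
      def fromSeq(elems: Seq[Char]): String = elems.mkString
    }
}

// Hypothetical String wrapper whose map mirrors the signature shown above.
final class ToyString(val underlying: String) {
  def map[B, That](f: Char => B)(implicit bf: ToyCanBuild[B, That]): That =
    bf.fromSeq(underlying.toList.map(f))
}

// No result type is written here: the resolved instance determines it.
val upper = new ToyString("foo").map(c => c.toUpper)
val codes = new ToyString("foo").map(c => c.toInt)

// These annotations only compile if `That` was fixed as described.
val upperCheck: String = upper
val codesCheck: Vector[Int] = codes
```

In this sketch the `Char` instance wins over the fallback because it is defined in the companion object itself rather than in its parent trait, which is one common way to prioritise implicit instances.
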
The drawback of this solution is that the type signature of the `map` method looks cryptic.

In the new design we solve this problem by defining two overloads of the `map`
method: one that handles `Char` to `Char` transformations, and one that handles other
transformations. The type signatures of these `map` methods are straightforward:

~~~ scala
def map(f: Char => Char): String
def map[B](f: Char => B): Seq[B]
~~~

Then, if you call `map` with a function that returns a `Char`, the first overload is
selected and you get a `String`. Otherwise, the second overload is selected and you
get a `Seq[B]`. Before Scala 2.12 such a solution would not have worked well: users
would have been required to explicitly write the type of the argument of the supplied
`f` function. In Scala 2.12 type inference has been improved so that this is no
longer necessary.

Thus, we got rid of the cryptic method signatures while still supporting the feature
of returning a different type of result according to the type of the transformation function.

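The overload-based approach can be tried out on a small wrapper class. This is a sketch under assumptions: `ToyStringOps` is a hypothetical name, the method bodies are illustrative rather than the strawman's implementation, and Scala 2.12 or later is assumed so that the lambda parameter type is inferred.

```scala
// Hypothetical wrapper illustrating the two overloads; not the library's code.
final class ToyStringOps(val underlying: String) {
  // Selected when the function returns a Char: the result is a String again.
  def map(f: Char => Char): String = {
    val sb = new StringBuilder
    underlying.foreach(c => sb.append(f(c)))
    sb.toString
  }

  // Selected for any other result type: the result is a generic sequence.
  def map[B](f: Char => B): Seq[B] =
    underlying.toList.map(f)
}

val ops = new ToyStringOps("foo")

// No type annotation on `c` and none on the results:
// the overload is chosen from the result type of the function.
val shouted = ops.map(c => c.toUpper)
val codePoints = ops.map(c => c.toInt)

// These annotations only compile if the expected overload was selected.
val shoutedCheck: String = shouted
val codePointsCheck: Seq[Int] = codePoints
```

Both overloads accept a function from `Char`, so the compiler can type the lambda parameter before it picks an overload.
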
## Collections’ type constructors with different arities

The collections are hierarchically organized. Essentially, the most generic collection
is `Iterable[A]`, and then we have three main kinds of collections: `Seq[A]`, `Set[A]`
and `Map[K, V]`.

![](/resources/img/blog/collections-hierarchy.svg)

It is worth noting that `Map[K, V]` takes two type parameters (`K` and `V`) whereas the
other collection types take only one type parameter. This makes it difficult to
generically define, at the level of `Iterable[A]`, operations that will
return a `Map[K, V]` when specialized.

For instance, consider again the case of the `map` method. We want to generically define
it on `Iterable[A]`, but which return type should we use? When this method is
inherited by `List[A]` we want its return type to be `List[B]`, but when
it is inherited by `HashMap[K, V]`, we want its return type to be `HashMap[L, W]`.
It is clear that we want to abstract over the type constructor of the concrete collections,
but the difficulty is that they don’t always take the same number of type parameters.

That’s a second problem solved by `CanBuildFrom` in the current collections.
Look again at the type signature of the (generic) `map` method on `Iterable[A]`:

~~~ scala
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
~~~

The return type `That` is inferred from the `CanBuildFrom` instance resolved at the call site.
Both the `Repr` and `B` types actually drive the implicit resolution: when `Repr` is `List[_]`
the parameter `That` is fixed to `List[B]`, and when `Repr` is `HashMap[_, _]` and `B` is a
tuple `(K, V)` then `That` is fixed to `HashMap[K, V]`.

In the new design we solve this problem by defining two “branches” in the hierarchy:

- `IterableOps` for collections whose type constructor takes one parameter,
- `MapOps` for collections whose type constructor takes two parameters.

Here is a simplified version of `IterableOps`:

~~~ scala
trait IterableOps[A, CC[_]] {
  def map[B](f: A => B): CC[B]
}
~~~

The `CC` type parameter stands for *C*ollection type *C*onstructor. Then, the `List[A]`
concrete collection extends `IterableOps[A, List]` to set its correct self-type constructor.

Similarly, here is a simplified version of `MapOps`:

~~~ scala
trait MapOps[K, V, CC[_, _]] extends IterableOps[(K, V), Iterable] {
  def map[L, W](f: ((K, V)) => (L, W)): CC[L, W]
}
~~~

And then the `HashMap[K, V]` concrete collection extends `MapOps[K, V, HashMap]` to set
its correct self-type constructor. Note that `MapOps` extends `IterableOps`: consequently it
inherits its `map` method, which is selected when the transformation function
passed to `map` does not return a tuple.

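The two branches can be exercised with a pair of toy collections. Everything below is a hypothetical sketch (`ToyIterableOps`, `ToyList`, `ToyMapOps`, `ToyMap`, `fromList` and `fromPairs` are illustrative names). The upper bound on `CC` in `ToyMapOps` is an assumption of this sketch, added so that the two `map` overloads have different signatures after erasure.

```scala
// On Scala 2.12, add `import scala.language.higherKinds` to silence a feature warning.

// Hypothetical one-parameter branch: CC is the collection type constructor.
trait ToyIterableOps[A, CC[_]] {
  def elems: List[A]
  protected def fromList[B](bs: List[B]): CC[B]
  def map[B](f: A => B): CC[B] = fromList(elems.map(f))
}

final case class ToyList[A](elems: List[A]) extends ToyIterableOps[A, ToyList] {
  protected def fromList[B](bs: List[B]): ToyList[B] = ToyList(bs)
}

// Hypothetical two-parameter branch. The bound on CC is an assumption of this
// sketch: it makes the erased return types of the two map overloads differ.
trait ToyMapOps[K, V, CC[_, _] <: ToyIterableOps[_, ToyList]]
    extends ToyIterableOps[(K, V), ToyList] {
  protected def fromPairs[L, W](ps: List[(L, W)]): CC[L, W]
  def map[L, W](f: ((K, V)) => (L, W)): CC[L, W] = fromPairs(elems.map(f))
}

// Toy map backed by a list of pairs (duplicate keys are not handled).
final case class ToyMap[K, V](elems: List[(K, V)]) extends ToyMapOps[K, V, ToyMap] {
  protected def fromList[B](bs: List[B]): ToyList[B] = ToyList(bs)
  protected def fromPairs[L, W](ps: List[(L, W)]): ToyMap[L, W] = ToyMap(ps)
}

val ages = ToyMap(List("ann" -> 30, "bob" -> 25))

// The function returns a tuple: the ToyMapOps overload is selected.
val swapped = ages.map(kv => (kv._2, kv._1))
// The function returns a plain value: the inherited overload is selected.
val names = ages.map(kv => kv._1)

// These annotations only compile if the expected overloads were selected.
val swappedCheck: ToyMap[Int, String] = swapped
val namesCheck: ToyList[String] = names
```
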
## Sorted collections

The third challenge is about sorted collections (like `TreeSet` and `TreeMap`, for instance).
These collections define their order of iteration according to an ordering relation for the
type of their elements.

As a consequence, when you transform the type of the elements (e.g. by using the now familiar
`map` method), an implicit ordering instance for the new type of elements has to be available.

With `CanBuildFrom`, the solution relies (again) on the implicit resolution mechanism:
the implicit `CanBuildFrom[TreeSet[_], X, TreeSet[X]]` instance is available for some
type `X` only if an implicit `Ordering[X]` instance is also available.

In the new design we solve this problem by introducing a new branch in the hierarchy.
This one defines transformation operations that require an ordering instance for the element
type of the resulting collection:

~~~ scala
trait SortedIterableOps[A, CC[_]] {
  def map[B : Ordering](f: A => B): CC[B]
}
~~~

However, as mentioned in the previous section, we also need to abstract over the kind of the
type constructor of the concrete collections. Consequently we have four branches in total:

kind        | not sorted  | sorted
------------|-------------|-------------------
`CC[_]`     |`IterableOps`|`SortedIterableOps`
`CC[_, _]`  |`MapOps`     |`SortedMapOps`

In summary, instead of having one `map` method that supports all the use cases described in
this section and the previous ones, we specialized the hierarchy to have overloads of
the `map` method, each one supporting a specific use case. The benefit is that the type
signatures immediately tell you the story: you don’t have to look at the actual
implicit resolution to know the result you will get from calling `map`.

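To make the sorted branch concrete, here is a minimal sketch of a toy sorted set whose `map` demands an `Ordering` for the new element type. `ToySortedOps`, `ToySortedSet`, `sortedElems`, `fromUnsorted` and `from` are hypothetical names, and the sorted-list representation is chosen only for brevity.

```scala
// On Scala 2.12, add `import scala.language.higherKinds` to silence a feature warning.

// Hypothetical sorted branch: every transformation needs an Ordering for the
// new element type, expressed here with a context bound.
trait ToySortedOps[A, CC[_]] {
  def sortedElems: List[A]
  protected def fromUnsorted[B: Ordering](bs: List[B]): CC[B]
  def map[B: Ordering](f: A => B): CC[B] = fromUnsorted(sortedElems.map(f))
}

// Toy sorted set backed by a sorted list without duplicates.
final class ToySortedSet[A] private (val sortedElems: List[A])
    extends ToySortedOps[A, ToySortedSet] {
  protected def fromUnsorted[B: Ordering](bs: List[B]): ToySortedSet[B] =
    ToySortedSet.from(bs)
}

object ToySortedSet {
  // The only way to build a set: the Ordering is required up front.
  def from[A: Ordering](as: List[A]): ToySortedSet[A] =
    new ToySortedSet(as.distinct.sorted)
}

val fruits = ToySortedSet.from(List("pear", "fig", "banana"))

// Compiles because an implicit Ordering[Int] is available; the result is re-sorted.
val lengths = fruits.map(_.length)
```

In this sketch, mapping to a type for which no implicit `Ordering` exists does not compile, which is the constraint the signature is meant to express.
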
## Implicit builders

In the current collections, the fact that `CanBuildFrom` instances are available in the
implicit scope is useful to implement, separately from the collections, generic operations
that work with any collection type.

Examples of use cases are:

- [`Future.traverse`](https://github.com/scala/scala/blob/92ffe04070f25452b8d48ee7fbced587ddafbf6d/src/library/scala/concurrent/Future.scala#L822-L840)
- type-driven builders (e.g. in [play-json](https://github.com/playframework/play-json/blob/8642c485c79e32263b7bef5f991abb486523b3ef/play-json/shared/src/main/scala/Reads.scala#L144-L170), or [slick](https://github.com/slick/slick/blob/51e14f2756ed29b8c92a24b0ae24f2acd0b85c6f/slick/src/main/scala/slick/jdbc/PositionedResult.scala#L150-L154))
- extension methods (e.g. in [scala-extensions](https://github.com/cvogt/scala-extensions/blob/master/src/main/scala/collection.scala#L14-L28))

In the new design we are still experimenting with solutions to support these features. So far
the decision is not to put implicit builders in the collections implementation. We might
provide them as an optional dependency instead, but it seems that most of these use cases
could be supported even without implicit builders: you could just use an existing collection
instance and navigate through its companion object (which provides the builder), or you could just
use the companion object directly to get a builder.

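As an illustration of getting a builder from a companion object, the sketch below passes the builder explicitly to a generic operation. `keepEven` is a hypothetical helper written for this example; `newBuilder` on the `Vector` and `Set` companions and `scala.collection.mutable.Builder` are existing standard library members.

```scala
import scala.collection.mutable.Builder

// Hypothetical generic operation: the caller passes the builder explicitly
// instead of relying on an implicit CanBuildFrom instance.
def keepEven[C](xs: Iterable[Int], builder: Builder[Int, C]): C = {
  xs.foreach { x => if (x % 2 == 0) builder += x }
  builder.result()
}

// The companion object of the target collection provides the builder.
val evensAsVector = keepEven(List(1, 2, 3, 4), Vector.newBuilder[Int])
val evensAsSet = keepEven(List(1, 2, 3, 4), Set.newBuilder[Int])
```

The result type `C` follows from the builder that is passed in, so the call site chooses the target collection without any implicit search.
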
## `breakOut` escape hatch

As we have previously seen, in the current collections when we want to transform some
collection into a new collection, we rely on an available implicit `CanBuildFrom`
instance to get a builder for the target collection. The implicit search is
driven by the type of the initial collection and the type of elements of the target
collection. The available implicit instances have been designed to make sense in the most
common cases.

However, sometimes this default behavior is not what you want. For instance, consider the
following program:

~~~ scala
val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.map(x => (x, x * x))
~~~

If you try to compile it you will get a compilation error because the implicitly
resolved builder produces a `List[(Int, Int)]` instead of the desired `Map[Int, Int]`.
We could convert this `List[(Int, Int)]` into a `Map[Int, Int]` but that
would be inefficient for large collections.

We can fix this issue by using the `breakOut` escape hatch:

~~~ scala
val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.map(x => (x, x * x))(collection.breakOut)
~~~

`breakOut` selects a `CanBuildFrom` instance irrespective of the initial collection type.
This requires the target type to be known, in this case via an explicit type annotation.

In the new design we have no direct equivalent of `breakOut`. The solution for the
above example is to use a `View` to avoid the construction of an
intermediate collection:

~~~ scala
val xs: List[Int] = 1 :: 2 :: 3 :: Nil
val xsWithSquares: Map[Int, Int] =
  xs.view.map(x => (x, x * x)).to(Map)
~~~

In practice, we expect that most usages of `breakOut` could be adapted to the new design by using
a `View` followed by an explicit `to` call. However, this is an area that remains to be explored.

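Another option is to go through an `Iterator`. This is a sketch of a swapped-in technique rather than the approach proposed above; it relies only on `iterator`, `map` and `toMap`, which are existing standard library methods.

```scala
val xs: List[Int] = 1 :: 2 :: 3 :: Nil

// An Iterator is lazy, so no intermediate List[(Int, Int)] is built
// before the pairs are collected into the Map.
val xsWithSquares: Map[Int, Int] =
  xs.iterator.map(x => (x, x * x)).toMap
```

Unlike `to`, the `toMap` conversion only produces the default immutable `Map`, so it does not cover other target collections.
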
## Summary

In this article we have reviewed the features built on top of `CanBuildFrom` and explained
the design decisions we made for the new collections to support most of these features
without `CanBuildFrom`.

Lines changed: 22 additions & 0 deletions