| 1 | +--- |
| 2 | +layout: blog-detail |
| 3 | +post-type: blog |
| 4 | +by: Julien Richard-Foy |
| 5 | +title: On Performance of the New Collections |
| 6 | +--- |
| 7 | + |
| 8 | +In a [previous blog post](/blog/2017/11/28/view-based-collections.html), I explained |
| 9 | +how [Scala 2.13’s new collections](http://www.scala-lang.org/blog/2017/02/28/collections-rework.html) |
| 10 | +have been designed so that the default implementations of transformation operations work |
| 11 | +with both strict and non-strict types of collections. In essence, we abstract over |
| 12 | +the evaluation mode (strict or non-strict) of concrete collection types. |
| 13 | + |
| 14 | +After we published that blog post, the community |
| 15 | +[raised concerns](https://www.reddit.com/r/scala/comments/7g52cy/let_them_be_lazy/dqgol36/) |
| 16 | +about possible performance implications of having more levels of abstraction than before. |
| 17 | + |
| 18 | +This blog article: |
| 19 | + |
| 20 | +- gives more information about the overhead of the collections’ |
| 21 | + view-based design and our solution to remove that overhead, |
| 22 | +- argues that for correctness reasons it is still better to have |
| 23 | + view-based default implementations, |
| 24 | +- shows that we should expect the new collections to be at least as fast |
| 25 | + as the old collections, and reports an average speedup |
| 26 | + of 35% in the case of `Vector`’s `filter`, `map` and `flatMap`. |
| 27 | + |
| 28 | +For reference, the source code of the new collections is available in |
| 29 | +[this GitHub repository](https://github.com/scala/collection-strawman). |
| 30 | + |
| 31 | +## Overhead Of View-Based Implementations |
| 32 | + |
| 33 | +To be clear: the view-based implementations are in general slower than their |
| 34 | +builder-based versions. How much slower varies with the type of collection |
| 35 | +(e.g. `List`, `Vector`, `Set`), the operation (e.g. `map`, `flatMap`, `filter`) |
| 36 | +and the number of elements in the collection. In my benchmark of the `map`, |
| 37 | +`filter` and `flatMap` operations on `Vector`, with 1 to 7 million |
| 38 | +elements, I measured an average slowdown of 25%. |
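
To make the two strategies concrete, here is a minimal sketch of both styles of `map` on a toy collection. `MiniView` and `MiniList` are hypothetical names invented for this illustration, not the actual library types, and the sketch does not try to reproduce the measured slowdown. It only shows that the view-based path goes through an extra view and wrapper iterator before building the result, which is one plausible source of overhead.

~~~ scala
// Toy model with hypothetical names (MiniView, MiniList); not the real library code.
object ViewBasedMapSketch {

  // A view only describes how to obtain an iterator over its elements.
  trait MiniView[+A] { def iterator: Iterator[A] }

  // A strict collection backed by a standard immutable List.
  final class MiniList[+A](val elems: List[A]) {
    def iterator: Iterator[A] = elems.iterator

    // View-based map: describe the mapping lazily, then build the result from the view.
    def mapViaView[B](f: A => B): MiniList[B] = {
      val source = this
      val mapped = new MiniView[B] {
        def iterator: Iterator[B] = source.iterator.map(f) // extra wrapper iterator
      }
      MiniList.fromView(mapped)
    }

    // Builder-based map: apply f and append to a builder in a single loop.
    def mapViaBuilder[B](f: A => B): MiniList[B] = {
      val b = List.newBuilder[B]
      val it = iterator
      while (it.hasNext) b += f(it.next())
      new MiniList(b.result())
    }
  }

  object MiniList {
    // Traverses the view once and collects its elements strictly.
    def fromView[A](view: MiniView[A]): MiniList[A] = {
      val b = List.newBuilder[A]
      val it = view.iterator
      while (it.hasNext) b += it.next()
      new MiniList(b.result())
    }
  }

  def main(args: Array[String]): Unit = {
    val xs = new MiniList(List(1, 2, 3))
    // Both strategies compute the same elements; only the execution path differs.
    println(xs.mapViaView(_ + 1).elems == xs.mapViaBuilder(_ + 1).elems) // prints "true"
  }
}
~~~

Both methods return the same elements, so choosing between them is purely a performance matter for a strict collection.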
| 39 | + |
| 40 | +## How To Fix That Performance Regression? |
| 41 | + |
| 42 | +Our solution is simply to go back to builder-based implementations for strict collections: we |
| 43 | +override the default view-based implementations with more efficient builder-based |
| 44 | +ones. We actually end up with the same implementations as in the old collections. |
| 45 | + |
| 46 | +In practice these implementations are factored out into traits that can be mixed |
| 47 | +into concrete collection types. The names of these traits are always prefixed with |
| 48 | +`StrictOptimized`. For instance, here is an excerpt of the `StrictOptimizedIterableOps` |
| 49 | +trait: |
| 50 | + |
| 51 | +~~~ scala |
| 52 | +trait StrictOptimizedIterableOps[+A, +CC[_], +C] extends IterableOps[A, CC, C] { |
| 53 | + |
| 54 | +  override def map[B](f: A => B): CC[B] = { // builder-based override of the view-based default |
| 55 | +    val b = iterableFactory.newBuilder[B]() // builder for the resulting collection |
| 56 | +    val it = iterator() |
| 57 | +    while (it.hasNext) { |
| 58 | +      b += f(it.next()) // transform each element and append it eagerly |
| 59 | +    } |
| 60 | +    b.result() |
| 61 | +  } |
| 62 | + |
| 63 | +} |
| 64 | +~~~ |
| 65 | + |
| 66 | +Then, to implement the `Vector` collection, we just mix in such a “strict optimized” trait: |
| 67 | + |
| 68 | +~~~ scala |
| 69 | +trait Vector[+A] extends IndexedSeq[A] |
| 70 | +  with IndexedSeqOps[A, Vector, Vector[A]] |
| 71 | +  with StrictOptimizedSeqOps[A, Vector, Vector[A]] |
| 72 | +~~~ |
| 73 | + |
| 74 | +Here we use `StrictOptimizedSeqOps`, which is a specialization of `StrictOptimizedIterableOps` |
| 75 | +for `Seq` collections. |
| 76 | + |
| 77 | +## Is The View-Based Design Worth It? |
| 78 | + |
| 79 | +In my previous article, I explained a drawback of the old builder-based design. |
| 80 | +On non-strict collections (e.g. `Stream` or `View`), we had to carefully override all the |
| 81 | +default implementations of transformation operations to make them non-strict. |
| 82 | + |
| 83 | +Now it seems that the situation is just reversed: the default implementations work well |
| 84 | +with non-strict collections, but we have to override them in strict collections. |
| 85 | + |
| 86 | +So, is the new design worth it? To answer this question I will quote a comment posted |
| 87 | +by Stefan Zeiger [here](https://www.reddit.com/r/scala/comments/7g52cy/let_them_be_lazy/dqixt8d/): |
| 88 | + |
| 89 | +> The lazy-by-default approach is mostly beneficial when you're implementing lazy |
| 90 | +> collections because you don't have to override pretty much everything or get |
| 91 | +> incorrect semantics. The reverse risk is smaller: If you don't override a lazy |
| 92 | +> implementation for a strict collection type you only suffer a small performance |
| 93 | +> impact but it's still correct. |
| 94 | + |
| 95 | +In short, implementations are **correct first** in the new design but you might want to |
| 96 | +override them for performance reasons on strict collections. |
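
The asymmetry can be illustrated with a small toy model. Everything below (`MiniOps`, `LazySeq`, `StrictSeq`, `fromSource`) is a hypothetical simplification written for this illustration and is not the real collections API. The sketch assumes a single view-based default `map` that is inherited unchanged by a non-strict and a strict collection: the non-strict one stays lazy, and the strict one still returns the right elements.

~~~ scala
// Toy model with hypothetical names (MiniOps, LazySeq, StrictSeq); not the real library code.
object CorrectFirstSketch {

  trait MiniOps[+A, +CC[_]] {
    def iterator: Iterator[A]

    // Rebuilds a collection of the same kind from a re-runnable element source.
    protected def fromSource[B](source: () => Iterator[B]): CC[B]

    // View-based default: only describes the transformation, forces nothing itself.
    def map[B](f: A => B): CC[B] = fromSource(() => iterator.map(f))
  }

  // Non-strict collection: keeps the source unevaluated until it is traversed.
  final class LazySeq[+A](source: () => Iterator[A]) extends MiniOps[A, LazySeq] {
    def iterator: Iterator[A] = source()
    protected def fromSource[B](s: () => Iterator[B]): LazySeq[B] = new LazySeq(s)
  }

  // Strict collection: evaluates the source as soon as it is built.
  final class StrictSeq[+A](val elems: Vector[A]) extends MiniOps[A, StrictSeq] {
    def iterator: Iterator[A] = elems.iterator
    protected def fromSource[B](s: () => Iterator[B]): StrictSeq[B] =
      new StrictSeq(s().toVector)
  }

  // Counts how often f has run before and after traversing a mapped LazySeq.
  def lazyCalls(): (Int, Int) = {
    var calls = 0
    val mapped = new LazySeq(() => Iterator(1, 2, 3)).map { x => calls += 1; x * 2 }
    val before = calls
    mapped.iterator.foreach(_ => ())
    (before, calls)
  }

  // The inherited default is still correct on the strict collection.
  def strictResult(): Vector[Int] = new StrictSeq(Vector(1, 2, 3)).map(_ * 2).elems

  def main(args: Array[String]): Unit = {
    println(lazyCalls())    // prints "(0,3)"
    println(strictResult()) // prints "Vector(2, 4, 6)"
  }
}
~~~

If the default `map` in `MiniOps` were builder-based instead, `LazySeq` would run `f` on every element as soon as `map` was called unless it remembered to override it. That is the incorrect-semantics risk described in the quote above.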
| 97 | + |
| 98 | +## Performance Comparison With 2.12’s Collections |
| 99 | + |
| 100 | +Speaking of performance, how do the new collections compare to the old ones? |
| 101 | + |
| 102 | +Again, the answer depends on the type of collection, the operation and the number of elements. |
| 103 | +My `Vector` benchmarks show a 35% speedup on average: |
| 104 | + |
| 105 | +[Charts: speedup factor of the `filter`, `map` and `flatMap` operations on `Vector`, by number of elements] |
| 106 | + |
| 111 | +These charts show the speedup factor (vertically) of the `filter`, `map` and `flatMap` |
| 112 | +operations compared to the old `Vector`, for various numbers of elements (horizontally). |
| 113 | +The blue line shows the old `Vector` (the baseline), |
| 114 | +the red line shows the new `Vector` if it used only view-based |
| 115 | +implementations, and the yellow line shows the actual new `Vector` |
| 116 | +(with strict optimized implementations). Benchmark source code and numbers can be found |
| 117 | +[here](https://gist.github.com/julienrf/f1cb2b062cd9783a35e2f35778959c76). |
| 118 | + |
| 119 | +Since the operation implementations end up being the same, why do we get better performance |
| 120 | +at all? These numbers are specific to `Vector` and to the tested operations: they |
| 121 | +come from the fact that |
| 122 | +we inlined a few critical methods more aggressively. I don’t expect the new collections |
| 123 | +to *always* be faster than the old collections. However, there is no reason for |
| 124 | +them to be slower since the execution path, when calling an operation, can be made |
| 125 | +exactly the same as in the old collections. |
| 126 | + |
| 127 | +## Conclusion |
| 128 | + |
| 129 | +This article studied the performance of the new collections. I’ve reported that, in my `Vector` |
| 130 | +benchmarks, view-based operation implementations are about 25% slower than builder-based ones, |
| 131 | +and I’ve explained how we restored builder-based implementations on strict collections. |
| 132 | +Last but not least, I’ve shown that defaulting to view-based implementations does |
| 133 | +make sense for the sake of correctness. |
| 134 | + |
| 135 | +I expect the new collections to be as fast as or slightly faster than the previous collections. |
| 136 | +Indeed, we took advantage of the rewrite to apply a few more optimizations here and |
| 137 | +there. |
| 138 | + |
| 139 | +More significant performance improvements can be achieved by using different |
| 140 | +data structures. For instance, we recently |
| 141 | +[merged](https://github.com/scala/collection-strawman/pull/342) |
| 142 | +a completely new implementation of immutable `Set` and `Map` based on [compressed |
| 143 | +hash-array mapped prefix-trees](https://michael.steindorfer.name/publications/oopsla15.pdf). |
| 144 | +This data structure has a smaller memory footprint than the old `HashSet` and `HashMap`, |
| 145 | +and some operations can be an order of magnitude faster (e.g. `==` is up to 7x faster). |