-
Notifications
You must be signed in to change notification settings - Fork 32
adds splitBy extension method on scala.collection.Iterator #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like a good method to add and the use cases are fairly common to add to this lib.
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
7a3f8ec
to
7e5521a
Compare
I made cosmetic changes after the first review :
|
Hello @julienrf I don't mind if this isn't a priority, but just want to make sure this PR was submitted at the right place (not to mention I would love feedback on the actual implementation :) ) cheers |
Hello Jean! Sorry for being silent. Yes, the PR looks great! We are still unsure about what will happen to |
Hi Julien, thats perfectly ok for me. I'll keep monitoring the thread here and at contributors.scala-lang.org cheers :) |
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
If we're gonna add this |
7e5521a
to
d659dd1
Compare
@joshlemer I gave a try to adding this to |
@jeantil I think it should return at least |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of having the following signature as a Seq[A]
decorator?
def splitBy[B](f: A => B): CC[CC[A]]
And, similarly, on Iterator[A]
:
def splitBy[B](f: A => B): Iterator[Iterator[A]]
@julienrf We could aim for consistency with the most closely related method on iterator, for Seq's: Since it's not changing the underlying element type, I think it could return This would also be more consistent with |
Here is what I came up with for a quick shake up : def splitBy[K, That,CC](f: seq.A => K)(implicit bf: BuildFrom[C, seq.A, That], bff: BuildFrom[C,That,CC]):CC =
bff.fromSpecific(coll)(seq(coll).iterator.splitBy(f).map(bf.fromSpecific(coll))) this compiles, but then I get compilation errors on the tests because of ambiguous implicits [error] Note that implicit conversions are not applicable because they are ambiguous:
[error] both method IterableDecorator in package decorators of type [C](coll: C)(implicit it: scala.collection.generic.IsIterable[C])scala.collection.decorators.IterableDecorator[C,it.type]
[error] and method SeqDecorator in package decorators of type [C](coll: C)(implicit seq: scala.collection.generic.IsSeq[C])scala.collection.decorators.SeqDecorator[C,seq.type]
[error] are possible conversion functions from scala.collection.immutable.Vector[A] to ?{def splitBy: ?}
[error] val groupedVector:Vector[Vector[Nothing]] = Vector.empty.splitBy(identity)
[error] ^ But that won't work with CC[C] since there is no guarantee that C is preserved (as per @joshlemer's proposal) I tried getting it to work on |
@jeantil is |
@joshlemer You raised good points ;)
I don’t know what is the reason for
👍 |
I realize I may not have pushed everything, I had started adding it on IterableDecorator and it seems i never pushed the commit (it was in response to #4 (comment)) def splitBy[K, That](f: it.A => K)(implicit bf: BuildFrom[C, it.A, That]):Iterator[That] is as far as I managed to compile for IterableDecorator. |
8a007ef
to
226accb
Compare
Hi, Doing the latter, I switched from delegating to iterator to a specific implementation which The corresponding tests have been updated too and I applied a formatter for |
226accb
to
b732778
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your patience, Jean!
Doing the latter, I switched from delegating to iterator to a specific implementation which
I found easier to work with to achieve the expected type signature.
I understand, however the drawback is that your implementation is strict by default. I think it would be nice if it was lazy on LazyList
.
src/main/scala/scala/collection/decorators/IterableDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IterableDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IterableDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IterableDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IterableDecorator.scala
Outdated
Show resolved
Hide resolved
src/main/scala/scala/collection/decorators/IteratorDecorator.scala
Outdated
Show resolved
Hide resolved
Hi Julien,
It's alright, I'm doing this on the side and I really appreciate the time
you spend for reviews.
I may not be able to address your latest comments before next week.
Le lun. 19 août 2019 à 10:32, Julien Richard-Foy <[email protected]>
a écrit :
… ***@***.**** commented on this pull request.
Thanks *a lot* for your patience, Jean!
Doing the latter, I switched from delegating to iterator to a specific
implementation which
I found easier to work with to achieve the expected type signature.
I understand, however the drawback is that your implementation is strict
by default. I think it would be nice if it was lazy on LazyList.
------------------------------
In src/main/scala/scala/collection/decorators/IterableDecorator.scala
<#4 (comment)>
:
> @@ -33,4 +33,46 @@ class IterableDecorator[C, I <: IsIterable[C]](coll: C)(implicit val it: I) {
def lazyFoldRight[B](z: B)(op: it.A => Either[B, B => B]): B =
it(coll).iterator.lazyFoldRight(z)(op)
+
+ /**
+ * Constructs an iterator where consecutive elements are accumulated as
We should replace “iterator” with “collection”
------------------------------
In src/main/scala/scala/collection/decorators/IterableDecorator.scala
<#4 (comment)>
:
> @@ -33,4 +33,46 @@ class IterableDecorator[C, I <: IsIterable[C]](coll: C)(implicit val it: I) {
def lazyFoldRight[B](z: B)(op: it.A => Either[B, B => B]): B =
it(coll).iterator.lazyFoldRight(z)(op)
+
+ /**
+ * Constructs an iterator where consecutive elements are accumulated as
+ * long as the output of f for each element doesn't change.
+ * <pre>
+ * Vector(1,2,2,3,3,3,2,2)
+ * .splitBy(identity)
+ * </pre>
+ * produces
+ * <pre>
+ * Iterator(Vector(1),
It should be Vector(Vector(1), instead of Iterator(Vector(1),
------------------------------
In src/main/scala/scala/collection/decorators/IterableDecorator.scala
<#4 (comment)>
:
> + * .splitBy(identity)
+ * </pre>
+ * produces
+ * <pre>
+ * Iterator(Vector(1),
+ * Vector(2,2),
+ * Vector(3,3,3),
+ * Vector(2,2))
+ * </pre>
+ *
+ * @param f the function to compute a key for an element
+ * @tparam K the type of the computed key
+ * @return an iterator of sequences of the consecutive elements with the
+ * same key in the original iterator
+ */
+ def splitBy[K, That, CC](f: it.A => K)(implicit bf: BuildFrom[C, it.A, That], bff: BuildFrom[C, That, CC]): CC = {
What about naming the type parameters That1, That2? Or maybe C1 and C2?
------------------------------
In src/main/scala/scala/collection/decorators/IterableDecorator.scala
<#4 (comment)>
:
> + * <pre>
+ * Vector(1,2,2,3,3,3,2,2)
+ * .splitBy(identity)
+ * </pre>
+ * produces
+ * <pre>
+ * Iterator(Vector(1),
+ * Vector(2,2),
+ * Vector(3,3,3),
+ * Vector(2,2))
+ * </pre>
+ *
+ * @param f the function to compute a key for an element
+ * @tparam K the type of the computed key
+ * @return an iterator of sequences of the consecutive elements with the
+ * same key in the original iterator
What about the following description?
@return <https://github.com/return> a sequence of sequences of
consecutive elements having the same key
------------------------------
In src/main/scala/scala/collection/decorators/IterableDecorator.scala
<#4 (comment)>
:
> + * @param f the function to compute a key for an element
+ * @tparam K the type of the computed key
+ * @return an iterator of sequences of the consecutive elements with the
+ * same key in the original iterator
+ */
+ def splitBy[K, That, CC](f: it.A => K)(implicit bf: BuildFrom[C, it.A, That], bff: BuildFrom[C, That, CC]): CC = {
+ val builder = bff.newBuilder(coll)
+
+ val iterator = it(coll).iterator
+ var init = bf.newBuilder(coll)
+ if (iterator.hasNext) {
+ var ref = iterator.next();
+ init += ref
+ while (iterator.hasNext) {
+ val el = iterator.next();
+ if (f(el) != f(ref)) {
Should we put f(ref) in a val to compute it only once?
------------------------------
In src/main/scala/scala/collection/decorators/IteratorDecorator.scala
<#4 (comment)>
:
> +
+ override def next(): immutable.Seq[A] = {
+ if (hasNext) {
+ val seq = Vector.newBuilder[A]
+ if (hdDefined) {
+ seq += hd
+ } else {
+ hd = `this`.next()
+ hdDefined = true
+ seq += hd
+ }
+ var hadSameKey = true
+ while (`this`.hasNext && hadSameKey) {
+ val el = `this`.next()
+ hdDefined = true
+ if (f(el) == f(hd)) {
Same comment here about f(hd)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4?email_source=notifications&email_token=AAAFTQ3OQOZAVD5OGKBL4DTQFJLDVA5CNFSM4FAUBXA2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCB4Y2MA#pullrequestreview-276401456>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAFTQ2YTQDMIAIESJ3OMBLQFJLDVANCNFSM4FAUBXAQ>
.
|
I had some free time this morning thanks to kids sleeping in :) |
Looks good to me, thanks! Do you mind squashing the commits into a single one? |
No problem, I made separate commits to facilitate the review.
Le mar. 20 août 2019 à 10:17, Julien Richard-Foy <[email protected]>
a écrit :
… Looks good to me, thanks! Do you mind squashing the commits into a single
one?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4?email_source=notifications&email_token=AAAFTQYEVH7SZK5QKKD3KQLQFOSBFA5CNFSM4FAUBXA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4VO7HI#issuecomment-522907549>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAFTQ3PDOMBWWQT7VTUCGDQFOSBFANCNFSM4FAUBXAQ>
.
|
`Iterator#splitBy` constructs an iterator where consecutive elements of the original iterator are accumulated as long as the output of a key function for each element doesn't change. This operation makes sense as soon as you are trying to process an iterator where you know the elements will be sorted in a certain way and you need to group them without loading all the data in memory. For instance * processing a file where the ordering is guaranteed but the file doesn't fit in the heap, * processing a streaming resultset where the underlying database guarantees the ordering because of a sort clause. The same operation is added to `Iterable` with the difference that the specific container type of the input is preserved for both collection levels of the output, thus * `Set(1,2,3).splitBy(identity)` returns `Set(Set(1), Set(2), Set(3))` * `Vector(1,2,3).splitBy(identity)` returns `Vector(Vector1), Vector2), Vector3))` * etc.
718abf1
to
c57a652
Compare
Done I squashed everything to a single commit |
🚀 |
How is this different than span? |
The simplified type signatures for both operations look like the following which should give a hint as to the difference def span(p: (A) ⇒ Boolean): (Iterable[A], Iterable[A]) vs def splitBy[K](f: (A) => K):Iterable[Iterable[A]] But the difference increases in the details :
I think of |
Oh I see. I assumed from the name that it was comparable to splitAt but
with a predicate.
What about a name that is more associated with groupBy, like chunkBy or
groupOnBoundaries or something?
…On Thu, Sep 12, 2019 at 12:23 PM Jean Helou ***@***.***> wrote:
The simplified type signatures for both operations look like which should
give a hint as to the difference
def span(p: (A) ⇒ Boolean): (Iterable[A], Iterable[A])
vs
def splitBy[K](f: (A) => K):Iterable[Iterable[A]]
But the difference is in the details :
- span accumulates all the elements as long as the predicate is
satisfied, however it assumes the accumulation condition can be derived
from a single element
- splitBy accumulates all elements which successively resolve to the
same key. Writing a predicate for this requires access to the previous
element to know if the key changed which isn't available in span.
- groupBy accumulates all elements which resolve to the same key,
consuming all the input in the process.
I think of splitBy as closer to a lazy groupBy that avoids consuming the
whole input than to span.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4?email_source=notifications&email_token=AAAYAUALFHYDAO2WQGIJC6LQJJUIVA5CNFSM4FAUBXA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6SOVKY#issuecomment-530901675>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAYAUBGHVJAMY5DUBMYZJDQJJUIVANCNFSM4FAUBXAQ>
.
|
It was initially called I think that edit |
I don't really care, just thinking out loud ;)
…On Thu, Sep 12, 2019 at 1:06 PM Jean Helou ***@***.***> wrote:
It was initially called groupUntilChanged and was renamed to splitBy as
per comments during the review process...
I think that chunkBy is an interesting proposal though. I don't mind
making another PR to rename it. I would prefer to have @julienrf
<https://github.com/julienrf> and @joshlemer
<https://github.com/joshlemer> 's thoughts on this before I make it :)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#4?email_source=notifications&email_token=AAAYAUCOTLOO4PGBPAW5NHTQJJZIFA5CNFSM4FAUBXA2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6SSPIA#issuecomment-530917280>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAYAUAW7775YLNAGW5BTXDQJJZIFANCNFSM4FAUBXAQ>
.
|
Iterator#splitBy
constructs an iterator where consecutiveelements of the original iterator are accumulated as long as the output
of a key function for each element doesn't change. They are emitted as
an Iterable as soon as the key function changes.
This operation makes sense as soon as you are trying to process an
iterator where you know the elements will be sorted in a certain way and
you need to group them without loading all the data in memory.
For instance
doesn't fit in the heap,
guarantees the ordering because of a sort clause.
The same operation is added on Iterables with the difference that the
specific container type of the input is preserved for both collection
levels of the output, thus
Set(1,2,3).splitBy(identity)
returnsSet(Set(1), Set(2), Set(3))
Vector(1,2,3).splitBy(identity)
returnsVector(Vector1), Vector2), Vector3))