Skip to content

Add Enum.sum_by/2 #12885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 21, 2024
Merged

Add Enum.sum_by/2 #12885

merged 9 commits into from
Jun 21, 2024

Conversation

sabiwara
Copy link
Contributor

@sabiwara sabiwara commented Aug 22, 2023

Proposal draft to include Enum.sum/2.

If we agree on the signature, will send a similar PR for Enum.product/2.

Side note: might also bench specific implementations for lists to see if it's worth it.
Edit: done, shows a 40% increase. Not bad, will update accordingly.

@josevalim
Copy link
Member

My biggest concern with sum/2 and product/2 is figuring out the API. We have count/2, but it works as a filter (understandably). min and max are also aggregations but the function configures the empty value. sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by). So if we add this, each aggregation/2 function behaves slightly different and I am not sure if it will be confusing.

@sabiwara
Copy link
Contributor Author

sabiwara commented Aug 23, 2023

I understand this concern, my thinking was that filtering is still possible by returning the noop value 0 or 1. Maybe we can explicitly provide an example doing so in the docs ?

An explicit alternative could be map_sum/2 (cf map_join/2), which somehow feels less elegant though.

@kaaboaye
Copy link

sum_by is not a good fit either, as it should behave closer to frequencies_by

Assuming that frequencies_by would have value_fun just like group_by sum_by wouldn't be so out of place.

image

@hauleth
Copy link
Contributor

hauleth commented Aug 31, 2023

3rd time the charm? #11381 and #11455

@sabiwara
Copy link
Contributor Author

I've been thinking about this one, and map_sum/2 would actually make a lot of sense if we think in terms of types.

It would be consistent with map_join/2, which has a almost the same signature and is the equivalent aggregation function for strings.

map_join([t], (t -> String.t())) :: String.t()
map_sum([t], (t -> number())) :: number()

Type-wise, _by would be inconsistent (as you mentioned @josevalim) because _by functions usually return the original type, not the "mapped" type (or as keys in a map):

max_by([t], (t -> u)) :: t
uniq_by([t], (t -> u)) :: [t]

Another benefit of map_sum/2 is that the name is pretty clear and leaves no doubt about what it's doing.

@whatyouhide
Copy link
Member

whatyouhide commented Sep 18, 2023

I'm personally ok with map_sum/2 as the name, what you just wrote makes sense @sabiwara. I’m not convinced we need this in the first place, though. Also, we'd need map_product/2, just to point that out.

@sabiwara
Copy link
Contributor Author

I’m not convinced we need this in the first place, though.

I think it would be good from the perspective of encouraging users to pick efficiency without sacrificing readability.
Most of the cases I've found myself needing sum/1 are like the one in the cheatsheet and require mapping.
Right now one can be tempted to write map(f) |> sum() instead of a single reduce to make the intent clearer, yet this is not slightly but highly more wasteful.

@sabiwara
Copy link
Contributor Author

Closing for now since the appropriate solution still needs discussion.

@sabiwara sabiwara closed this Sep 18, 2023
@sabiwara sabiwara reopened this Sep 18, 2023
@PragTob
Copy link
Contributor

PragTob commented Dec 13, 2023

@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? 😅

@sabiwara
Copy link
Contributor Author

@sabiwara you said closing as it still needs discussion but immediately reopened - accidental double click? 😅

Nope, Jose asked me to keep it open 😉

@josevalim
Copy link
Member

Closing this, as we were unable to reach a conclusion.

@josevalim josevalim closed this Mar 22, 2024
@aloisdg
Copy link

aloisdg commented Jun 21, 2024

Learning Elixir right now. The absence of sum/2 is quite sad.

We are doing:

      item[:quantity_by_size]
      |> Enum.map(fn {_k, v} -> v end)
      |> Enum.sum

and the exercise says

Use the Enum.reduce function in the total_quantity function to practice. It's the best fitting Enum function for this task.

Here is what I did:

  defp sum_by(enumerable, fun) do
    enumerable |> Enum.reduce(0, fn element, acc -> fun.(element) + acc end)
  end


item[:quantity_by_size]
 |> sum_by(fn {_k, v} -> v end)

So are we doomed to rewrite a sum/2 with a low level tool like reduce each time? Cant we have something nicer and really fitting?

Please let us write

      item[:quantity_by_size]
      |> Enum.sum(fn {_k, v} -> v end)

FYI: Both C# has sum/2 and F# has sum/2 as sumBy

@josevalim josevalim reopened this Jun 21, 2024
@josevalim
Copy link
Member

josevalim commented Jun 21, 2024

I have revisited my previous argument. And I found a flaw in it:

sum_by is not a good fit either, as it should behave closer to frequencies_by (which by the way it is not called count_by).

We already have max_by, dedup_by, chunk_by and many others. And they all behave differently - and they should - because they are customizing different operations.

So if everyone agrees, I am fine with adding this as sum_by and product_by.

EDIT: I believe @sabiwara mentioned this earlier as well.

@sabiwara
Copy link
Contributor Author

sabiwara commented Jun 21, 2024

@josevalim after thinking about this for a while, I was more leaning towards map_sum/2 due to the very close situation with map_join. It could be caught by a linter rule which would be symmetrical with this one, and type signatures are very similar too:

@spec map_join(Enumerable.t(x), x -> String.Chars.t()) :: String.t()
@spec map_sum(Enumerable.t(x), x -> number()) :: number()

OTOH, for a callback of type x -> y, max_by or dedup_by still return a x / list(x), y is used for comparisons.

I am fine either way, sum_by/2 works as well, just wanted to share the rationale above.

@sabiwara
Copy link
Contributor Author

sabiwara commented Jun 21, 2024

  • Rebased
  • Renamed to sum_by for now
  • Added an entry to the cheatsheet and reworked the docs a bit
Screenshot 2024-06-21 at 23 52 03

Once this is merged, will send an equivalent for product/2

Co-authored-by: Wojtek Mach <[email protected]>
@sabiwara sabiwara changed the title Add Enum.sum/2 Add Enum.sum_by/2 Jun 21, 2024
Co-authored-by: Łukasz Jan Niemier <[email protected]>
@sabiwara sabiwara merged commit 0df128c into elixir-lang:main Jun 21, 2024
9 checks passed
@sabiwara sabiwara deleted the enum-sum-2 branch June 21, 2024 21:27
@sabiwara sabiwara mentioned this pull request Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants