Inconsistent behavior between df.sum() and groupby(col).agg('sum') on lists #29033

Closed
gqfiddler opened this issue Oct 16, 2019 · 1 comment
Labels: Apply, Bug, Groupby, Nested Data, Nuisance Columns, Reduction Operations

gqfiddler commented Oct 16, 2019

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'cost': [5, 5, 5],
    'letters': [['a', 'b'], ['a', 'b'], ['a', 'b']]
})
print(df.sum())  # concatenates the lists in the 'letters' column
print(df.groupby('id').agg('sum'))  # drops the 'letters' column from the result
print(df.groupby('id').agg(pd.Series.sum))  # concatenates the lists in 'letters' within each group

Problem description

Like Python's + operator, .sum() in pandas is overloaded to concatenate lists as well as add numbers. However, 'sum' passed to the agg method does not do this. Instead, it treats the lists as non-addable objects and drops the column from the result.

For both convenience and consistency, df.groupby('col').agg('sum') should exhibit the same behavior on lists as df.sum() and df.col.sum(). This could be as simple as calling the existing pd.Series.sum() function when the user passes 'sum'.
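Until the two code paths agree, one way to get list concatenation per group is to aggregate the list column with an explicit callable. This is a minimal sketch, not the pandas implementation: the helper name concat_lists is hypothetical, and the data frame is the one from the example above.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2],
    'cost': [5, 5, 5],
    'letters': [['a', 'b'], ['a', 'b'], ['a', 'b']],
})


def concat_lists(series):
    """Flatten a Series of lists into a single list (hypothetical helper)."""
    return list(chain.from_iterable(series))


grouped = df.groupby('id')

# Numeric column: the built-in groupby sum is enough.
cost_totals = grouped['cost'].sum()

# List column: pass an explicit callable so the lists are joined per group
# instead of relying on how the string alias 'sum' handles object columns.
joined_letters = grouped['letters'].agg(concat_lists)

print(cost_totals)
print(joined_letters)
```

Selecting the column before calling agg keeps the list handling separate from the numeric reduction, so the result does not depend on how a given pandas version treats object columns under the 'sum' alias.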

@gqfiddler gqfiddler changed the title Inconsistent behavior between .sum() and groupby(col).agg('sum') on lists Inconsistent behavior between df.sum() and groupby(col).agg('sum') on lists Oct 16, 2019
@jbrockmendel jbrockmendel added Apply Apply, Aggregate, Transform, Map Groupby Numeric Operations Arithmetic, Comparison, and Logical operations labels Oct 30, 2019
@mroeschke mroeschke added the Bug label Jun 28, 2020
@jbrockmendel jbrockmendel added List-Like Scalars Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply Reduction Operations sum, mean, min, max, etc. Nested Data Data where the values are collections (lists, sets, dicts, objects, etc.). and removed List-Like Scalars labels Sep 21, 2020
@mroeschke mroeschke removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Jul 21, 2021
@jreback jreback added this to the Contributions Welcome milestone Jan 16, 2022
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach
Member

This was resolved by the nuisance column deprecation, which was enforced in pandas 2.0; the behavior is already well-tested.
