Inconsistent behavior between df.sum() and groupby(col).agg('sum') on lists #29033
Labels: Apply, Bug, Groupby, Nested Data, Nuisance Columns, Reduction Operations
Problem description
Like the + operator in Python, .sum() in pandas is overloaded to concatenate lists as well as add numbers. However, passing 'sum' to the agg method (as in groupby(col).agg('sum')) does not do this. Instead, it treats lists as values that cannot be added and silently drops the list column from the result.
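The overloading can be seen in plain Python: a left fold with + adds numbers and concatenates lists. The following is a minimal pure-Python sketch that roughly models the reduction df.sum() applies to an object column; the plus_reduce helper is hypothetical and not part of pandas.

```python
import functools
import operator

# Python's + operator is overloaded: numbers add, lists concatenate.
assert 1 + 2 == 3
assert [1, 2] + [3] == [1, 2, 3]


def plus_reduce(values):
    """Hypothetical helper: fold values left to right with the + operator."""
    return functools.reduce(operator.add, values)


numbers = plus_reduce([1, 2, 3])         # 6
lists = plus_reduce([[1, 2], [3], [4]])  # [1, 2, 3, 4]
```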
For both convenience and consistency, df.groupby('col').agg('sum') should behave the same way on lists as df.sum() and df.col.sum(). This could be done by dispatching a 'sum' string passed by the user to the existing pd.Series.sum() method.
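The proposed behavior can be sketched without pandas: group the rows by a key, then reduce each group's values with +, the same way a Series.sum()-style reduction would. This is an illustration under that assumption, not pandas' actual groupby implementation; groupby_sum and the row layout (a list of dicts) are hypothetical.

```python
import functools
import operator
from collections import defaultdict


def groupby_sum(rows, key, col):
    """Hypothetical stand-in for df.groupby(key)[col].agg('sum') under the
    proposed behavior: each group's values are folded with +, so list values
    are concatenated instead of being dropped as a nuisance column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[col])
    # Iterate over sorted keys to mirror groupby's default sort=True.
    return {k: functools.reduce(operator.add, groups[k]) for k in sorted(groups)}


rows = [
    {"g": "a", "vals": [1, 2]},
    {"g": "b", "vals": [3]},
    {"g": "a", "vals": [4]},
]
result = groupby_sum(rows, "g", "vals")  # {'a': [1, 2, 4], 'b': [3]}
```

Because the reduction only relies on +, the same helper still adds numeric values, which is the consistency the issue asks for.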