BUG: DataFrameGroupBy.agg with lists doesn't respect as_index=False #52849

rhshadrach · 2023-04-22T15:11:23Z

from pandas import DataFrame

df = DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})
# df = df.astype({"a1": "category", "a2": "category"})
gb = df.groupby(by=["a1", "a2"], as_index=False)
result = gb.agg(["sum"])
print(result)
#         b
#       sum
# a1 a2
# 0  2    4
#    3    5
# 1  3    6

The expected output is:

  a1 a2   b
        sum
0  0  2   4
1  0  3   5
2  1  3   6

In particular, the index is a RangeIndex and a1 as well as a2 are columns.
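For comparison, the shape described above can be produced with a workaround that does not rely on as_index=False at all: aggregate with the default as_index=True and then call reset_index(). This is only a sketch of the target layout, not the fix for this issue; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})

# Aggregate with the grouping keys in the index, then move them into columns.
# reset_index() pads the lower column level of the inserted keys with "".
expected = df.groupby(by=["a1", "a2"]).agg(["sum"]).reset_index()

print(type(expected.index).__name__)
print(list(expected.columns))
```

The index is a RangeIndex, and a1 and a2 appear as columns alongside the ("b", "sum") column.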


szczekulskij · 2023-04-23T01:17:30Z

This looks a bit hard, but I'll give it a try. If I can't solve it within a week, I'll release it for reassignment.

szczekulskij · 2023-04-23T01:17:34Z

take

szczekulskij · 2023-04-24T09:17:26Z

There is also missing data in a1

szczekulskij · 2023-04-24T09:20:01Z

The single missing number aside (from a1 column) - as this seems like a separate issue.

What should be the expected output for gb.agg(["sum", "min"]) ?
It outputs something like this, which looks reasonable, since how else could we output multiple aggregation columns?

#         b    
#       sum min
# a1 a2        
# 0  2    4   4
#    3    5   5
# 1  3    6   6

Therefore, if I understand correctly: when only one function is passed in the list, we want to override the list behaviour and output something similar to gb.agg("sum"), right? My concern is that some could argue that the current output for gb.agg(["sum"]) is correct: if you pass in a list, you do want the output DataFrame to include the "sum" annotation.
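To make the difference between the string and list forms concrete, here is a small sketch using the default as_index=True, so it does not depend on the behaviour under discussion. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})
gb = df.groupby(by=["a1", "a2"])  # default as_index=True

flat = gb.agg("sum")                # a string gives flat columns
one_func = gb.agg(["sum"])          # a one-element list gives two column levels
two_funcs = gb.agg(["sum", "min"])  # one lower-level entry per function

print(list(flat.columns))
print(list(one_func.columns))
print(list(two_funcs.columns))
```

A list always adds the function name as a second column level, even when it holds a single function.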

rhshadrach · 2023-04-25T01:57:06Z

The single missing number aside (from a1 column) - as this seems like a separate issue.

Do you mean something like the following?

#         b    
#       sum min
# a1 a2        
# 0  2    4   4
# 0  3    5   5
# 1  3    6   6

pandas prints a sparse representation for the index. A missing entry implicitly means it is repeated from above.
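A quick way to check this is to look at the index values directly, or to switch off the sparse display with the display.multi_sparse option. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({"a1": [0, 0, 1], "a2": [2, 3, 3], "b": [4, 5, 6]})
result = df.groupby(by=["a1", "a2"]).agg(["sum", "min"])

# Every row carries its full key, even though the default display blanks repeats.
index_tuples = result.index.tolist()
print(index_tuples)

# With sparse display turned off, the repeated key is printed on every row.
with pd.option_context("display.multi_sparse", False):
    dense_text = result.to_string()
print(dense_text)
```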

What should be the expected output for gb.agg(["sum", "min"]) ?

I originally had the incorrect columns in the OP; it should be a MultiIndex. The OP has been updated.

szczekulskij · 2023-05-08T12:47:29Z

Hey, I'm sorry, but I'll need to drop this issue. Pandas is a bit too much for me right now; I'll come back to this in the future.

Charlie-XIAO · 2023-05-15T02:52:28Z

I believe this is because the function returns too early. As shown below, it doesn't go through the as_index check. I will postpone the return until after as_index is checked.

pandas/pandas/core/groupby/generic.py

Lines 1331 to 1333 in 935244a

    
    result = op.agg()
    if not is_dict_like(func) and result is not None:
        return result
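The control-flow problem can be illustrated with a toy model in plain Python. Everything in this sketch (finalize, agg_early_return, agg_postponed_return, the dict-based groups) is hypothetical and only mimics the shape of the bug; it is not the pandas implementation or the actual patch.

```python
# Toy model of the control flow; all names are hypothetical, not pandas internals.

def finalize(result, as_index):
    """Post-processing step: turn group keys into plain fields when as_index=False."""
    if as_index:
        return result
    return [{"key": key, "value": value} for key, value in result.items()]


def agg_early_return(groups, func, as_index):
    result = {key: func(values) for key, values in groups.items()}
    if result is not None:
        return result  # returns before as_index is ever consulted
    return finalize(result, as_index)


def agg_postponed_return(groups, func, as_index):
    result = {key: func(values) for key, values in groups.items()}
    return finalize(result, as_index)  # as_index is always honoured


groups = {"x": [1, 2], "y": [3]}
buggy = agg_early_return(groups, sum, as_index=False)
fixed = agg_postponed_return(groups, sum, as_index=False)
print(buggy)
print(fixed)
```

In the first variant the keys stay in the index-like position (the dict keys) regardless of as_index; in the second they are moved into the rows.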

Charlie-XIAO · 2023-05-15T02:52:35Z

take

Charlie-XIAO · 2023-05-15T05:08:29Z

But Categorical data seems to be tricky. The current

result = self._insert_inaxis_grouper(result)
result.index = default_index(len(result))

seems unable to deal with that.

rhshadrach added the labels Bug, Groupby, and Apply (Apply, Aggregate, Transform, Map) Apr 22, 2023

github-actions bot assigned szczekulskij Apr 23, 2023

rhshadrach unassigned szczekulskij May 9, 2023

github-actions bot assigned Charlie-XIAO May 15, 2023

Charlie-XIAO mentioned this issue May 15, 2023

BUG DataFrameGroupBy.agg with list not respecting as_index=False #53237

Merged

5 tasks

rhshadrach added this to the 2.1 milestone May 20, 2023

mroeschke closed this as completed in #53237 May 22, 2023

mvashishtha mentioned this issue Jul 10, 2023

BUG: groupby.agg(func=list, as_index=False) incorrectly puts keys in columns modin-project/modin#6274

Closed

3 tasks

itholic mentioned this issue Sep 8, 2023

[SPARK-45065][PYTHON][PS] Support Pandas 2.1.0 apache/spark#42793

Closed
