BUG: weird behaviour for returning group in groupby.apply #22546

Closed
@h-vetinari

I was trying to document my experiences with the inconsistencies of DataFrame.groupby.apply (see #22545), and one of them was the following:

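import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame(index=range(N), columns=['id', 'x', 'y', 'z'])
df.loc[:, ['x', 'y', 'z']] = np.arange(N*3).reshape(N, 3)
# ids are random (unseeded), so the values shown below are one possible draw
df.id = np.random.randint(0, 3, (N,)) + 10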

df
#    id   x   y   z
# 0  11   0   1   2
# 1  10   3   4   5
# 2  10   6   7   8
# 3  12   9  10  11
# 4  12  12  13  14

Then, even though the result returned by the function is exactly the same, the following outputs are different:

df.groupby('id', as_index=True).apply(lambda gr: gr)
#    id   x   y   z
# 0  11   0   1   2
# 1  10   3   4   5
# 2  10   6   7   8
# 3  12   9  10  11
# 4  12  12  13  14

df.groupby('id', as_index=True).apply(lambda gr: gr.iloc[:10 ** 6])
#       id   x   y   z
# id                  
# 10 1  10   3   4   5
#    2  10   6   7   8
# 11 0  11   0   1   2
# 12 3  12   9  10  11
#    4  12  12  13  14

The first call returns the original frame as-is and makes no attempt to group the result the way the second call does. Furthermore, neither output should still contain the id column: in the second output, id appears both as an index level and as a column, which is ambiguous (e.g. if one wants to continue with another groupby after some further transformations).
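
To make the structural difference easier to see, here is a minimal sketch that summarises the index of each result instead of printing the whole frame. The `describe` helper is hypothetical (not part of pandas), the ids are hard-coded to match the table above, and what it reports will likely differ between pandas versions, since the handling of group keys and grouping columns in `apply` has changed over time.

```python
import warnings

import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame(np.arange(N * 3).reshape(N, 3), columns=['x', 'y', 'z'])
df.insert(0, 'id', [11, 10, 10, 12, 12])  # fixed ids matching the table above


def describe(result):
    """Summarise the structure of an apply result (hypothetical helper)."""
    return {
        'index_levels': result.index.nlevels,
        'index_names': list(result.index.names),
        'has_id_column': 'id' in result.columns,
    }


with warnings.catch_warnings():
    # Some pandas versions warn about grouping columns being passed to apply.
    warnings.simplefilter('ignore')
    identity = df.groupby('id').apply(lambda gr: gr)
    sliced = df.groupby('id').apply(lambda gr: gr.iloc[:10 ** 6])

print(describe(identity))
print(describe(sliced))
```

If the two printed summaries differ, the installed version shows the inconsistency described here; if they match, it treats both functions the same way.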

Desired output of both:

#        x   y   z
# id              
# 10 1   3   4   5
#    2   6   7   8
# 11 0   0   1   2
# 12 3   9  10  11
#    4  12  13  14
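
For reference, the desired frame can be built without `apply` at all. This is only a sketch of one possible workaround, not what `groupby.apply` does internally: it iterates over the groups, drops the id column from each, and lets `pd.concat` use the group keys as the outer index level. The ids are hard-coded to match the table above.

```python
import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame(np.arange(N * 3).reshape(N, 3), columns=['x', 'y', 'z'])
df.insert(0, 'id', [11, 10, 10, 12, 12])  # fixed ids matching the table above

# One sub-frame per group, keyed by the group label, with 'id' removed
pieces = {key: gr.drop(columns='id') for key, gr in df.groupby('id')}

# The dict keys become the outer index level, which is named 'id'
desired = pd.concat(pieces, names=['id'])
print(desired)
```

The result has id only in the index, so a later groupby on id is unambiguous.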

Labels: Apply (Apply, Aggregate, Transform, Map), Bug, Groupby
