BUG: groupby.apply() inconsistently indexes its return value with the groupby- #45476

Closed
2 of 3 tasks
ODemidenko opened this issue Jan 19, 2022 · 2 comments
Labels
Apply, Bug, Duplicate Report, Groupby

@ODemidenko

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

full_time_index = pd.Index(
    pd.to_datetime(['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15']),
    name='timestamp',
)
df = pd.DataFrame({
    'A': [3, 1],
    'B': ['foo', 'foo'],
    'timestamp': full_time_index[::2],
}).set_index('timestamp')
group_cols = ['B']

def reindex_by_group(df, group_cols=group_cols, new_index=full_time_index):
    # Reindex every group to new_index, then drop the redundant group columns.
    result = (
        df.groupby(group_cols)
        .apply(lambda d: d.reindex(new_index, fill_value=0))
        .drop(columns=group_cols)
    )
    print(result.index.names)
    # The group columns are expected to become index levels next to 'timestamp'.
    assert set(result.index.names) == set(['timestamp'] + group_cols)
    return result

# These add the group column to the index (desired behavior):
result = reindex_by_group(df, new_index=full_time_index)
result = reindex_by_group(df, new_index=full_time_index[:1])
result = reindex_by_group(df.iloc[:1], new_index=full_time_index)

# These fail the assertion (the group column is not added to the index):
result = reindex_by_group(df.iloc[:1], new_index=full_time_index[:1])
result = reindex_by_group(df, new_index=full_time_index[::2])

Issue Description

Whether the group columns are added to the index depends on the number of rows returned by apply:
when the number changes, the group columns are added to the index. When it doesn't change, the group columns aren't added.
The problem is that the number of returned rows may or may not change when running exactly the same code.

(There is a similar but distinct issue, #44803. I previously reported this bug there, but as I understand it, the original poster there described a different problem, which is being resolved separately.)

Expected Behavior

IMO, the most intuitive behavior is to always add the group columns to the index.
The fact that 'transform' operations don't do it was a major source of confusion for me and my colleagues.
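For readers who need the group column in the index regardless of what apply infers, one possible workaround is to build the result group by group and concatenate with explicit keys. This is only a sketch: `reindex_each_group` is a hypothetical helper name, not a pandas API, and it assumes a single group column.

```python
import pandas as pd

def reindex_each_group(df, group_col, new_index):
    """Hypothetical helper: reindex each group, always keeping the group key in the index."""
    keys, pieces = [], []
    for key, group in df.groupby(group_col):
        keys.append(key)
        # Drop the group column (it becomes an index level below), then reindex.
        pieces.append(group.drop(columns=group_col).reindex(new_index, fill_value=0))
    # Explicit keys make the outer index level unconditional; names labels that level.
    return pd.concat(pieces, keys=keys, names=[group_col])

full_time_index = pd.Index(
    pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-10", "2020-01-15"]),
    name="timestamp",
)
df = pd.DataFrame({"A": [3, 1], "B": ["foo", "foo"]}, index=full_time_index[::2])

# The index is unchanged here, which is the case where apply omits the group column.
same_index = reindex_each_group(df, "B", full_time_index[::2])
# The index grows here, which is the case where apply adds the group column.
new_index = reindex_each_group(df, "B", full_time_index)
print(same_index.index.names)
print(new_index.index.names)
```

Because the group key is attached by `pd.concat` rather than inferred, both calls should report the same index level names whether or not the reindexing changes the rows.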

Installed Versions

Replace this line with the output of pd.show_versions()

@ODemidenko added the Bug and Needs Triage labels Jan 19, 2022
@rhshadrach
Member

Thanks for the report! This is duplicate of other issues and will be closed by #34998 (see issues linked there).

The fact that 'transform' operations don't do it was a major source of confusion for me and my colleagues.

Transforms don't do it because they are most useful that way; a transform is often used as:

df['new_column'] = df.groupby(keys).apply(op)

If the index of the RHS is not the same as that of the LHS, this would not work because of alignment. That said, I think there is an appetite to have apply not infer (at least, not as much). This could mean forcing users to use .transform when they want to apply a transformation. The drawback is that this would make apply less flexible, so it needs to be considered carefully.
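The alignment point can be illustrated with a small sketch. The frame, column names, and the "mean" operation below are made up for illustration; the explicit `.transform` call is the form a user would write if apply stopped inferring.

```python
import pandas as pd

df = pd.DataFrame(
    {"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]},
    index=[10, 20, 30, 40],
)

# transform returns a result indexed like df, so the assignment aligns row by row.
transformed = df.groupby("key")["value"].transform("mean")
df["group_mean"] = transformed

# An aggregation is indexed by the group keys instead, so it would not align with df.
aggregated = df.groupby("key")["value"].mean()

print(transformed.index.equals(df.index))
print(aggregated.index.equals(df.index))
```

If the transformed result carried the group key in its index, the column assignment would have to be re-aligned by hand, which is the convenience described above.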

@rhshadrach
Member

I'm going to close this as a duplicate. If you believe I missed anything or want further discussion, just reply here and we can reopen!

@rhshadrach added the Apply, Groupby, and Duplicate Report labels and removed the Needs Triage label Jan 22, 2022