
Behavior change of 1.4.rc0 when group by apply returns a copy of the passed data frame #45315


Closed
2 of 3 tasks
aberres opened this issue Jan 11, 2022 · 8 comments
Labels
Apply (Apply, Aggregate, Transform, Map), Bug, Groupby, Regression (Functionality that used to work in a prior pandas version)

@aberres
Contributor

aberres commented Jan 11, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])

grouped = df.groupby(["A", "B"], as_index=True)[["C", "D"]]

combined = grouped.apply(lambda x: x.copy())

combined.reset_index().columns
# Pandas 1.3: Index(['A', 'B', 'level_2', 'C', 'D'], dtype='object')
# Pandas 1.4: Index(['index', 'C', 'D'], dtype='object')

Issue Description

While testing our code base against 1.4rc0 I stumbled upon a behavior change of Pandas 1.4.

Formerly when apply was called on a grouper it seems as if a returned multi-index was honored in the resulting data frame.
With 1.4 the index is dropped.

No idea if this is an intentional change or not. Wanted to let you know to be on the safe side.
I also did not check why our code does what it does (the method called in apply is pretty complex, not just a copy). Looks a bit fishy...

Expected Behavior

Unchanged behavior.

Installed Versions

Replace this line with the output of pd.show_versions()

@aberres aberres added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jan 11, 2022
@aberres aberres changed the title from "Behavior change of 1.4.rc0 when group by apply returns frames with multi-index" to "Behavior change of 1.4.rc0 when group by apply returns a copy of the passed data frame" Jan 11, 2022
@aberres
Contributor Author

aberres commented Jan 11, 2022

It seems my analysis is not really correct.

What is passed to the apply func seems to be unchanged. It is just that formerly returning a copy created a frame with a multi-index while this does not happen anymore?

The behavior when just returning what is passed (grouped.apply(lambda x: x)) seems to be unchanged. So yes, aligning the behavior makes sense I guess.
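A minimal sketch for checking which behavior an installed pandas shows, comparing a function that returns its input with one that returns a copy. The variable names `same_object` and `fresh_copy` are illustrative only, and the number of index levels printed depends on the pandas version, so no expected output is given.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])
grouped = df.groupby(["A", "B"])[["C", "D"]]

# The function hands back the very object it received.
same_object = grouped.apply(lambda x: x)

# The function hands back an equal but distinct frame.
fresh_copy = grouped.apply(lambda x: x.copy())

print("pandas", pd.__version__)
print("returning x:       ", same_object.index.nlevels, "index level(s)")
print("returning x.copy():", fresh_copy.index.nlevels, "index level(s)")
```

If the two counts differ, the installed version treats the two cases differently, as described for 1.3 above.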

@jreback jreback added this to the 1.4 milestone Jan 12, 2022
@jreback
Contributor

jreback commented Jan 12, 2022

cc @rhshadrach if you could have a quick look

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 12, 2022
@simonjayhawkins
Member

> While testing our code base against 1.4rc0 I stumbled upon a behavior change of Pandas 1.4.

first bad commit: [d037ff6] REF: remove libreduction.apply_frame_axis0 (#42992)

cc @jbrockmendel

@simonjayhawkins simonjayhawkins added the Apply (Apply, Aggregate, Transform, Map), Groupby, and Regression (Functionality that used to work in a prior pandas version) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Jan 12, 2022
@rhshadrach
Member

rhshadrach commented Jan 14, 2022

@aberres - thanks for the report! The change here was not intentional, in that it was a behavior change that was not noticed until after it was merged and without a deprecation notice. However, it does move pandas in the right direction of consistency, and would be very hard to revert, so I am +1 on moving forward with the behavior in 1.4 as-is.

apply was meant to be flexible, handling aggregations, transforms, and everything in between. The result is different when apply detects an operation was a transform. Previously, in 1.3, pandas used two different definitions of whether something was a transform (i.e. whether the index was "mutated") - one code path used "is" whereas another code path used "equals" when determining whether the input index is the same as the output index. The PR highlighted by @simonjayhawkins removed this inconsistency.
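To illustrate the difference between the two checks, here is a small sketch on plain `pd.Index` objects. The helpers `unchanged_by_identity` and `unchanged_by_equality` are hypothetical names made up for this example; they are not the pandas internals, only a stand-in for an "is" check versus an "equals" check.

```python
import pandas as pd

original = pd.Index([0, 1, 2])
same_object = original          # another name for the same Python object
rebuilt = pd.Index([0, 1, 2])   # a distinct object holding the same labels


def unchanged_by_identity(before, after):
    # Hypothetical helper: "not mutated" only if it is the same object.
    return after is before


def unchanged_by_equality(before, after):
    # Hypothetical helper: "not mutated" if the labels compare equal.
    return after.equals(before)


print(unchanged_by_identity(original, same_object))  # True
print(unchanged_by_identity(original, rebuilt))      # False
print(unchanged_by_equality(original, rebuilt))      # True
```

An index that is equal but not identical passes one check and fails the other, which is how two code paths could disagree on whether an operation was a transform.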

Being able to control what happens with the group keys in apply (regardless of whether pandas detects a transform) would be resolved by #34998.
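Until then, one way to avoid depending on the transform detection is to pass the existing `group_keys=False` argument to `groupby`, so the group keys are never added to the result index. This is a sketch under the assumption that dropping the keys is acceptable for the caller; it does not restore the 1.3 output.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "C", "D"])

# group_keys=False asks pandas not to prepend the group keys to the index.
grouped = df.groupby(["A", "B"], group_keys=False)[["C", "D"]]
combined = grouped.apply(lambda x: x.copy())

# The result keeps the original row index and only the selected columns.
print(combined.index.equals(df.index))
print(list(combined.columns))
```

Code that needs A and B afterwards can take them from the original frame, since the row index is preserved.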

@simonjayhawkins
Member

Thanks @rhshadrach. Should we close this, or leave it open until #34998 is merged and move it off the 1.4 milestone?

@simonjayhawkins
Member

> The change here was not intentional, in that it was a behavior change that was not noticed until after it was merged and without a deprecation notice.

Thinking some more, it's not too late to add something to the release notes about the breaking change.

@rhshadrach
Member

That sounds good @simonjayhawkins - I should be able to get a PR up by the end of today.

@rhshadrach
Member

Closing this as the remark about allowing user control of the group keys being resolved by #34998 is duplicative of e.g. #34809.
