Skip to content

agg with list of non-aggregating functions #35723

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 14, 2020

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Aug 14, 2020

Reverts to 1.0.5 behavior while maintaining the bugfix that introduced the "regression". Ideally agg would instead raise in these scenarios (ref), but that would be an API change. Much of this PR should be reverted once this is done, I've marked such places with TODOs.

@rhshadrach rhshadrach added Apply Apply, Aggregate, Transform, Map Groupby Regression Functionality that used to work in a prior pandas version labels Aug 14, 2020
@rhshadrach rhshadrach added this to the 1.1.1 milestone Aug 14, 2020
@jreback jreback merged commit 0658ce3 into pandas-dev:master Aug 14, 2020
@jreback
Copy link
Contributor

jreback commented Aug 14, 2020

thanks @rhshadrach can you open an issue to 'fix' this properly in 1.2

@rhshadrach
Copy link
Member Author

The issue is here: #35725

@rhshadrach rhshadrach deleted the agg_regression branch August 14, 2020 22:22
@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

@@ -322,11 +322,14 @@ def _aggregate_multiple_funcs(self, arg):
# let higher level handle
return results

output = self._wrap_aggregated_output(results)
output = self._wrap_aggregated_output(results, index=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach is it the case that if we get here then the user passed something sketchy? i.e. should we do the deprecation here?

What should the "do XYZ instead" be?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're pointing at the return results line immediately above this and referring to #35725. Yes, the user passed something I would consider to be sketchy.

My understanding here is that we're a SeriesgGroupBy with results being a mapping from the groups to the result of a UDF. My opinion is that the result should be a Series whose index consists of the groups and values are DataFrames themselves. In other words, the user passed something to agg, so no matter what we get back we are going to treat it as a scalar. It might not be useful, but it is consistent, predictable, and easy to maintain. Optionally, we can emit a performance warning in the case of getting back a complex object.

I've been trying to do the deprecation for #35725 for a bit now, but ended up giving because of the way agg and apply cross paths. I am currently writing an issue for what I see being the way forward, hoping it will be ready at some point tonight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're pointing at the return results line immediately above this

I was actually pointing the 324, so I take it my original guess is wrong? We get to L325 if the user passed something "normal" and return on L323 if they passed something wonky?

Copy link
Member Author

@rhshadrach rhshadrach Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L323 the user definitely passed something wonky. L325 gets hit in both wonky and non-wonky cases, e.g.

gb = DataFrame({'key': [1, 1, 2, 2], 'a': [1, 2, 3, 4]}).groupby('key')['a']
gb.agg(['sum', 'mean']))  # non-wonky
gb.agg(['sum', 'cumsum']))  # wonky

both hit L325.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apply Apply, Aggregate, Transform, Map Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.agg with multiple cum functions creates wrong result
4 participants