-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Allow multiple lambdas in Groupby.aggregate #26905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow multiple lambdas in Groupby.aggregate #26905
Conversation
Codecov Report
@@ Coverage Diff @@
## master #26905 +/- ##
==========================================
- Coverage 92.04% 92% -0.04%
==========================================
Files 180 180
Lines 50714 50783 +69
==========================================
+ Hits 46679 46724 +45
- Misses 4035 4059 +24
Continue to review full report at Codecov.
|
Am I correct to understand that this also allows using Keyword aggregation with multiple lambdas as a way to avoid the automatic name mangling if one does wish so? Something like: df.groupby('A').agg(
foo=('B', lambda x: x.max() - x.min()),
bar=('B', lambda x: x.median() - x.mean()),
) Which results in a dataframe with 2 columns |
Right In [1]: import pandas as pd
In [2]: df = pd.DataFrame({"A": [1, 2]})
In [3]: df.groupby([1, 1]).agg(foo=('A', lambda x: x.max()), bar=("A", lambda x: x.min()))
Out[3]:
foo bar
1 2 2 On master, that raises with the SpecificationError about multiple lambdas. |
Great news! This issue is one of the main reasons I opened issue #18366 at the first place. Might I suggest including an example in the documentation as a way to skip automatic name mangling and control the output column names? Then that would be really great, because users can discover it as a replacement of the already working old behaviour which will disappear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. some doc and a question about the output format.
Updated
|
This made me suddenly remember a comment from someone in #18366 saying:
which lead me to the idea: maybe* when using lambda aggfuncs, using named aggregation could be made mandatory? This is probably not an acceptable idea now that we are that late in the iteration (I'm fine with the current pull request BTW), but I just wanted to ask your opinion on this (*: maybe for a future deprecation?) |
pandas/core/groupby/generic.py
Outdated
@@ -1710,3 +1715,97 @@ def _normalize_keyword_aggregation(kwargs): | |||
order.append((column, | |||
com.get_callable_name(aggfunc) or aggfunc)) | |||
return aggspec, columns, order | |||
|
|||
|
|||
def _make_lambda( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this? Seems like this could be done a lot more succinctly with a partial, i.e.:
f = functools.partial(func)
f.__name__ = "<lambda_{}>".format(i)
Could be done directly in loop so one less function and has the added benefit of maintaining other function attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you find it strange to use functools.partial
when you aren't partially applying any arguments? It's not where my mind initially went.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I can see where that is strange (effectively using partial
to copy the function) but personally find it to be less code and more readable than this wrapper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the currently construction more informative (e.g. it has a doc-string and typing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @WillAyd.
pandas/core/groupby/generic.py
Outdated
@@ -1710,3 +1715,97 @@ def _normalize_keyword_aggregation(kwargs): | |||
order.append((column, | |||
com.get_callable_name(aggfunc) or aggfunc)) | |||
return aggspec, columns, order | |||
|
|||
|
|||
def _make_lambda( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you find it strange to use functools.partial
when you aren't partially applying any arguments? It's not where my mind initially went.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, minor comments.
pandas/core/groupby/generic.py
Outdated
@@ -1710,3 +1715,97 @@ def _normalize_keyword_aggregation(kwargs): | |||
order.append((column, | |||
com.get_callable_name(aggfunc) or aggfunc)) | |||
return aggspec, columns, order | |||
|
|||
|
|||
def _make_lambda( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the currently construction more informative (e.g. it has a doc-string and typing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments on types. otherwise lgtm. merge on green.
# -> typing.Sequence[Callable[..., ScalarResult]]: | ||
|
||
def _managle_lambda_list(aggfuncs): | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add Parameters & types
@@ -47,6 +48,10 @@ | |||
NamedAgg = namedtuple("NamedAgg", ["column", "aggfunc"]) | |||
# TODO(typing) the return value on this callable should be any *scalar*. | |||
AggScalar = Union[str, Callable[..., Any]] | |||
# TODO: validate types on ScalarResult and move to _typing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make an issue for this and point to this code after we merge
@@ -208,6 +213,8 @@ def aggregate(self, func, *args, **kwargs): | |||
raise TypeError("Must provide 'func' or tuples of " | |||
"'(column, aggfunc).") | |||
|
|||
func = _maybe_mangle_lambdas(func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I don't think you actually need to do this here, rather put it around https://github.com/pandas-dev/pandas/pull/26905/files#diff-bfee1ba9e7cb79839776fac1a57ed940L810 and pull out the change you have in https://github.com/pandas-dev/pandas/pull/26905/files#diff-bfee1ba9e7cb79839776fac1a57ed940L832
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, the one on L810 is SeriesGroupBy.aggregate. I think it's entirely separate from NDFramGroupBy.aggregate.
hmm, can you merge master again, have a conflict now. |
thanks @TomAugspurger very nice! |
I am able to run the I assume this isn't expected behaviour? Shall I create a new issue for this? Output of
|
@badge a new issue would be good. Can you note that "in |
Closes #26430