ENH: pd.NamedAgg forwards *args and **kwargs to aggfunc #58852

Closed
wants to merge 13 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -28,6 +28,7 @@ enhancement2
 
 Other enhancements
 ^^^^^^^^^^^^^^^^^^
+- :class:`pandas.NamedAgg` now forwards any ``*args`` and ``**kwargs`` to calls of ``aggfunc`` (:issue:`58283`)
 - :class:`pandas.api.typing.FrozenList` is available for typing the outputs of :attr:`MultiIndex.names`, :attr:`MultiIndex.codes` and :attr:`MultiIndex.levels` (:issue:`58237`)
 - :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
 - :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)

27 changes: 24 additions & 3 deletions pandas/core/groupby/generic.py
@@ -108,7 +108,12 @@
 ScalarResult = TypeVar("ScalarResult")
 
 
-class NamedAgg(NamedTuple):
+class _BaseNamedAgg(NamedTuple):

Member

What's the reason for introducing a new class here?

Contributor Author

That's because you cannot override __new__ when inheriting from NamedTuple directly.

Contributor Author

@tomytp (May 31, 2024)

So the most elegant way I found to override it while keeping the NamedTuple properties was to introduce a new class. Do you have a suggestion for another way to do it?

Contributor Author

For reference, this is the error I got when I tried to inherit directly from NamedTuple:
AttributeError: Cannot overwrite NamedTuple attribute __new__
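The error above can be reproduced with the standard library alone. In this sketch, `define_direct_subclass` and `Direct` are hypothetical names; the class body mirrors the shape of `NamedAgg` and adds a `__new__`, which `typing.NamedTuple` rejects while the class is being created.

```python
from typing import Any, Hashable, NamedTuple


def define_direct_subclass():
    # Overriding __new__ in a class that inherits from NamedTuple directly
    # is rejected at class-creation time, before any instance exists.
    class Direct(NamedTuple):
        column: Hashable
        aggfunc: Any

        def __new__(cls, column, aggfunc, *args, **kwargs):
            return tuple.__new__(cls, (column, aggfunc))

    return Direct


try:
    define_direct_subclass()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # Cannot overwrite NamedTuple attribute __new__
```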

Member

@WillAyd (May 31, 2024)

Ah, that's unfortunate. This seems kind of hacky as a result, though. Maybe we can just drop the NamedTuple subclass? I think it was mostly developed that way as a convenience rather than as an intentional design decision; I am not sure why NamedAgg would need to be "tuple-ish".

@rhshadrach might have more thoughts
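To make concrete what "tuple-ish" provides, the sketch below compares a `NamedTuple` stand-in with a plain class holding the same two attributes. `AggSpec` and `PlainAggSpec` are hypothetical names, and whether pandas itself relies on any of these behaviors is not established here.

```python
from typing import Any, Hashable, NamedTuple


class AggSpec(NamedTuple):
    column: Hashable
    aggfunc: Any


class PlainAggSpec:
    # Same attributes, but no tuple behavior.
    def __init__(self, column, aggfunc):
        self.column = column
        self.aggfunc = aggfunc


named = AggSpec("B", "sum")
plain = PlainAggSpec("B", "sum")

# A NamedTuple is a tuple: it unpacks, indexes, and compares like ("B", "sum").
column, aggfunc = named
print(isinstance(named, tuple), named == ("B", "sum"), named[1])  # True True sum

# The plain class supports none of that without extra dunder methods.
print(isinstance(plain, tuple), plain == ("B", "sum"))  # False False
```

Dropping the `NamedTuple` base would mean either giving up these behaviors or re-implementing the ones callers depend on.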

Member

@rhshadrach (Jun 1, 2024)

python/cpython#77258 has some relevant information, especially the linked discussion there to the issue in python/typing.

In any case, it looks to me like this implementation just ignores args / kwargs when the provided aggfunc is a string rather than a callable? If that's the case, then I'm -1 on this approach anyway.
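The behavior questioned here can be reproduced without pandas. The function below copies only the branch from the proposed `__new__` (wrap non-string aggfuncs, pass strings through unchanged); `resolve_aggfunc` and `total_above` are hypothetical names.

```python
def resolve_aggfunc(aggfunc, *args, **kwargs):
    # Mirrors the proposed __new__: only non-string aggfuncs are wrapped.
    if not isinstance(aggfunc, str):
        def wrapped(*new_args, **new_kwargs):
            return aggfunc(*(new_args + args), **{**kwargs, **new_kwargs})

        return wrapped
    # String aggfuncs are returned as-is, so args and kwargs are silently lost.
    return aggfunc


def total_above(values, threshold):
    return sum(v for v in values if v > threshold)


callable_version = resolve_aggfunc(total_above, 1)
string_version = resolve_aggfunc("sum", 1)

print(callable_version([1, 2, 3]))  # 5
print(string_version)  # sum (the extra argument 1 is gone)
```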

Contributor Author

@rhshadrach I'll rethink the approach so that it also handles aggfunc names passed as strings.

I'm not sure I understand your thoughts on the necessity of keeping NamedArgs with tuple behavior. Could you clarify that?

TY
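One possible direction, sketched under the assumption that a string names a method on the object being aggregated: resolve the name with `getattr` at call time and forward the stored arguments to it. This is hypothetical, not what the PR implements, and it would bypass whatever optimized dispatch pandas uses for string aggregations. `make_aggfunc` and `FakeSeries` are illustrative names; `FakeSeries` stands in for a real Series so the sketch runs without pandas.

```python
def make_aggfunc(aggfunc, *args, **kwargs):
    # Hypothetical: treat a string as the name of a method on the object being
    # aggregated, so stored args/kwargs are forwarded instead of dropped.
    if isinstance(aggfunc, str):
        name = aggfunc

        def call_method(obj, *new_args, **new_kwargs):
            method = getattr(obj, name)
            return method(*(new_args + args), **{**kwargs, **new_kwargs})

        return call_method

    def call_function(obj, *new_args, **new_kwargs):
        return aggfunc(obj, *(new_args + args), **{**kwargs, **new_kwargs})

    return call_function


class FakeSeries:
    # Minimal stand-in with one parameterized method. Its quantile picks the
    # lower nearest element; it does not reproduce pandas' interpolation.
    def __init__(self, values):
        self.values = list(values)

    def quantile(self, q=0.5):
        ordered = sorted(self.values)
        return ordered[int(q * (len(ordered) - 1))]


agg = make_aggfunc("quantile", q=1.0)
print(agg(FakeSeries([3, 1, 2])))  # 3
```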

Member

> I'm not sure I understand your thoughts on the necessity of keeping NamedArgs with tuple behavior. Could you clarify that?

I don't believe I shared any thoughts in this regard, just a link to some background information on why Python raises when you try to overwrite __new__.

+    column: Hashable
+    aggfunc: AggScalar
+
+
+class NamedAgg(_BaseNamedAgg):
     """
     Helper for column specific aggregation with control over output column names.
 
@@ -121,6 +126,10 @@ class NamedAgg(NamedTuple):
     aggfunc : function or str
         Function to apply to the provided column. If string, the name of a built-in
         pandas function.
+    *args : tuple, optional
+        Args passed to aggfunc.
+    **kwargs : dict, optional
+        Kwargs passed to aggfunc.
 
     Examples
     --------
@@ -134,8 +143,20 @@ class NamedAgg(NamedTuple):
     2           1      12.0
     """
 
-    column: Hashable
-    aggfunc: AggScalar
+    def __new__(cls, column, aggfunc, *args, **kwargs):
+        if not isinstance(aggfunc, str):
+            aggfunc = cls._get_wrapped_aggfunc(aggfunc, *args, **kwargs)
+        self = _BaseNamedAgg.__new__(cls, column, aggfunc)
+        return self
+
+    @staticmethod
+    def _get_wrapped_aggfunc(function, *initial_args, **initial_kwargs):
+        def wrapped_aggfunc(*new_args, **new_kwargs):
+            final_args = new_args + initial_args
+            final_kwargs = {**initial_kwargs, **new_kwargs}
+            return function(*final_args, **final_kwargs)
+
+        return wrapped_aggfunc
 
 
 class SeriesGroupBy(GroupBy[Series]):

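The closure in `_get_wrapped_aggfunc` places the positional arguments supplied at call time (the group's data) before the stored `*args`, and lets call-time keywords override stored ones. The standalone sketch below copies that helper and contrasts it with `functools.partial`, which binds stored positionals first. A plain list stands in for the Series, and `n_between` here is a list-based variant of the test helper, not pandas code.

```python
from functools import partial


def get_wrapped_aggfunc(function, *initial_args, **initial_kwargs):
    # Copy of the PR's helper: call-time positionals first, then stored ones;
    # call-time keywords win over stored keywords.
    def wrapped_aggfunc(*new_args, **new_kwargs):
        final_args = new_args + initial_args
        final_kwargs = {**initial_kwargs, **new_kwargs}
        return function(*final_args, **final_kwargs)

    return wrapped_aggfunc


def n_between(values, low, high):
    # Same idea as the test helper, on a plain list instead of a Series.
    return sum(low <= v <= high for v in values)


data = [1, 2, 3, 4]
wrapped = get_wrapped_aggfunc(n_between, 0, high=2)
print(wrapped(data))  # 2, called as n_between(data, 0, high=2)

# partial binds stored positionals *before* the data, so the call becomes
# n_between(0, data, high=2) and fails when iterating over the integer 0.
try:
    partial(n_between, 0, high=2)(data)
    partial_failed = False
except TypeError:
    partial_failed = True
print(partial_failed)  # True
```

This ordering is why a plain `functools.partial` would not be a drop-in replacement for the wrapper when positional extras are allowed.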
28 changes: 28 additions & 0 deletions pandas/tests/groupby/aggregate/test_aggregate.py
@@ -827,6 +827,34 @@ def test_agg_namedtuple(self):
         expected = df.groupby("A").agg(b=("B", "sum"), c=("B", "count"))
         tm.assert_frame_equal(result, expected)
 
+    def test_single_named_agg_with_args_and_kwargs(self):
+        df = DataFrame({"A": [0, 1, 2, 3], "B": [1, 2, 3, 4]})
+
+        def n_between(ser, low, high):
+            return ser.between(low, high).sum()
+
+        result = df.groupby("A").agg(n_between=pd.NamedAgg("B", n_between, 0, high=2))
+        expected = df.groupby("A").agg(n_between=("B", lambda x: x.between(0, 2).sum()))
+        tm.assert_frame_equal(result, expected)
+
+    def test_multiple_named_agg_with_args_and_kwargs(self):
+        df = DataFrame({"A": [0, 1, 2, 3], "B": [1, 2, 3, 4]})
+
+        def n_between(ser, low, high):
+            return ser.between(low, high).sum()
+
+        result = df.groupby("A").agg(
+            n_between01=pd.NamedAgg("B", n_between, 0, 1),
+            n_between13=pd.NamedAgg("B", n_between, 1, 3),
+            n_between02=pd.NamedAgg("B", n_between, 0, 2),
+        )
+        expected = df.groupby("A").agg(
+            n_between01=("B", lambda x: x.between(0, 1).sum()),
+            n_between13=("B", lambda x: x.between(1, 3).sum()),
+            n_between02=("B", lambda x: x.between(0, 2).sum()),
+        )
+        tm.assert_frame_equal(result, expected)
+
     def test_mangled(self):
         df = DataFrame({"A": [0, 1], "B": [1, 2], "C": [3, 4]})
         result = df.groupby("A").agg(b=("B", lambda x: 0), c=("C", lambda x: 1))