DEPR: Passing a dictionary to SeriesGroupBy.agg #52268

Merged
merged 5 commits into from
Mar 31, 2023
Changes from 2 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -115,6 +115,7 @@ Deprecations
 - Deprecated 'method', 'limit', and 'fill_axis' keywords in :meth:`DataFrame.align` and :meth:`Series.align`, explicitly call ``fillna`` on the alignment results instead (:issue:`51856`)
 - Deprecated 'broadcast_axis' keyword in :meth:`Series.align` and :meth:`DataFrame.align`, upcast before calling ``align`` with ``left = DataFrame({col: left for col in right.columns}, index=right.index)`` (:issue:`51856`)
 - Deprecated the 'axis' keyword in :meth:`.GroupBy.idxmax`, :meth:`.GroupBy.idxmin`, :meth:`.GroupBy.fillna`, :meth:`.GroupBy.take`, :meth:`.GroupBy.skew`, :meth:`.GroupBy.rank`, :meth:`.GroupBy.cumprod`, :meth:`.GroupBy.cumsum`, :meth:`.GroupBy.cummax`, :meth:`.GroupBy.cummin`, :meth:`.GroupBy.pct_change`, :meth:`GroupBy.diff`, :meth:`.GroupBy.shift`, and :meth:`DataFrameGroupBy.corrwith`; for ``axis=1`` operate on the underlying :class:`DataFrame` instead (:issue:`50405`, :issue:`51046`)
+- Deprecated passing a dictionary to :meth:`.SeriesGroupBy.agg`; pass a list of aggregations instead (:issue:`50684`)
 - Deprecated :meth:`DataFrame.swapaxes` and :meth:`Series.swapaxes`, use :meth:`DataFrame.transpose` or :meth:`Series.transpose` instead (:issue:`51946`)
 -

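For readers migrating their own code, the replacement recommended by the changelog entry above might look like the sketch below. The frame, the column names, and the output label `Q` are made up for illustration. The named-aggregation form (`agg(Q="sum")`) is one way to keep a custom column name, although the entry itself only mentions passing a list.

```python
import pandas as pd

# Hypothetical data; only the shape of the calls matters here.
df = pd.DataFrame({"A": ["x", "x", "y"], "C": [1, 2, 3]})
grouped = df.groupby("A")

# The dict spelling, e.g. grouped["C"].agg({"Q": "sum"}), is what this PR
# deprecates (per the review thread, it already failed with as_index=True).

# Option 1: pass a list of aggregations; columns are named after the functions.
by_list = grouped["C"].agg(["sum", "mean"])

# Option 2: named aggregation keeps a custom output column name.
by_name = grouped["C"].agg(Q="sum")

print(by_list)
print(by_name)
```

Both calls return a DataFrame indexed by the group keys, so downstream code that relied on the renamed column from the dict form can usually switch to the second option unchanged.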
17 changes: 14 additions & 3 deletions pandas/core/groupby/generic.py
@@ -68,7 +68,8 @@
 from pandas.core.groupby.groupby import (
     GroupBy,
     GroupByPlot,
-    _agg_template,
+    _agg_template_frame,
+    _agg_template_series,
     _apply_docs,
     _transform_template,
 )
@@ -214,7 +215,7 @@ def _get_data_to_aggregate(
     def apply(self, func, *args, **kwargs) -> Series:
         return super().apply(func, *args, **kwargs)

-    @doc(_agg_template, examples=_agg_examples_doc, klass="Series")
+    @doc(_agg_template_series, examples=_agg_examples_doc, klass="Series")
     def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs):
         if maybe_use_numba(engine):
             return self._aggregate_with_numba(
@@ -306,6 +307,16 @@ def _aggregate_multiple_funcs(self, arg, *args, **kwargs) -> DataFrame:
                 raise SpecificationError("nested renamer is not supported")
             else:
                 # GH#50684 - This accidentally worked in 1.x
+                msg = (
+                    "Passing a dictionary to SeriesGroupBy.agg is deprecated "
+                    "and will raise in a future version of pandas. Pass a list "
+                    "of aggregations instead."
+                )
+                warnings.warn(
+                    message=msg,
+                    category=FutureWarning,
+                    stacklevel=find_stack_level(),
+                )
                 arg = list(arg.items())
         elif any(isinstance(x, (tuple, list)) for x in arg):
             arg = [(x, x) if not isinstance(x, (tuple, list)) else x for x in arg]
@@ -1291,7 +1302,7 @@ class DataFrameGroupBy(GroupBy[DataFrame]):
     """
     )

-    @doc(_agg_template, examples=_agg_examples_doc, klass="DataFrame")
+    @doc(_agg_template_frame, examples=_agg_examples_doc, klass="DataFrame")
     def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs):
         if maybe_use_numba(engine):
             return self._aggregate_with_numba(
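The branch added to `_aggregate_multiple_funcs` above keeps the old behaviour working while announcing its removal. A standalone sketch of that pattern follows; it is not pandas code. `normalize_aggregations` is a hypothetical name, `ValueError` stands in for `SpecificationError`, and a fixed `stacklevel=2` stands in for `find_stack_level()`.

```python
import warnings


def normalize_aggregations(arg, as_index=True):
    """Hypothetical stand-in for the dict handling shown in the diff."""
    if isinstance(arg, dict):
        if as_index:
            # The diff raises SpecificationError here; ValueError is a stand-in.
            raise ValueError("nested renamer is not supported")
        # Warn, then fall through to the old behaviour so callers keep working.
        warnings.warn(
            "Passing a dictionary to SeriesGroupBy.agg is deprecated "
            "and will raise in a future version of pandas. Pass a list "
            "of aggregations instead.",
            category=FutureWarning,
            stacklevel=2,  # assumption: point the warning at the direct caller
        )
        return list(arg.items())
    if any(isinstance(x, (tuple, list)) for x in arg):
        # Pair bare names with themselves so every entry is (label, func).
        return [(x, x) if not isinstance(x, (tuple, list)) else x for x in arg]
    return list(arg)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pairs = normalize_aggregations({"Q": "sum"}, as_index=False)

print(pairs, [str(w.message) for w in caught])
```

Warning first and converting afterwards means existing callers see no change in results during the deprecation window, only the `FutureWarning`.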
89 changes: 88 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -489,7 +489,94 @@ class providing the base-class of operations.
 --------
 %(example)s"""

-_agg_template = """
+_agg_template_series = """
+Aggregate using one or more operations over the specified axis.
+
+Parameters
+----------
+func : function, str, list, dict or None
+    Function to use for aggregating the data. If a function, must either
+    work when passed a {klass} or when passed to {klass}.apply.
+
+    Accepted combinations are:
+
+    - function
+    - string function name
+    - list of functions and/or function names, e.g. ``[np.sum, 'mean']``
+    - dict of axis labels -> functions, function names or list of such.
@mroeschke (Member), Mar 29, 2023:

Should we remove this line specifically because passing dicts here is being deprecated?

@rhshadrach (Member Author), Mar 30, 2023:

For most cases, my preference would be to have the User Guide reflect how we would like users to use pandas (so deprecations are to be removed) and the API docs to be an accurate reflection of the current arguments/behavior (so deprecations should not be removed). If there are other opinions here, maybe we should make an issue to discuss.

However, in this case dict already doesn't work for SeriesGroupBy.agg when as_index=True (the default), so this one is a bit of a grey area. I'm good with removing here and will update.

+    - None, in which case ``**kwargs`` are used with Named Aggregation. Here the
+      output has one column for each element in ``**kwargs``. The name of the
+      column is keyword, whereas the value determines the aggregation used to compute
+      the values in the column.
+
+    Can also accept a Numba JIT function with
+    ``engine='numba'`` specified. Only passing a single function is supported
+    with this engine.
+
+    If the ``'numba'`` engine is chosen, the function must be
+    a user defined function with ``values`` and ``index`` as the
+    first and second arguments respectively in the function signature.
+    Each group's index will be passed to the user defined function
+    and optionally available for use.
+
+    .. versionchanged:: 1.1.0
+
+    .. deprecated:: 2.1.0
+
+        Passing a dictionary is deprecated and will raise in a future version
+        of pandas. Pass a list of aggregations instead.
+*args
+    Positional arguments to pass to func.
+engine : str, default None
+    * ``'cython'`` : Runs the function through C-extensions from cython.
+    * ``'numba'`` : Runs the function through JIT compiled code from numba.
+    * ``None`` : Defaults to ``'cython'`` or globally setting ``compute.use_numba``
+
+    .. versionadded:: 1.1.0
+engine_kwargs : dict, default None
+    * For ``'cython'`` engine, there are no accepted ``engine_kwargs``
+    * For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
+      and ``parallel`` dictionary keys. The values must either be ``True`` or
+      ``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
+      ``{{'nopython': True, 'nogil': False, 'parallel': False}}`` and will be
+      applied to the function
+
+    .. versionadded:: 1.1.0
+**kwargs
+    * If ``func`` is None, ``**kwargs`` are used to define the output names and
+      aggregations via Named Aggregation. See ``func`` entry.
+    * Otherwise, keyword arguments to be passed into func.
+
+Returns
+-------
+{klass}
+
+See Also
+--------
+{klass}.groupby.apply : Apply function func group-wise
+    and combine the results together.
+{klass}.groupby.transform : Transforms the Series on each group
+    based on the given function.
+{klass}.aggregate : Aggregate using one or more
+    operations over the specified axis.
+
+Notes
+-----
+When using ``engine='numba'``, there will be no "fall back" behavior internally.
+The group data and group index will be passed as numpy arrays to the JITed
+user defined function, and no alternative execution attempts will be tried.
+
+Functions that mutate the passed object can produce unexpected
+behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
+for more details.
+
+.. versionchanged:: 1.3.0
+
+    The resulting dtype will reflect the return value of the passed ``func``,
+    see the examples below.
+{examples}"""
+
+_agg_template_frame = """
 Aggregate using one or more operations over the specified axis.

 Parameters
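Splitting `_agg_template` in two lets the Series docstring carry the deprecation note while the DataFrame one does not. Both templates are still filled in per class through the `@doc(..., klass=...)` calls in generic.py. Below is a rough sketch of that substitution, assuming it behaves like `str.format` (which the doubled braces around the `engine_kwargs` default suggest). `TEMPLATE` and `fill` are illustrative names, not pandas internals.

```python
# Cut-down stand-in for the templates in the diff; not the real pandas text.
TEMPLATE = """
Aggregate using one or more operations over the specified axis.

Default engine_kwargs: ``{{'nopython': True, 'nogil': False}}``

Returns
-------
{klass}
{examples}"""


def fill(template: str, **params: str) -> str:
    # Assumption: placeholders are resolved with str.format-style substitution,
    # so literal braces in the text must be doubled.
    return template.format(**params)


examples = "\nExamples\n--------\n..."
series_doc = fill(TEMPLATE, klass="Series", examples=examples)
frame_doc = fill(TEMPLATE, klass="DataFrame", examples=examples)
print(series_doc)
```

With one template per class, text that applies only to `SeriesGroupBy.agg` no longer needs a placeholder or a conditional; it is simply written into the Series template.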
4 changes: 3 additions & 1 deletion pandas/tests/groupby/test_groupby.py
@@ -727,7 +727,9 @@ def test_groupby_as_index_agg(df):

     expected3 = grouped["C"].sum()
     expected3 = DataFrame(expected3).rename(columns={"C": "Q"})
-    result3 = grouped["C"].agg({"Q": np.sum})
+    msg = "Passing a dictionary to SeriesGroupBy.agg is deprecated"
+    with tm.assert_produces_warning(FutureWarning, match=msg):
+        result3 = grouped["C"].agg({"Q": np.sum})
     tm.assert_frame_equal(result3, expected3)

     # GH7115 & GH8112 & GH8582
4 changes: 3 additions & 1 deletion pandas/tests/groupby/test_grouping.py
@@ -486,7 +486,9 @@ def test_multifunc_select_col_integer_cols(self, df):
         df.columns = np.arange(len(df.columns))

         # it works!
-        df.groupby(1, as_index=False)[2].agg({"Q": np.mean})
+        msg = "Passing a dictionary to SeriesGroupBy.agg is deprecated"
+        with tm.assert_produces_warning(FutureWarning, match=msg):
+            df.groupby(1, as_index=False)[2].agg({"Q": np.mean})

     def test_multiindex_columns_empty_level(self):
         lst = [["count", "values"], ["to filter", ""]]
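Both test changes wrap the deprecated call in `tm.assert_produces_warning(FutureWarning, match=msg)`, so the tests fail if the warning stops being emitted. The standard-library sketch below approximates what such a check does. `expect_warning` and `legacy_call` are hypothetical names, and treating `match` as a regular-expression search is an assumption about the pandas helper.

```python
import re
import warnings
from contextlib import contextmanager


@contextmanager
def expect_warning(category, match):
    """Hypothetical helper: fail unless a matching warning is emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let filters hide the warning
        yield
    hits = [
        w
        for w in caught
        if issubclass(w.category, category) and re.search(match, str(w.message))
    ]
    if not hits:
        raise AssertionError(f"no {category.__name__} matching {match!r}")


def legacy_call():
    # Stand-in for the deprecated dict call in the tests; not a pandas call.
    warnings.warn(
        "Passing a dictionary to SeriesGroupBy.agg is deprecated "
        "and will raise in a future version of pandas.",
        FutureWarning,
        stacklevel=2,
    )
    return "result"


msg = "Passing a dictionary to SeriesGroupBy.agg is deprecated"
with expect_warning(FutureWarning, match=msg):
    value = legacy_call()
print(value)
```

Matching only the leading part of the message, as the tests do, keeps them stable if the trailing advice in the warning text is reworded later.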
3 changes: 2 additions & 1 deletion scripts/validate_unwanted_patterns.py
@@ -34,7 +34,8 @@
     "_new_Index",
     "_new_PeriodIndex",
     "_doc_template",
-    "_agg_template",
+    "_agg_template_series",
+    "_agg_template_frame",
     "_pipe_template",
     "__main__",
     "_transform_template",