Commit 1c4dacb

DEPR: deprecate relabeling dicts in groupby.agg (pandas-dev#15931)
* DEPR: deprecate relabeling dictionaries in groupby.agg
1 parent 7b8a6b1 commit 1c4dacb

12 files changed

+418
-121
lines changed

doc/source/computation.rst

-8
@@ -610,14 +610,6 @@ aggregation with, outputting a DataFrame:

    r['A'].agg([np.sum, np.mean, np.std])

-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
-
-.. ipython:: python
-
-   r['A'].agg({'result1' : np.sum,
-               'result2' : np.mean})
-
 On a windowed DataFrame, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:
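The dict-renaming example removed above can be reproduced by aggregating with a list and renaming afterwards. This is a minimal sketch, assuming a small made-up ``Series`` (the name ``s`` and its values are illustrative, not taken from the docs) and string function names in place of ``np.sum`` and ``np.mean``:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for column 'A' of the documented frame.
s = pd.Series(np.arange(6, dtype=float), name="A")
r = s.rolling(window=3)

# Aggregating with a list names the result columns after the functions.
agg = r.agg(["sum", "mean"])

# Rename in a chained step instead of passing {'result1': ..., 'result2': ...}.
renamed = agg.rename(columns={"sum": "result1", "mean": "result2"})
print(renamed)
```

The first two rows are ``NaN`` because the window of 3 is not yet full.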

doc/source/groupby.rst

+22 -10
@@ -502,31 +502,43 @@ index are the group names and whose values are the sizes of each group.
 Applying multiple functions at once
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-With grouped Series you can also pass a list or dict of functions to do
+With grouped ``Series`` you can also pass a list or dict of functions to do
 aggregation with, outputting a DataFrame:

 .. ipython:: python

    grouped = df.groupby('A')
    grouped['C'].agg([np.sum, np.mean, np.std])

-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
+On a grouped ``DataFrame``, you can pass a list of functions to apply to each
+column, which produces an aggregated result with a hierarchical index:

 .. ipython:: python

-   grouped['D'].agg({'result1' : np.sum,
-                     'result2' : np.mean})
+   grouped.agg([np.sum, np.mean, np.std])

-On a grouped DataFrame, you can pass a list of functions to apply to each
-column, which produces an aggregated result with a hierarchical index:
+
+The resulting aggregations are named for the functions themselves. If you
+need to rename, then you can add in a chained operation for a ``Series`` like this:

 .. ipython:: python

-   grouped.agg([np.sum, np.mean, np.std])
+   (grouped['C'].agg([np.sum, np.mean, np.std])
+                .rename(columns={'sum': 'foo',
+                                 'mean': 'bar',
+                                 'std': 'baz'})
+   )
+
+For a grouped ``DataFrame``, you can rename in a similar manner:
+
+.. ipython:: python
+
+   (grouped.agg([np.sum, np.mean, np.std])
+           .rename(columns={'sum': 'foo',
+                            'mean': 'bar',
+                            'std': 'baz'})
+   )

-Passing a dict of functions has different behavior by default, see the next
-section.

 Applying different functions to DataFrame columns
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
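The chained renaming added in the hunk above can be tried on a toy frame. The sketch below assumes a made-up ``df`` (its values are illustrative only) and uses string function names rather than NumPy callables. On the grouped ``DataFrame`` the result columns form a ``MultiIndex``, and ``rename`` without a ``level`` argument matches labels on every level:

```python
import pandas as pd

# Hypothetical data; the documented 'df' is defined elsewhere in groupby.rst.
df = pd.DataFrame({"A": ["x", "x", "y"],
                   "C": [1.0, 2.0, 3.0],
                   "D": [10.0, 20.0, 30.0]})
grouped = df.groupby("A")

# Grouped Series: flat columns named after the functions, then renamed.
series_result = (grouped["C"].agg(["sum", "mean"])
                 .rename(columns={"sum": "foo", "mean": "bar"}))

# Grouped DataFrame: (column, function) MultiIndex, renamed the same way.
frame_result = (grouped.agg(["sum", "mean"])
                .rename(columns={"sum": "foo", "mean": "bar"}))

print(series_result)
print(frame_result)
```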

doc/source/timeseries.rst

-8
@@ -1549,14 +1549,6 @@ You can pass a list or dict of functions to do aggregation with, outputting a Da

    r['A'].agg([np.sum, np.mean, np.std])

-If a dict is passed, the keys will be used to name the columns. Otherwise the
-function's name (stored in the function object) will be used.
-
-.. ipython:: python
-
-   r['A'].agg({'result1' : np.sum,
-               'result2' : np.mean})
-
 On a resampled DataFrame, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:
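The same list-then-rename pattern applies to resampling. A minimal sketch, assuming a made-up daily ``Series`` resampled into 3-day bins (the index, values, and bin width are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical daily data standing in for column 'A'.
idx = pd.date_range("2017-01-01", periods=6, freq="D")
s = pd.Series(np.arange(6, dtype=float), index=idx, name="A")
r = s.resample("3D")

# Aggregate with a list, then rename the function-named columns.
renamed = (r.agg(["sum", "mean"])
           .rename(columns={"sum": "result1", "mean": "result2"}))
print(renamed)
```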

doc/source/whatsnew/v0.20.0.txt

+82
@@ -456,6 +456,88 @@ Convert to an xarray DataArray

    p.to_xarray()

+.. _whatsnew_0200.api_breaking.deprecate_group_agg_dict:
+
+Deprecate groupby.agg() with a dictionary when renaming
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``.groupby(..).agg(..)``, ``.rolling(..).agg(..)``, and ``.resample(..).agg(..)`` syntax can accept a variety of inputs, including scalars,
+lists, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
+(potentially different) aggregations.
+
+However, ``.agg(..)`` can *also* accept a dict that allows 'renaming' of the result columns. This is a complicated and confusing syntax, and it is not consistent
+between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functionality.
+
+1) We are deprecating passing a dict to a grouped/rolled/resampled ``Series``. This allowed
+   one to ``rename`` the resulting aggregation, but this had a completely different
+   meaning than passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
+2) We are deprecating passing a dict-of-dicts to a grouped/rolled/resampled ``DataFrame`` in a similar manner.
+
+This is an illustrative example:
+
+.. ipython:: python
+
+   df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
+                      'B': range(5),
+                      'C': range(5)})
+   df
+
+Here is a typical useful syntax for computing different aggregations for different columns. This
+is a natural (and useful) syntax. We aggregate from the dict by taking the specified
+columns and applying the given functions. Passing a list of functions for a column returns a ``MultiIndex`` for the columns.
+
+.. ipython:: python
+
+   df.groupby('A').agg({'B': 'sum', 'C': 'min'})
+
+Here's an example of the first deprecation (1), passing a dict to a grouped ``Series``. This
+is a combination of aggregation and renaming:
+
+.. code-block:: ipython
+
+   In [6]: df.groupby('A').B.agg({'foo': 'count'})
+   FutureWarning: using a dict on a Series for aggregation
+   is deprecated and will be removed in a future version
+
+   Out[6]:
+      foo
+   A
+   1    3
+   2    2
+
+You can accomplish the same operation more idiomatically with:
+
+.. ipython:: python
+
+   df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'})
+
+
+Here's an example of the second deprecation (2), passing a dict-of-dicts to a grouped ``DataFrame``:
+
+.. code-block:: python
+
+   In [23]: (df.groupby('A')
+                .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}})
+             )
+   FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
+
+   Out[23]:
+        B   C
+      foo bar
+   A
+   1    3   0
+   2    7   3
+
+
+You can accomplish nearly the same by:
+
+.. ipython:: python
+
+   (df.groupby('A')
+      .agg({'B': 'sum', 'C': 'min'})
+      .rename(columns={'B': 'foo', 'C': 'bar'})
+   )
+
 .. _whatsnew.api_breaking.io_compat:

 Possible incompat for HDF5 formats for pandas < 0.13.0
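The two replacements described in the whatsnew entry can be checked against the illustrative frame it defines. This sketch reuses that ``df`` and adds one extra case, ``multi``, which is an assumption of this example rather than part of the commit: it shows that a dict of column names to lists yields ``MultiIndex`` columns, while a dict of column names to scalars yields flat columns. Note that ``rename`` needs the ``columns=`` keyword here, because a bare mapping relabels the index instead.

```python
import pandas as pd

# The illustrative frame from the whatsnew entry.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2],
                   "B": range(5),
                   "C": range(5)})

# (1) Grouped Series: aggregate with a list, then rename the column.
series_result = (df.groupby("A")["B"].agg(["count"])
                 .rename(columns={"count": "foo"}))

# (2) Grouped DataFrame: column-to-function dict, then rename the columns.
frame_result = (df.groupby("A")
                .agg({"B": "sum", "C": "min"})
                .rename(columns={"B": "foo", "C": "bar"}))

# Extra illustration (not in the commit): dict-to-list gives a MultiIndex.
multi = df.groupby("A").agg({"B": ["sum", "mean"], "C": ["min"]})

print(series_result)
print(frame_result)
print(multi.columns.tolist())
```

Unlike the deprecated dict-of-dicts call, ``frame_result`` has flat ``foo`` and ``bar`` columns rather than a two-level header, which is why the entry calls the result "nearly the same".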
