Commit 554774a

DOC: fix RT03, ES01 for pandas.core.groupby.DataFrameGroupBy.agg
Parent: 2419343

2 files changed: +175 lines, -3 lines

ci/code_checks.sh (-1)

@@ -139,7 +139,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         -i "pandas.arrays.SparseArray PR07,SA01" \
         -i "pandas.arrays.TimedeltaArray PR07,SA01" \
         -i "pandas.core.groupby.DataFrameGroupBy.__iter__ RT03,SA01" \
-        -i "pandas.core.groupby.DataFrameGroupBy.agg RT03" \
         -i "pandas.core.groupby.DataFrameGroupBy.aggregate RT03" \
         -i "pandas.core.groupby.DataFrameGroupBy.boxplot PR07,RT03,SA01" \
         -i "pandas.core.groupby.DataFrameGroupBy.get_group RT03,SA01" \
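
The removed line is one of the `-i` entries in the docstring-check block of `ci/code_checks.sh`. Each entry appears to exempt a single object from the listed error codes, so once it is gone `DataFrameGroupBy.agg` is expected to pass RT03. In numpydoc's validation codes, RT03 is "Return value has no description" and ES01 is "No extended summary found". The sketch below approximates only those two checks with the standard library. It is a simplified illustration under that assumption, not numpydoc's or pandas' implementation; `find_sections` and `check_docstring` are hypothetical helpers.

```python
import textwrap


def find_sections(lines):
    """Map each numpydoc section title to the index of its first content line.

    A section header is a non-blank line followed by a line made only of dashes.
    """
    sections = {}
    for i in range(len(lines) - 1):
        title, underline = lines[i].strip(), lines[i + 1].strip()
        if title and underline and set(underline) == {"-"}:
            sections[title] = i + 2
    return sections


def check_docstring(docstring):
    """Return a simplified list of ES01/RT03 codes for a numpydoc docstring.

    Hypothetical approximation only; numpydoc's real validator is far stricter.
    """
    lines = textwrap.dedent(docstring).strip("\n").splitlines()
    sections = find_sections(lines)
    errors = []

    # ES01: some text must sit between the one-line summary and the first section.
    first_header = min(sections.values()) - 2 if sections else len(lines)
    if not any(line.strip() for line in lines[1:first_header]):
        errors.append("ES01")

    # RT03: every unindented entry under "Returns" needs an indented description.
    if "Returns" in sections:
        start = sections["Returns"]
        later = [s - 2 for s in sections.values() if s > start]
        block = lines[start:min(later, default=len(lines))]
        for j, line in enumerate(block):
            if line.strip() and not line.startswith(" "):
                following = block[j + 1] if j + 1 < len(block) else ""
                if not (following.startswith(" ") and following.strip()):
                    errors.append("RT03")
    return errors


# Toy docstrings, loosely modeled on the before/after state of this commit.
missing = """
    Aggregate using one or more operations.

    Returns
    -------
    DataFrame
    """

fixed = """
    Aggregate using one or more operations.

    This method allows the application of one or more aggregation operations
    on groups of data within a DataFrameGroupBy object.

    Returns
    -------
    DataFrame
        Aggregated DataFrame based on the grouping and the applied aggregation
        functions.
    """

print(check_docstring(missing))  # ['ES01', 'RT03']
print(check_docstring(fixed))  # []
```

The toy `missing` docstring has no extended summary and a bare `DataFrame` return type, so both codes fire; the `fixed` version carries the kind of text this commit adds.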

pandas/core/groupby/generic.py (+175 -2)

@@ -67,7 +67,6 @@
 from pandas.core.groupby.groupby import (
     GroupBy,
     GroupByPlot,
-    _agg_template_frame,
     _agg_template_series,
     _transform_template,
 )
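
This hunk drops the `_agg_template_frame` import because, in the next hunk, `aggregate` stops building its docstring through `@doc(_agg_template_frame, examples=_agg_examples_doc, klass="DataFrame")` and carries the text inline instead. For readers unfamiliar with that pattern, here is a minimal sketch of a template-filling decorator. `fill_doc`, `AGG_TEMPLATE` and `EXAMPLES` are hypothetical names, and this is not pandas' `doc` decorator, which differs in detail.

```python
import textwrap

# Hypothetical shared template; the {klass} and {examples} placeholders are
# filled per class, so one text can serve several classes.
AGG_TEMPLATE = """
Aggregate using one or more operations.

Returns
-------
{klass}

{examples}
"""

EXAMPLES = "Examples\n--------\n>>> df.groupby('A').agg('min')"


def fill_doc(template, **params):
    """Hypothetical decorator that sets __doc__ from a formatted template."""

    def decorator(func):
        func.__doc__ = textwrap.dedent(template).format(**params)
        return func

    return decorator


@fill_doc(AGG_TEMPLATE, klass="DataFrame", examples=EXAMPLES)
def aggregate(func=None, *args, **kwargs):
    # Body omitted in this sketch; only the generated docstring matters here.
    raise NotImplementedError


print(aggregate.__doc__)
```

Because the template is shared, the `Returns` entry in this toy version is only the filled-in `{klass}` with no description line, which is the kind of gap RT03 reports. Whether that was the motivation for inlining is an assumption; the commit message only says it fixes RT03 and ES01. Note also that `str.format` requires any literal braces in such a template to be doubled (`{{` and `}}`).
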
@@ -1515,8 +1514,182 @@ class DataFrameGroupBy(GroupBy[DataFrame]):
     """
     )

-    @doc(_agg_template_frame, examples=_agg_examples_doc, klass="DataFrame")
     def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs):
+        """
+        Aggregate using one or more operations.
+
+        This method allows the application of one or more aggregation operations
+        on groups of data within a DataFrameGroupBy object. The method also supports
+        the use of JIT-compiled functions via Numba, allowing optimized execution of
+        custom aggregations on group data.
+
+        Parameters
+        ----------
+        func : function, str, list, dict or None
+            Function to use for aggregating the data. If a function, must either
+            work when passed a DataFrame or when passed to DataFrame.apply.
+
+            Accepted combinations are:
+
+            - function
+            - string function name
+            - list of functions and/or function names, e.g. ``[np.sum, 'mean']``
+            - dict of index labels -> functions, function names or list of such.
+            - None, in which case ``**kwargs`` are used with Named Aggregation. Here
+              the output has one column for each element in ``**kwargs``. The name of
+              the column is keyword, whereas the value determines the aggregation used
+              to compute the values in the column.
+
+            Can also accept a Numba JIT function with
+            ``engine='numba'`` specified. Only passing a single function is supported
+            with this engine.
+
+            If the ``'numba'`` engine is chosen, the function must be
+            a user defined function with ``values`` and ``index`` as the
+            first and second arguments respectively in the function signature.
+            Each group's index will be passed to the user defined function
+            and optionally available for use.
+
+        *args
+            Positional arguments to pass to func.
+        engine : str, default None
+            * ``'cython'`` : Runs the function through C-extensions from cython.
+            * ``'numba'`` : Runs the function through JIT compiled code from numba.
+            * ``None`` : Defaults to ``'cython'`` or globally setting
+              ``compute.use_numba``
+
+        engine_kwargs : dict, default None
+            * For ``'cython'`` engine, there are no accepted ``engine_kwargs``
+            * For ``'numba'`` engine, the engine can accept ``nopython``, ``nogil``
+              and ``parallel`` dictionary keys. The values must either be ``True`` or
+              ``False``. The default ``engine_kwargs`` for the ``'numba'`` engine is
+              ``{'nopython': True, 'nogil': False, 'parallel': False}`` and will be
+              applied to the function
+
+        **kwargs
+            * If ``func`` is None, ``**kwargs`` are used to define the output names and
+              aggregations via Named Aggregation. See ``func`` entry.
+            * Otherwise, keyword arguments to be passed into func.
+
+        Returns
+        -------
+        DataFrame
+            Aggregated DataFrame based on the grouping and the applied aggregation
+            functions.
+
+        See Also
+        --------
+        DataFrame.groupby.apply : Apply function func group-wise
+            and combine the results together.
+        DataFrame.groupby.transform : Transforms the Series on each group
+            based on the given function.
+        DataFrame.aggregate : Aggregate using one or more operations.
+
+        Notes
+        -----
+        When using ``engine='numba'``, there will be no "fall back" behavior
+        internally. The group data and group index will be passed as numpy arrays
+        to the JITed user defined function, and no alternative execution attempts
+        will be tried.
+
+        Functions that mutate the passed object can produce unexpected
+        behavior or errors and are not supported. See :ref:`gotchas.udf-mutation`
+        for more details.
+
+        .. versionchanged:: 1.3.0
+
+            The resulting dtype will reflect the return value of the passed ``func``,
+            see the examples below.
+
+        Examples
+        --------
+        >>> data = {
+        ...     "A": [1, 1, 2, 2],
+        ...     "B": [1, 2, 3, 4],
+        ...     "C": [0.362838, 0.227877, 1.267767, -0.562860],
+        ... }
+        >>> df = pd.DataFrame(data)
+        >>> df
+           A  B         C
+        0  1  1  0.362838
+        1  1  2  0.227877
+        2  2  3  1.267767
+        3  2  4 -0.562860
+
+        The aggregation is for each column.
+
+        >>> df.groupby("A").agg("min")
+           B         C
+        A
+        1  1  0.227877
+        2  3 -0.562860
+
+        Multiple aggregations
+
+        >>> df.groupby("A").agg(["min", "max"])
+            B             C
+          min max       min       max
+        A
+        1   1   2  0.227877  0.362838
+        2   3   4 -0.562860  1.267767
+
+        Select a column for aggregation
+
+        >>> df.groupby("A").B.agg(["min", "max"])
+           min  max
+        A
+        1    1    2
+        2    3    4
+
+        User-defined function for aggregation
+
+        >>> df.groupby("A").agg(lambda x: sum(x) + 2)
+           B         C
+        A
+        1  5  2.590715
+        2  9  2.704907
+
+        Different aggregations per column
+
+        >>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
+            B             C
+          min max       sum
+        A
+        1   1   2  0.590715
+        2   3   4  0.704907
+
+        To control the output names with different aggregations per column,
+        pandas supports "named aggregation"
+
+        >>> df.groupby("A").agg(
+        ...     b_min=pd.NamedAgg(column="B", aggfunc="min"),
+        ...     c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
+        ... )
+           b_min     c_sum
+        A
+        1      1  0.590715
+        2      3  0.704907
+
+        - The keywords are the *output* column names
+        - The values are tuples whose first element is the column to select
+          and the second element is the aggregation to apply to that column.
+          Pandas provides the ``pandas.NamedAgg`` namedtuple with the fields
+          ``['column', 'aggfunc']`` to make it clearer what the arguments are.
+          As usual, the aggregation can be a callable or a string alias.
+
+        See :ref:`groupby.aggregate.named` for more.
+
+        .. versionchanged:: 1.3.0
+
+            The resulting dtype will reflect the return value of the aggregating
+            function.
+
+        >>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
+             B
+        A
+        1  1.0
+        2  3.0
+        """
         relabeling, func, columns, order = reconstruct_func(func, **kwargs)
         func = maybe_mangle_lambdas(func)
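
The named-aggregation notes in the new docstring say the keyword values are `(column, aggfunc)` tuples and that `pandas.NamedAgg` is a namedtuple with those two fields, but the examples only show the `pd.NamedAgg` spelling. Here is a short sketch, assuming a recent pandas is installed, that compares the plain-tuple spelling with the `NamedAgg` one on the docstring's own data.

```python
import pandas as pd

# Same data as the docstring examples.
df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)

# Named aggregation with plain (column, aggfunc) tuples.
via_tuples = df.groupby("A").agg(b_min=("B", "min"), c_sum=("C", "sum"))

# The pd.NamedAgg spelling used in the docstring.
via_namedagg = df.groupby("A").agg(
    b_min=pd.NamedAgg(column="B", aggfunc="min"),
    c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
)

print(via_tuples)
print(via_tuples.equals(via_namedagg))
```

Because `pd.NamedAgg` is itself a tuple subclass, both calls should produce identical frames. The `NamedAgg` form mainly makes explicit which element is the column and which is the aggregation.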
