From b7ee04919a555af5c89ef4803f9e4dd2f112ccb8 Mon Sep 17 00:00:00 2001 From: Richard Shadrach <45562402+rhshadrach@users.noreply.github.com> Date: Wed, 8 Mar 2023 16:35:56 -0500 Subject: [PATCH] Backport PR #51704: DOC: Improve groupby in the User Guide --- doc/source/user_guide/groupby.rst | 536 ++++++++++++++++----------- doc/source/user_guide/timeseries.rst | 2 +- doc/source/whatsnew/v0.7.0.rst | 2 +- 3 files changed, 330 insertions(+), 210 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 15baedbac31ba..b0aafbc22562e 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -478,41 +478,71 @@ Or for an object grouped on multiple columns: Aggregation ----------- -Once the GroupBy object has been created, several methods are available to -perform a computation on the grouped data. These operations are similar to the -:ref:`aggregating API `, :ref:`window API `, -and :ref:`resample API `. - -An obvious one is aggregation via the -:meth:`~pandas.core.groupby.DataFrameGroupBy.aggregate` or equivalently -:meth:`~pandas.core.groupby.DataFrameGroupBy.agg` method: +An aggregation is a GroupBy operation that reduces the dimension of the grouping +object. The result of an aggregation is, or at least is treated as, +a scalar value for each column in a group. For example, producing the sum of each +column in a group of values. .. ipython:: python - grouped = df.groupby("A") - grouped[["C", "D"]].aggregate(np.sum) - - grouped = df.groupby(["A", "B"]) - grouped.aggregate(np.sum) + animals = pd.DataFrame( + { + "kind": ["cat", "dog", "cat", "dog"], + "height": [9.1, 6.0, 9.5, 34.0], + "weight": [7.9, 7.5, 9.9, 198.0], + } + ) + animals + animals.groupby("kind").sum() -As you can see, the result of the aggregation will have the group names as the -new index along the grouped axis. In the case of multiple keys, the result is a -:ref:`MultiIndex ` by default, though this can be -changed by using the ``as_index`` option: +In the result, the keys of the groups appear in the index by default. They can be +instead included in the columns by passing ``as_index=False``. .. ipython:: python - grouped = df.groupby(["A", "B"], as_index=False) - grouped.aggregate(np.sum) + animals.groupby("kind", as_index=False).sum() - df.groupby("A", as_index=False)[["C", "D"]].sum() +.. _groupby.aggregate.builtin: -Note that you could use the ``reset_index`` DataFrame function to achieve the -same result as the column names are stored in the resulting ``MultiIndex``: +Built-in aggregation methods +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +Many common aggregations are built-in to GroupBy objects as methods. Of the methods +listed below, those with a ``*`` do *not* have a Cython-optimized implementation. - df.groupby(["A", "B"]).sum().reset_index() +.. 
csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; + + :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy + :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy + :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups + :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups + :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group + :meth:`~.DataFrameGroupBy.idxmax` *;Compute the index of the maximum value in each group + :meth:`~.DataFrameGroupBy.idxmin` *;Compute the index of the minimum value in each group + :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group + :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group + :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group + :meth:`~.DataFrameGroupBy.median`;Compute the median of each group + :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group + :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group + :meth:`~.DataFrameGroupBy.prod`;Compute the product of the values in each group + :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group + :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group + :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group + :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group + :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group + :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group + :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group + +Some examples: + +.. ipython:: python + + df.groupby("A")[["C", "D"]].max() + df.groupby(["A", "B"]).mean() Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the ``size`` method. It returns a Series whose @@ -520,13 +550,20 @@ index are the group names and whose values are the sizes of each group. .. ipython:: python + grouped = df.groupby(["A", "B"]) grouped.size() +While the :meth:`~.DataFrameGroupBy.describe` method is not itself a reducer, it +can be used to conveniently produce a collection of summary statistics about each of +the groups. + .. ipython:: python grouped.describe() -Another aggregation example is to compute the number of unique values of each group. This is similar to the ``value_counts`` function, except that it only counts unique values. +Another aggregation example is to compute the number of unique values of each group. +This is similar to the ``value_counts`` function, except that it only counts the +number of unique values. .. ipython:: python @@ -538,40 +575,84 @@ Another aggregation example is to compute the number of unique values of each gr .. note:: Aggregation functions **will not** return the groups that you are aggregating over - if they are named *columns*, when ``as_index=True``, the default. The grouped columns will + as named *columns*, when ``as_index=True``, the default. The grouped columns will be the **indices** of the returned object. Passing ``as_index=False`` **will** return the groups that you are aggregating over, if they are - named *columns*. + named **indices** or *columns*. -Aggregating functions are the ones that reduce the dimension of the returned objects. 
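To make the preceding note concrete, the following is a minimal sketch of where the group keys end up under each setting, assuming the ``animals`` DataFrame defined at the start of this section:

.. code-block:: python

    # With the default as_index=True, the group keys become the index
    animals.groupby("kind").sum().index
    # Index(['cat', 'dog'], dtype='object', name='kind')

    # With as_index=False, the keys come back as an ordinary column instead
    animals.groupby("kind", as_index=False).sum().columns
    # Index(['kind', 'height', 'weight'], dtype='object')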
-Some common aggregating functions are tabulated below: -.. csv-table:: - :header: "Function", "Description" - :widths: 20, 80 - :delim: ; +.. _groupby.aggregate.agg: + +The :meth:`~.DataFrameGroupBy.aggregate` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. note:: + The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of + inputs. This section details using string aliases for various GroupBy methods; other + inputs are detailed in the sections below. + +Any reduction method that pandas implements can be passed as a string to +:meth:`~.DataFrameGroupBy.aggregate`. Users are encouraged to use the shorthand, +``agg``. It will operate as if the corresponding method was called. + +.. ipython:: python - :meth:`~pd.core.groupby.DataFrameGroupBy.mean`;Compute mean of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.sum`;Compute sum of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.size`;Compute group sizes - :meth:`~pd.core.groupby.DataFrameGroupBy.count`;Compute count of group - :meth:`~pd.core.groupby.DataFrameGroupBy.std`;Standard deviation of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.var`;Compute variance of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.sem`;Standard error of the mean of groups - :meth:`~pd.core.groupby.DataFrameGroupBy.describe`;Generates descriptive statistics - :meth:`~pd.core.groupby.DataFrameGroupBy.first`;Compute first of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.last`;Compute last of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.nth`;Take nth value, or a subset if n is a list - :meth:`~pd.core.groupby.DataFrameGroupBy.min`;Compute min of group values - :meth:`~pd.core.groupby.DataFrameGroupBy.max`;Compute max of group values - - -The aggregating functions above will exclude NA values. Any function which -reduces a :class:`Series` to a scalar value is an aggregation function and will work, -a trivial example is ``df.groupby('A').agg(lambda ser: 1)``. Note that -:meth:`~pd.core.groupby.DataFrameGroupBy.nth` can act as a reducer *or* a -filter, see :ref:`here `. + grouped = df.groupby("A") + grouped[["C", "D"]].aggregate("sum") + + grouped = df.groupby(["A", "B"]) + grouped.agg("sum") + +The result of the aggregation will have the group names as the +new index along the grouped axis. In the case of multiple keys, the result is a +:ref:`MultiIndex ` by default. As mentioned above, this can be +changed by using the ``as_index`` option: + +.. ipython:: python + + grouped = df.groupby(["A", "B"], as_index=False) + grouped.agg("sum") + + df.groupby("A", as_index=False)[["C", "D"]].agg("sum") + +Note that you could use the :meth:`DataFrame.reset_index` DataFrame function to achieve +the same result as the column names are stored in the resulting ``MultiIndex``, although +this will make an extra copy. + +.. ipython:: python + + df.groupby(["A", "B"]).agg("sum").reset_index() + +.. _groupby.aggregate.udf: + +Aggregation with User-Defined Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Users can also provide their own User-Defined Functions (UDFs) for custom aggregations. + +.. warning:: + + When aggregating with a UDF, the UDF should not mutate the + provided ``Series``. See :ref:`gotchas.udf-mutation` for more information. + +.. note:: + + Aggregating with a UDF is often less performant than using + the pandas built-in methods on GroupBy. Consider breaking up a complex operation + into a chain of operations that utilize the built-in methods. + +.. 
ipython:: python
+
+    animals
+    animals.groupby("kind")[["height"]].agg(lambda x: set(x))
+
+The resulting dtype will reflect that of the aggregating function. If the results from different groups have
+different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction.
+
+.. ipython:: python
+
+    animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum())

 .. _groupby.aggregate.multifunc:

@@ -584,24 +665,24 @@ aggregation with, outputting a DataFrame:

 .. ipython:: python

     grouped = df.groupby("A")
-    grouped["C"].agg([np.sum, np.mean, np.std])
+    grouped["C"].agg(["sum", "mean", "std"])

 On a grouped ``DataFrame``, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:

 .. ipython:: python

-    grouped[["C", "D"]].agg([np.sum, np.mean, np.std])
+    grouped[["C", "D"]].agg(["sum", "mean", "std"])

-The resulting aggregations are named for the functions themselves. If you
+The resulting aggregations are named after the functions themselves. If you
 need to rename, then you can add in a chained operation for a ``Series`` like
 this:

 .. ipython:: python

     (
         grouped["C"]
-        .agg([np.sum, np.mean, np.std])
+        .agg(["sum", "mean", "std"])
         .rename(columns={"sum": "foo", "mean": "bar", "std": "baz"})
     )

@@ -610,24 +691,23 @@ For a grouped ``DataFrame``, you can rename in a similar manner:

 .. ipython:: python

     (
-        grouped[["C", "D"]].agg([np.sum, np.mean, np.std]).rename(
+        grouped[["C", "D"]].agg(["sum", "mean", "std"]).rename(
             columns={"sum": "foo", "mean": "bar", "std": "baz"}
         )
     )

 .. note::

-    In general, the output column names should be unique. You can't apply
-    the same function (or two functions with the same name) to the same
+    In general, the output column names should be unique, but pandas will allow
+    you to apply the same function (or two functions with the same name) to the same
     column.

     .. ipython:: python
-        :okexcept:

         grouped["C"].agg(["sum", "sum"])

-    pandas *does* allow you to provide multiple lambdas. In this case, pandas
+    pandas also allows you to provide multiple lambdas. In this case, pandas
     will mangle the name of the (nameless) lambda functions, appending ``_``
     to each subsequent lambda.

@@ -636,72 +716,58 @@ For a grouped ``DataFrame``, you can rename in a similar manner:

     grouped["C"].agg([lambda x: x.max() - x.min(), lambda x: x.median() - x.mean()])

-
 .. _groupby.aggregate.named:

 Named aggregation
 ~~~~~~~~~~~~~~~~~

 To support column-specific aggregation *with control over the output column names*, pandas
-accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg`, known as "named aggregation", where
+accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg`, known as "named aggregation", where

 - The keywords are the *output* column names
 - The values are tuples whose first element is the column to select
   and the second element is the aggregation to apply to that column. pandas
-  provides the ``pandas.NamedAgg`` namedtuple with the fields ``['column', 'aggfunc']``
+  provides the :class:`NamedAgg` namedtuple with the fields ``['column', 'aggfunc']``
   to make it clearer what the arguments are. As usual, the aggregation can
   be a callable or a string alias.

.. 
ipython:: python - animals = pd.DataFrame( - { - "kind": ["cat", "dog", "cat", "dog"], - "height": [9.1, 6.0, 9.5, 34.0], - "weight": [7.9, 7.5, 9.9, 198.0], - } - ) animals animals.groupby("kind").agg( min_height=pd.NamedAgg(column="height", aggfunc="min"), max_height=pd.NamedAgg(column="height", aggfunc="max"), - average_weight=pd.NamedAgg(column="weight", aggfunc=np.mean), + average_weight=pd.NamedAgg(column="weight", aggfunc="mean"), ) -``pandas.NamedAgg`` is just a ``namedtuple``. Plain tuples are allowed as well. +:class:`NamedAgg` is just a ``namedtuple``. Plain tuples are allowed as well. .. ipython:: python animals.groupby("kind").agg( min_height=("height", "min"), max_height=("height", "max"), - average_weight=("weight", np.mean), + average_weight=("weight", "mean"), ) -If your desired output column names are not valid Python keywords, construct a dictionary +If the column names you want are not valid Python keywords, construct a dictionary and unpack the keyword arguments .. ipython:: python animals.groupby("kind").agg( **{ - "total weight": pd.NamedAgg(column="weight", aggfunc=sum) + "total weight": pd.NamedAgg(column="weight", aggfunc="sum") } ) -Additional keyword arguments are not passed through to the aggregation functions. Only pairs +When using named aggregation, additional keyword arguments are not passed through +to the aggregation functions; only pairs of ``(column, aggfunc)`` should be passed as ``**kwargs``. If your aggregation functions -requires additional arguments, partially apply them with :meth:`functools.partial`. - -.. note:: - - For Python 3.5 and earlier, the order of ``**kwargs`` in a functions was not - preserved. This means that the output column ordering would not be - consistent. To ensure consistent ordering, the keys (and so output columns) - will always be sorted for Python 3.5. +require additional arguments, apply them partially with :meth:`functools.partial`. Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions. @@ -721,59 +787,98 @@ columns of a DataFrame: .. ipython:: python - grouped.agg({"C": np.sum, "D": lambda x: np.std(x, ddof=1)}) + grouped.agg({"C": "sum", "D": lambda x: np.std(x, ddof=1)}) The function names can also be strings. In order for a string to be valid it -must be either implemented on GroupBy or available via :ref:`dispatching -`: +must be implemented on GroupBy: .. ipython:: python grouped.agg({"C": "sum", "D": "std"}) -.. _groupby.aggregate.cython: +.. _groupby.transform: -Cython-optimized aggregation functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Transformation +-------------- -Some common aggregations, currently only ``sum``, ``mean``, ``std``, and ``sem``, have -optimized Cython implementations: +A transformation is a GroupBy operation whose result is indexed the same +as the one being grouped. Common examples include :meth:`~.DataFrameGroupBy.cumsum` and +:meth:`~.DataFrameGroupBy.diff`. .. ipython:: python - df.groupby("A")[["C", "D"]].sum() - df.groupby(["A", "B"]).mean() + speeds + grouped = speeds.groupby("class")["max_speed"] + grouped.cumsum() + grouped.diff() -Of course ``sum`` and ``mean`` are implemented on pandas objects, so the above -code would work even without the special versions via dispatching (see below). +Unlike aggregations, the groupings that are used to split +the original object are not included in the result. -.. _groupby.aggregate.udfs: +.. 
note:: -Aggregations with User-Defined Functions -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Since transformations do not include the groupings that are used to split the result, + the arguments ``as_index`` and ``sort`` in :meth:`DataFrame.groupby` and + :meth:`Series.groupby` have no effect. -Users can also provide their own functions for custom aggregations. When aggregating -with a User-Defined Function (UDF), the UDF should not mutate the provided ``Series``, see -:ref:`gotchas.udf-mutation` for more information. +A common use of a transformation is to add the result back into the original DataFrame. .. ipython:: python - animals.groupby("kind")[["height"]].agg(lambda x: set(x)) + result = speeds.copy() + result["cumsum"] = grouped.cumsum() + result["diff"] = grouped.diff() + result -The resulting dtype will reflect that of the aggregating function. If the results from different groups have -different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. +Built-in transformation methods +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +The following methods on GroupBy act as transformations. Of these methods, only +``fillna`` does not have a Cython-optimized implementation. - animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum()) +.. csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; -.. _groupby.transform: + :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group + :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group + :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group + :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group + :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group + :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group + :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group + :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group + :meth:`~.DataFrameGroupBy.fillna`;Fill NA values within each group + :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group + :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group + :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group -Transformation --------------- +In addition, passing any built-in aggregation method as a string to +:meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result +across the group, producing a transformed result. If the aggregation method is +Cython-optimized, this will be performant as well. + +.. _groupby.transformation.transform: -The ``transform`` method returns an object that is indexed the same -as the one being grouped. The transform function must: +The :meth:`~.DataFrameGroupBy.transform` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Similar to the :ref:`aggregation method `, the +:meth:`~.DataFrameGroupBy.transform` method can accept string aliases to the built-in +transformation methods in the previous section. It can *also* accept string aliases to +the built-in aggregation methods. When an aggregation method is provided, the result +will be broadcast across the group. + +.. 
ipython:: python
+
+    speeds
+    grouped = speeds.groupby("class")[["max_speed"]]
+    grouped.transform("cumsum")
+    grouped.transform("sum")
+
+In addition to string aliases, the :meth:`~.DataFrameGroupBy.transform` method can
+also accept User-Defined Functions (UDFs). The UDF must:

 * Return a result that is either the same size as the group chunk or
   broadcastable to the size of the group chunk (e.g., a scalar,
   grouped.transform(lambda x: x.iloc[-1])).
 * Operate column-by-column on the group chunk. The transform is applied to
   the first group chunk using chunk.apply.
 * Not perform in-place operations on the group chunk. Group chunks should
   be treated as immutable, and changes to a group chunk may produce unexpected
-  results.
-* (Optionally) operates on the entire group chunk. If this is supported, a
-  fast path is used starting from the *second* chunk.
+  results. See :ref:`gotchas.udf-mutation` for more information.
+* (Optionally) operates on all columns of the entire group chunk at once. If this is
+  supported, a fast path is used starting from the *second* chunk.
+
+.. note::
+
+    Transforming by supplying ``transform`` with a UDF is
+    often less performant than using the built-in methods on GroupBy.
+    Consider breaking up a complex operation into a chain of operations that utilize
+    the built-in methods.
+
+    All of the examples in this section can be made more performant by calling
+    built-in methods instead of using ``transform``.
+    See :ref:`below for examples <groupby_efficient_transforms>`.

 .. versionchanged:: 2.0.0

     When using ``.transform`` on a grouped DataFrame and the transformation function
     returns a DataFrame, pandas now aligns the result's index
-    with the input's index. You can call ``.to_numpy()`` on the
-    result of the transformation function to avoid alignment.
+    with the input's index. You can call ``.to_numpy()`` within the transformation
+    function to avoid alignment.

-Similar to :ref:`groupby.aggregate.udfs`, the resulting dtype will reflect that of the
+Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the
 transformation function. If the results from different groups have different dtypes, then
 a common dtype will be determined in the same way as ``DataFrame`` construction.

-Suppose we wished to standardize the data within each group:
+Suppose we wish to standardize the data within each group:

 .. ipython:: python

@@ -844,15 +960,6 @@ match the shape of the input array.

     ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())

-Alternatively, the built-in methods could be used to produce the same outputs.
-
-.. ipython:: python
-
-    max_ts = ts.groupby(lambda x: x.year).transform("max")
-    min_ts = ts.groupby(lambda x: x.year).transform("min")
-
-    max_ts - min_ts
-
 Another common data transform is to replace missing data with the group mean.

 .. ipython:: python

@@ -879,7 +986,7 @@ Another common data transform is to replace missing data with the group mean.

     transformed = grouped.transform(lambda x: x.fillna(x.mean()))

-We can verify that the group means have not changed in the transformed data
+We can verify that the group means have not changed in the transformed data,
 and that the transformed data contains no NAs.

 .. ipython:: python

@@ -893,18 +1000,28 @@ and that the transformed data contains no NAs.

     grouped_trans.count()  # counts after transformation
     grouped_trans.size()  # Verify non-NA count equals group size

-.. note::
+.. _groupby_efficient_transforms:

-    Some functions will automatically transform the input when applied to a
-    GroupBy object, but returning an object of the same shape as the original.
-    Passing ``as_index=False`` will not affect these transformation methods.
+As mentioned in the note above, each of the examples in this section can be computed
+more efficiently using built-in methods. In the code below, the inefficient way
+using a UDF is commented out and the faster alternative appears below.

-    For example: ``fillna, ffill, bfill, shift.``.
+.. ipython:: python

-    .. ipython:: python
+    # ts.groupby(lambda x: x.year).transform(
+    #     lambda x: (x - x.mean()) / x.std()
+    # )
+    grouped = ts.groupby(lambda x: x.year)
+    result = (ts - grouped.transform("mean")) / grouped.transform("std")

-        grouped.ffill()
+    # ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())
+    grouped = ts.groupby(lambda x: x.year)
+    result = grouped.transform("max") - grouped.transform("min")

+    # grouped = data_df.groupby(key)
+    # grouped.transform(lambda x: x.fillna(x.mean()))
+    grouped = data_df.groupby(key)
+    result = data_df.fillna(grouped.transform("mean"))

 .. _groupby.transform.window_resample:

@@ -915,7 +1032,7 @@ It is possible to use ``resample()``, ``expanding()`` and ``rolling()`` as
 methods on groupbys.

 The example below will apply the ``rolling()`` method on the samples of
-the column B based on the groups of column A.
+the column B, based on the groups of column A.

 .. ipython:: python

@@ -935,7 +1052,7 @@ group.

 Suppose you want to use the ``resample()`` method to get a daily
-frequency in each group of your dataframe and wish to complete the
+frequency in each group of your dataframe, and wish to complete the
 missing values with the ``ffill()`` method.

 .. ipython:: python

@@ -956,109 +1073,111 @@ missing values with the ``ffill()`` method.

 Filtration
 ----------

-The ``filter`` method returns a subset of the original object. Suppose we
-want to take only elements that belong to groups with a group sum greater
-than 2.
+A filtration is a GroupBy operation that subsets the original grouping object. It
+may either filter out entire groups, part of groups, or both. Filtrations return
+a filtered version of the calling object, including the grouping columns when provided.
+In the following example, ``class`` is included in the result.

 .. ipython:: python

-    sf = pd.Series([1, 1, 2, 3, 3, 3])
-    sf.groupby(sf).filter(lambda x: x.sum() > 2)
-
-The argument of ``filter`` must be a function that, applied to the group as a
-whole, returns ``True`` or ``False``.
+    speeds
+    speeds.groupby("class").nth(1)

-Another useful operation is filtering out elements that belong to groups
-with only a couple members.
+.. note::

-.. ipython:: python
+    Unlike aggregations, filtrations do not add the group keys to the index of the
+    result. Because of this, passing ``as_index=False`` or ``sort=True`` will not
+    affect these methods.

-    dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")})
-    dff.groupby("B").filter(lambda x: len(x) > 2)
-
-Alternatively, instead of dropping the offending groups, we can return a
-like-indexed objects where the groups that do not pass the filter are filled
-with NaNs.
+Filtrations will respect subsetting the columns of the GroupBy object.

 .. ipython:: python

-    dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False)
+    speeds.groupby("class")[["order", "max_speed"]].nth(1)

-For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion.
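In addition to selecting a single position, ``nth`` accepts a list of positions, which gives a compact way to keep, for example, the first and last row of every group. A minimal sketch, assuming the ``speeds`` DataFrame used in the examples above:

.. code-block:: python

    # Keep the first and last row of each class; as with other filtrations,
    # the rows keep their original index in the result.
    speeds.groupby("class").nth([0, -1])

    # The same idea combined with column subsetting
    speeds.groupby("class")[["order", "max_speed"]].nth([0, -1])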
+Built-in filtrations +~~~~~~~~~~~~~~~~~~~~ -.. ipython:: python +The following methods on GroupBy act as filtrations. All these methods have a +Cython-optimized implementation. - dff["C"] = np.arange(8) - dff.groupby("B").filter(lambda x: len(x["C"]) > 2) +.. csv-table:: + :header: "Method", "Description" + :widths: 20, 80 + :delim: ; -.. note:: + :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group + :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group + :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group - Some functions when applied to a groupby object will act as a **filter** on the input, returning - a reduced shape of the original (and potentially eliminating groups), but with the index unchanged. - Passing ``as_index=False`` will not affect these transformation methods. +Users can also use transformations along with Boolean indexing to construct complex +filtrations within groups. For example, suppose we are given groups of products and +their volumes, and we wish to subset the data to only the largest products capturing no +more than 90% of the total volume within each group. - For example: ``head, tail``. +.. ipython:: python - .. ipython:: python + product_volumes = pd.DataFrame( + { + "group": list("xxxxyyy"), + "product": list("abcdefg"), + "volume": [10, 30, 20, 15, 40, 10, 20], + } + ) + product_volumes - dff.groupby("B").head(2) + # Sort by volume to select the largest products first + product_volumes = product_volumes.sort_values("volume", ascending=False) + grouped = product_volumes.groupby("group")["volume"] + cumpct = grouped.cumsum() / grouped.transform("sum") + cumpct + significant_products = product_volumes[cumpct <= 0.9] + significant_products.sort_values(["group", "product"]) +The :class:`~DataFrameGroupBy.filter` method +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. _groupby.dispatch: +.. note:: -Dispatching to instance methods -------------------------------- + Filtering by supplying ``filter`` with a User-Defined Function (UDF) is + often less performant than using the built-in methods on GroupBy. + Consider breaking up a complex operation into a chain of operations that utilize + the built-in methods. -When doing an aggregation or transformation, you might just want to call an -instance method on each data group. This is pretty easy to do by passing lambda -functions: +The ``filter`` method takes a User-Defined Function (UDF) that, when applied to +an entire group, returns either ``True`` or ``False``. The result of the ``filter`` +method is then the subset of groups for which the UDF returned ``True``. + +Suppose we want to take only elements that belong to groups with a group sum greater +than 2. .. ipython:: python - :okwarning: - grouped = df.groupby("A")[["C", "D"]] - grouped.agg(lambda x: x.std()) + sf = pd.Series([1, 1, 2, 3, 3, 3]) + sf.groupby(sf).filter(lambda x: x.sum() > 2) -But, it's rather verbose and can be untidy if you need to pass additional -arguments. Using a bit of metaprogramming cleverness, GroupBy now has the -ability to "dispatch" method calls to the groups: +Another useful operation is filtering out elements that belong to groups +with only a couple members. .. ipython:: python - :okwarning: - grouped.std() + dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")}) + dff.groupby("B").filter(lambda x: len(x) > 2) -What is actually happening here is that a function wrapper is being -generated. 
When invoked, it takes any passed arguments and invokes the function -with any arguments on each group (in the above example, the ``std`` -function). The results are then combined together much in the style of ``agg`` -and ``transform`` (it actually uses ``apply`` to infer the gluing, documented -next). This enables some operations to be carried out rather succinctly: +Alternatively, instead of dropping the offending groups, we can return a +like-indexed objects where the groups that do not pass the filter are filled +with NaNs. .. ipython:: python - tsdf = pd.DataFrame( - np.random.randn(1000, 3), - index=pd.date_range("1/1/2000", periods=1000), - columns=["A", "B", "C"], - ) - tsdf.iloc[::2] = np.nan - grouped = tsdf.groupby(lambda x: x.year) - grouped.fillna(method="pad") - -In this example, we chopped the collection of time series into yearly chunks -then independently called :ref:`fillna ` on the -groups. + dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False) -The ``nlargest`` and ``nsmallest`` methods work on ``Series`` style groupbys: +For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion. .. ipython:: python - s = pd.Series([9, 8, 7, 5, 19, 1, 4.2, 3.3]) - g = pd.Series(list("abababab")) - gb = s.groupby(g) - gb.nlargest(3) - gb.nsmallest(3) + dff["C"] = np.arange(8) + dff.groupby("B").filter(lambda x: len(x["C"]) > 2) .. _groupby.apply: @@ -1114,7 +1233,7 @@ that is itself a series, and possibly upcast the result to a DataFrame: s s.apply(f) -Similar to :ref:`groupby.aggregate.udfs`, the resulting dtype will reflect that of the +Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the apply function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as ``DataFrame`` construction. @@ -1144,6 +1263,7 @@ with df.groupby("A", group_keys=False).apply(lambda x: x) + Numba Accelerated Routines -------------------------- diff --git a/doc/source/user_guide/timeseries.rst b/doc/source/user_guide/timeseries.rst index a675e30823c89..4cd98c89e7180 100644 --- a/doc/source/user_guide/timeseries.rst +++ b/doc/source/user_guide/timeseries.rst @@ -1618,7 +1618,7 @@ The ``resample`` function is very flexible and allows you to specify many different parameters to control the frequency conversion and resampling operation. -Any function available via :ref:`dispatching ` is available as +Any built-in method available via :ref:`GroupBy ` is available as a method of the returned object, including ``sum``, ``mean``, ``std``, ``sem``, ``max``, ``min``, ``median``, ``first``, ``last``, ``ohlc``: diff --git a/doc/source/whatsnew/v0.7.0.rst b/doc/source/whatsnew/v0.7.0.rst index 1ee6a9899a655..2336ccaeac820 100644 --- a/doc/source/whatsnew/v0.7.0.rst +++ b/doc/source/whatsnew/v0.7.0.rst @@ -346,7 +346,7 @@ Other API changes Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ -- :ref:`Cythonized GroupBy aggregations ` no longer +- :ref:`Cythonized GroupBy aggregations ` no longer presort the data, thus achieving a significant speedup (:issue:`93`). GroupBy aggregations with Python functions significantly sped up by clever manipulation of the ndarray data type in Cython (:issue:`496`).
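To illustrate the gap between Cython-optimized aggregations and plain Python functions mentioned here (and throughout the user guide above), the following is a small self-contained sketch; the ``df`` used below is hypothetical rather than one of the documentation's examples:

.. code-block:: python

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            "key": np.random.choice(list("abc"), 1000),
            "value": np.random.randn(1000),
        }
    )

    # Cython-optimized built-in aggregation
    fast = df.groupby("key")["value"].sum()

    # The same reduction written as a Python UDF; the result matches
    # (up to floating-point rounding) but is typically much slower.
    slow = df.groupby("key")["value"].agg(lambda x: x.sum())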