From b14886dcd7111f7409be748275f8551eba52e2a4 Mon Sep 17 00:00:00 2001
From: Immanuella Umoren
Date: Tue, 1 Aug 2023 18:06:38 -0700
Subject: [PATCH 1/7] Added documentation for Named aggregation in groupby.agg (Issue #18220)

---
 doc/source/user_guide/groupby.html | 2945 ++++++++++++++++++++++++++++
 doc/source/user_guide/groupby.rst  |   41 +-
 pandas/etc/profile.d/micromamba.sh |   66 +
 3 files changed, 3046 insertions(+), 6 deletions(-)
 create mode 100644 doc/source/user_guide/groupby.html
 create mode 100644 pandas/etc/profile.d/micromamba.sh

diff --git a/doc/source/user_guide/groupby.html b/doc/source/user_guide/groupby.html
new file mode 100644
index 0000000000000..4630138cad90d
--- /dev/null
+++ b/doc/source/user_guide/groupby.html
@@ -0,0 +1,2945 @@

{{ header }}


Group by: split-apply-combine

By "group by" we are referring to a process involving one or more of the following steps:

  • Splitting the data into groups based on some criteria.
  • Applying a function to each group independently.
  • Combining the results into a data structure.

Out of these, the split step is the most straightforward. In fact, in many situations we may wish to split the data set into groups and do something with those groups. In the apply step, we might wish to do one of the following:

  • Aggregation: compute a summary statistic (or statistics) for each group. Some examples:

      • Compute group sums or means.
      • Compute group sizes / counts.

  • Transformation: perform some group-specific computations and return a like-indexed object. Some examples:

      • Standardize data (zscore) within a group.
      • Fill NAs within groups with a value derived from each group.

  • Filtration: discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:

      • Discard data that belong to groups with only a few members.
      • Filter out data based on the group sum or mean.

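As a minimal sketch of the three categories above, the following uses a small, hypothetical DataFrame (the column names key and val are illustrative only) to show one aggregation, one transformation and one filtration:

```python
import pandas as pd

# Hypothetical example data: a key column to split on and a value column.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("key")

# Aggregation: one summary value per group (the index holds the group keys).
sums = grouped["val"].sum()

# Transformation: a like-indexed result, here each value minus its group mean.
demeaned = grouped["val"].transform(lambda s: s - s.mean())

# Filtration: keep only the rows that belong to groups with more than two members.
large = grouped.filter(lambda g: len(g) > 2)
```

The aggregation returns one row per group, the transformation keeps the original index, and the filtration returns a subset of the original rows.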
Many of these operations are defined on GroupBy objects. These operations are similar to those of the :ref:`aggregating API <basics.aggregate>`, :ref:`window API <window.overview>`, and :ref:`resample API <timeseries.aggregate>`.


It is possible that a given operation does not fall into one of these categories or is some combination of them. In such a case, it may be possible to compute the operation using GroupBy's apply method. This method will examine the results of the apply step and try to sensibly combine them into a single result if it doesn't fit into any of the above three categories.

Note

An operation that is split into multiple steps using built-in GroupBy operations will be more efficient than using the apply method with a user-defined Python function.

Since the set of object instance methods on pandas data structures is generally rich and expressive, we often simply want to invoke, say, a DataFrame function on each group. The name GroupBy should be quite familiar to those who have used a SQL-based tool (or itertools), in which you can write code like:

SELECT Column1, Column2, mean(Column3), sum(Column4)
FROM SomeTable
GROUP BY Column1, Column2

We aim to make operations like this natural and easy to express using pandas. We'll address each area of GroupBy functionality, then provide some non-trivial examples / use cases.

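For comparison, one way to express the SQL query above in pandas is sketched below. The some_table frame and its contents are hypothetical stand-ins for SomeTable, and a dict passed to agg is only one of several ways to apply a different reduction to each column:

```python
import pandas as pd

# Hypothetical stand-in for SomeTable from the SQL query above.
some_table = pd.DataFrame(
    {
        "Column1": ["x", "x", "y", "y"],
        "Column2": ["p", "p", "p", "q"],
        "Column3": [1.0, 3.0, 5.0, 7.0],
        "Column4": [10, 20, 30, 40],
    }
)

# GROUP BY Column1, Column2, then mean(Column3) and sum(Column4) per group.
result = some_table.groupby(["Column1", "Column2"]).agg(
    {"Column3": "mean", "Column4": "sum"}
)
```

The two grouping columns become a MultiIndex on the result, and each remaining column is reduced with its own function.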
See the :ref:`cookbook<cookbook.grouping>` for some advanced strategies.
Splitting an object into groups

The abstract definition of grouping is to provide a mapping of labels to group names. To create a GroupBy object (more on what the GroupBy object is later), you may do the following:

.. ipython:: python

    speeds = pd.DataFrame(
        [
            ("bird", "Falconiformes", 389.0),
            ("bird", "Psittaciformes", 24.0),
            ("mammal", "Carnivora", 80.2),
            ("mammal", "Primates", np.nan),
            ("mammal", "Carnivora", 58),
        ],
        index=["falcon", "parrot", "lion", "monkey", "leopard"],
        columns=("class", "order", "max_speed"),
    )
    speeds

    grouped = speeds.groupby("class")
    grouped = speeds.groupby(["class", "order"])

The mapping can be specified in many different ways:

  • A Python function, to be called on each of the index labels.
  • A list or NumPy array of the same length as the index.
  • A dict or Series, providing a label -> group name mapping.
  • For DataFrame objects, a string indicating either a column name or an index level name to be used to group.
  • A list of any of the above things.

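A short sketch of three of these mapping types, applied to a hypothetical Series whose index mixes upper- and lowercase labels. The group labels x, y, low and high are arbitrary and were chosen only for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "B", "c", "D"])

# A function called on each index label: group by whether the label is uppercase.
by_func = s.groupby(str.isupper).sum()

# A list (or NumPy array) with the same length as the index.
by_list = s.groupby(["x", "y", "x", "y"]).sum()

# A dict providing a label -> group name mapping.
by_dict = s.groupby({"a": "low", "B": "high", "c": "low", "D": "high"}).sum()
```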
Collectively we refer to the grouping objects as the keys. For example, consider the following DataFrame:

Note

A string passed to groupby may refer to either a column or an index level. If a string matches both a column name and an index level name, a ValueError will be raised.

.. ipython:: python

   df = pd.DataFrame(
       {
           "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
           "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
           "C": np.random.randn(8),
           "D": np.random.randn(8),
       }
   )
   df

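To illustrate the note on ambiguous strings, here is a sketch using a hypothetical frame in which key names both a column and the index level. Naming the level with level=, or passing the column's values as a Series, avoids the ambiguity:

```python
import pandas as pd

# Hypothetical frame: "key" is both a column label and the name of the index level.
frame = pd.DataFrame(
    {"key": [1, 1, 2], "val": [10, 20, 30]},
    index=pd.Index(["a", "b", "a"], name="key"),
)

# A bare string is ambiguous here, so pandas raises ValueError.
try:
    frame.groupby("key")
except ValueError as err:
    error_message = str(err)
else:
    error_message = None

# Unambiguous alternatives: name the index level explicitly ...
by_level = frame.groupby(level="key")["val"].sum()
# ... or pass the column's values as a Series.
by_column = frame.groupby(frame["key"])["val"].sum()
```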
On a DataFrame, we obtain a GroupBy object by calling :meth:`~DataFrame.groupby`. This method returns a pandas.api.typing.DataFrameGroupBy instance. We could naturally group by either the A or B columns, or both:

.. ipython:: python

   grouped = df.groupby("A")
   grouped = df.groupby(["A", "B"])

Note


df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

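As a quick check of this equivalence, both spellings give the same aggregated result on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1.0, 2.0, 3.0, 4.0]})

# Grouping by the column name ...
by_name = df.groupby("A")["C"].sum()
# ... matches grouping by the column's values passed as a Series.
by_series = df.groupby(df["A"])["C"].sum()
```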
If A and B are instead the levels of a MultiIndex on the rows, we can group by all of the index levels except the one we specify:

.. ipython:: python

   df2 = df.set_index(["A", "B"])
   grouped = df2.groupby(level=df2.index.names.difference(["B"]))
   grouped.sum()

The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do a transpose:

.. ipython::

    In [4]: def get_letter_type(letter):
       ...:     if letter.lower() in 'aeiou':
       ...:         return 'vowel'
       ...:     else:
       ...:         return 'consonant'
       ...:

    In [5]: grouped = df.T.groupby(get_letter_type)

pandas :class:`~pandas.Index` objects support duplicate values. If a non-unique index is used as the group key in a groupby operation, all values for the same index value will be considered to be in one group and thus the output of aggregation functions will only contain unique index values:

.. ipython:: python

   lst = [1, 2, 3, 1, 2, 3]

   s = pd.Series([1, 2, 3, 10, 20, 30], lst)

   grouped = s.groupby(level=0)

   grouped.first()

   grouped.last()

   grouped.sum()

Note that no splitting occurs until it's needed. Creating the GroupBy object only verifies that you've passed a valid mapping.

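A minimal sketch of this behavior on a small hypothetical frame. A key that does not name a column or index level is rejected as soon as groupby is called, while the aggregation itself only runs when a method such as sum() is invoked:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "C": [1, 2, 3]})

# Creating the GroupBy object validates the key; no result is computed yet.
grouped = df.groupby("A")

# An invalid key is rejected immediately, at creation time.
try:
    df.groupby("not_a_column")
except KeyError:
    rejected = True
else:
    rejected = False

# The aggregation runs only when it is requested.
totals = grouped["C"].sum()
```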
Note

Many kinds of complicated data manipulations can be expressed in terms of GroupBy operations (though it can't be guaranteed to be the most efficient implementation). You can get quite creative with the label mapping functions.

GroupBy sorting

By default, the group keys are sorted during the groupby operation. You may, however, pass sort=False for potential speedups. With sort=False, the order among group keys follows the order of appearance of the keys in the original DataFrame:

.. ipython:: python

   df2 = pd.DataFrame({"X": ["B", "B", "A", "A"], "Y": [1, 2, 3, 4]})
   df2.groupby(["X"]).sum()
   df2.groupby(["X"], sort=False).sum()

Note that groupby will preserve the order in which observations are sorted within each group. For example, the groups created by groupby() below are in the order they appeared in the original DataFrame:

.. ipython:: python

   df3 = pd.DataFrame({"X": ["A", "B", "A", "B"], "Y": [1, 4, 3, 2]})
   df3.groupby("X").get_group("A")

   df3.groupby("X").get_group("B")

GroupBy dropna

By default, NA values are excluded from group keys during the groupby operation. To include NA values in the group keys, pass dropna=False.

.. ipython:: python

    df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
    df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])

    df_dropna

.. ipython:: python

    # Default ``dropna`` is set to True, which will exclude NaNs in keys
    df_dropna.groupby(by=["b"], dropna=True).sum()

    # In order to allow NaN in keys, set ``dropna`` to False
    df_dropna.groupby(by=["b"], dropna=False).sum()

The default value of the dropna argument is True, which means NA values are not included in group keys.


GroupBy object attributes

The groups attribute is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group. In the above example we have:

.. ipython:: python

   df.groupby("A").groups
   df.T.groupby(get_letter_type).groups

Calling the standard Python len function on the GroupBy object just returns the length of the groups dict, so it is largely just a convenience:

.. ipython:: python

   grouped = df.groupby(["A", "B"])
   grouped.groups
   len(grouped)

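The same count is also exposed as the ngroups attribute. A small sketch on a hypothetical frame with three distinct (A, B) combinations:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "one"],
        "C": [1, 2, 3, 4],
    }
)
grouped = df.groupby(["A", "B"])

# len() counts the entries of the groups dict; ngroups reports the same number.
n_groups = len(grouped)
```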
GroupBy will tab complete column names (and other attributes):

.. ipython:: python

   n = 10
   weight = np.random.normal(166, 20, size=n)
   height = np.random.normal(60, 10, size=n)
   time = pd.date_range("1/1/2000", periods=n)
   gender = np.random.choice(["male", "female"], size=n)
   df = pd.DataFrame(
       {"height": height, "weight": weight, "gender": gender}, index=time
   )
   df
   gb = df.groupby("gender")

.. ipython::

   @verbatim
   In [1]: gb.<TAB>  # noqa: E225, E999
   gb.agg        gb.boxplot    gb.cummin     gb.describe   gb.filter     gb.get_group  gb.height     gb.last       gb.median     gb.ngroups    gb.plot       gb.rank       gb.std        gb.transform
   gb.aggregate  gb.count      gb.cumprod    gb.dtype      gb.first      gb.groups     gb.hist       gb.max        gb.min        gb.nth        gb.prod       gb.resample   gb.sum        gb.var
   gb.apply      gb.cummax     gb.cumsum     gb.fillna     gb.gender     gb.head       gb.indices    gb.mean       gb.name       gb.ohlc       gb.quantile   gb.size       gb.tail       gb.weight

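Because column names show up as attributes, gb.height is another way of writing gb["height"]. This shortcut only works when the column name is a valid Python identifier that does not clash with a GroupBy method name, so [] remains the safer general form. A small sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "height": [60.0, 65.0, 70.0],
        "weight": [150.0, 160.0, 170.0],
        "gender": ["f", "m", "m"],
    }
)
gb = df.groupby("gender")

# Attribute access to a column (what tab completion offers) ...
by_attr = gb.height.mean()
# ... is equivalent to selecting the column with [].
by_item = gb["height"].mean()
```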
GroupBy with MultiIndex

With :ref:`hierarchically-indexed data <advanced.hierarchical>`, it's quite natural to group by one of the levels of the hierarchy.


Let's create a Series with a two-level MultiIndex.

.. ipython:: python

   arrays = [
       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
       ["one", "two", "one", "two", "one", "two", "one", "two"],
   ]
   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
   s = pd.Series(np.random.randn(8), index=index)
   s

We can then group by one of the levels in s.

.. ipython:: python

   grouped = s.groupby(level=0)
   grouped.sum()

If the MultiIndex has names specified, these can be passed instead of the level number:

.. ipython:: python

   s.groupby(level="second").sum()

Grouping with multiple levels is supported.

.. ipython:: python

   arrays = [
       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
       ["doo", "doo", "bee", "bee", "bop", "bop", "bop", "bop"],
       ["one", "two", "one", "two", "one", "two", "one", "two"],
   ]
   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second", "third"])
   s = pd.Series(np.random.randn(8), index=index)
   s
   s.groupby(level=["first", "second"]).sum()

Index level names may be supplied as keys.

.. ipython:: python

   s.groupby(["first", "second"]).sum()

More on the sum function and aggregation later.

Grouping DataFrame with Index levels and columns

A DataFrame may be grouped by a combination of columns and index levels. You can specify both column and index names, or use a :class:`Grouper`.


Let's first create a DataFrame with a MultiIndex:

.. ipython:: python

   arrays = [
       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
       ["one", "two", "one", "two", "one", "two", "one", "two"],
   ]

   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])

   df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

   df

Then we group df by the second index level and the A column.

.. ipython:: python

   df.groupby([pd.Grouper(level=1), "A"]).sum()

Index levels may also be specified by name.

.. ipython:: python

   df.groupby([pd.Grouper(level="second"), "A"]).sum()

Index level names may be specified as keys directly to groupby.

.. ipython:: python

   df.groupby(["second", "A"]).sum()

DataFrame column selection in GroupBy

Once you have created the GroupBy object from a DataFrame, you might want to do something different for each of the columns. By using [] on the GroupBy object, in the same way you would select a column from a DataFrame, you can do:

.. ipython:: python

   df = pd.DataFrame(
       {
           "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
           "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
           "C": np.random.randn(8),
           "D": np.random.randn(8),
       }
   )

   df

   grouped = df.groupby(["A"])
   grouped_C = grouped["C"]
   grouped_D = grouped["D"]

This is mainly syntactic sugar for the alternative, which is much more verbose:

.. ipython:: python

   df["C"].groupby(df["A"])

Additionally, this method avoids recomputing the internal grouping information derived from the passed key.

Iterating through groups

With the GroupBy object in hand, iterating through the grouped data is very natural and functions similarly to :py:func:`itertools.groupby`:

.. ipython::

   In [4]: grouped = df.groupby('A')

   In [5]: for name, group in grouped:
      ...:     print(name)
      ...:     print(group)
      ...:

In the case of grouping by multiple keys, the group name will be a tuple:

.. ipython::

   In [5]: for name, group in df.groupby(['A', 'B']):
      ...:     print(name)
      ...:     print(group)
      ...:

See :ref:`timeseries.iterating-label`.

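Because iteration yields (name, group) pairs, the groups can be collected into a dictionary keyed by group name. A sketch with hypothetical data (with the default sort=True, keys come out in sorted order):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "C": [1, 2, 3]})

# Each iteration step yields a (group name, sub-DataFrame) pair.
groups = {name: group for name, group in df.groupby("A")}
```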

Selecting a group

A single group can be selected using :meth:`~pandas.core.groupby.DataFrameGroupBy.get_group`:

.. ipython:: python

   grouped.get_group("bar")

Or for an object grouped on multiple columns:

.. ipython:: python

   df.groupby(["A", "B"]).get_group(("bar", "one"))

Aggregation

An aggregation is a GroupBy operation that reduces the dimension of the grouping object. The result of an aggregation is, or at least is treated as, a scalar value for each column in a group. For example, producing the sum of each column in a group of values.

.. ipython:: python

   animals = pd.DataFrame(
       {
           "kind": ["cat", "dog", "cat", "dog"],
           "height": [9.1, 6.0, 9.5, 34.0],
           "weight": [7.9, 7.5, 9.9, 198.0],
       }
   )
   animals
   animals.groupby("kind").sum()

In the result, the keys of the groups appear in the index by default. They can instead be included in the columns by passing as_index=False.

.. ipython:: python

   animals.groupby("kind", as_index=False).sum()

Built-in aggregation methods

Many common aggregations are built into GroupBy objects as methods. Of the methods listed below, those marked with a * do not have a Cython-optimized implementation.

==================================== ====================================================================
Method                               Description
==================================== ====================================================================
:meth:`~.DataFrameGroupBy.any`       Compute whether any of the values in the groups are truthy
:meth:`~.DataFrameGroupBy.all`       Compute whether all of the values in the groups are truthy
:meth:`~.DataFrameGroupBy.count`     Compute the number of non-NA values in the groups
:meth:`~.DataFrameGroupBy.cov` *     Compute the covariance of the groups
:meth:`~.DataFrameGroupBy.first`     Compute the first occurring value in each group
:meth:`~.DataFrameGroupBy.idxmax` *  Compute the index of the maximum value in each group
:meth:`~.DataFrameGroupBy.idxmin` *  Compute the index of the minimum value in each group
:meth:`~.DataFrameGroupBy.last`      Compute the last occurring value in each group
:meth:`~.DataFrameGroupBy.max`       Compute the maximum value in each group
:meth:`~.DataFrameGroupBy.mean`      Compute the mean of each group
:meth:`~.DataFrameGroupBy.median`    Compute the median of each group
:meth:`~.DataFrameGroupBy.min`       Compute the minimum value in each group
:meth:`~.DataFrameGroupBy.nunique`   Compute the number of unique values in each group
:meth:`~.DataFrameGroupBy.prod`      Compute the product of the values in each group
:meth:`~.DataFrameGroupBy.quantile`  Compute a given quantile of the values in each group
:meth:`~.DataFrameGroupBy.sem`       Compute the standard error of the mean of the values in each group
:meth:`~.DataFrameGroupBy.size`      Compute the number of values in each group
:meth:`~.DataFrameGroupBy.skew` *    Compute the skew of the values in each group
:meth:`~.DataFrameGroupBy.std`       Compute the standard deviation of the values in each group
:meth:`~.DataFrameGroupBy.sum`       Compute the sum of the values in each group
:meth:`~.DataFrameGroupBy.var`       Compute the variance of the values in each group
==================================== ====================================================================

Some examples:

.. ipython:: python

   df.groupby("A")[["C", "D"]].max()
   df.groupby(["A", "B"]).mean()

Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the size method. It returns a Series whose index consists of the group names and whose values are the sizes of each group.

.. ipython:: python

   grouped = df.groupby(["A", "B"])
   grouped.size()

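Note that size counts every row in each group, whereas the count method listed in the table above excludes NA values. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "bar"], "C": [1.0, np.nan, 3.0]})
grouped = df.groupby("A")

# size() counts every row in each group, including rows holding NA values.
sizes = grouped.size()
# count() counts only the non-NA values of the selected column.
counts = grouped["C"].count()
```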
While the :meth:`~.DataFrameGroupBy.describe` method is not itself a reducer, it can be used to conveniently produce a collection of summary statistics about each of the groups.

.. ipython:: python

   grouped.describe()

Another aggregation example is to compute the number of unique values of each group. This is similar to the value_counts function, except that it only counts the number of unique values.

.. ipython:: python

   ll = [['foo', 1], ['foo', 2], ['foo', 2], ['bar', 1], ['bar', 1]]
   df4 = pd.DataFrame(ll, columns=["A", "B"])
   df4
   df4.groupby("A")["B"].nunique()

Note

Aggregation functions will not return the groups that you are aggregating over as named columns when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over as columns, if they are named indices or columns.


The :meth:`~.DataFrameGroupBy.aggregate` method


Note

The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of inputs. This section details using string aliases for various GroupBy methods; other inputs are detailed in the sections below.


Any reduction method that pandas implements can be passed as a string to :meth:`~.DataFrameGroupBy.aggregate`. Users are encouraged to use the shorthand, agg. It will operate as if the corresponding method was called.

.. ipython:: python

   grouped = df.groupby("A")
   grouped[["C", "D"]].aggregate("sum")

   grouped = df.groupby(["A", "B"])
   grouped.agg("sum")

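As a sketch of the claim that a string alias behaves like the corresponding method, the two spellings produce identical results on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "C": [1, 2, 3], "D": [4.0, 5.0, 6.0]})
grouped = df.groupby("A")

# Passing the name of a reduction as a string ...
via_string = grouped[["C", "D"]].agg("sum")
# ... behaves as if the corresponding method had been called directly.
via_method = grouped[["C", "D"]].sum()
```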
The result of the aggregation will have the group names as the new index along the grouped axis. In the case of multiple keys, the result is a :ref:`MultiIndex <advanced.hierarchical>` by default. As mentioned above, this can be changed by using the as_index option:

+
+

+
+


+
+.. ipython:: python
+
+   grouped = df.groupby(["A", "B"], as_index=False)
+   grouped.agg("sum")
+
+   df.groupby("A", as_index=False)[["C", "D"]].agg("sum")
+
+
+
+

You could also use :meth:`DataFrame.reset_index` to achieve the same result, since the group keys are stored in the resulting MultiIndex, although this will make an extra copy.

+
+

+
+


+
+.. ipython:: python
+
+   df.groupby(["A", "B"]).agg("sum").reset_index()
+
+
+
+
+
+

Aggregation with User-Defined Functions

+

Users can also provide their own User-Defined Functions (UDFs) for custom aggregations.

+
+

Warning

+

When aggregating with a UDF, the UDF should not mutate the provided Series. See :ref:`gotchas.udf-mutation` for more information.

+
+

+
+
+

Note

+

Aggregating with a UDF is often less performant than using the pandas built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.
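For example, a per-group range can be written either as a lambda UDF or as two built-in reductions followed by a vectorized subtraction. The sketch below uses a small hypothetical frame. Both forms should give the same values, but the chained version avoids calling Python code once per group.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"], "val": [1, 5, 2, 9, 4]})
grouped = df.groupby("key")["val"]

# UDF version: the lambda is called once per group
range_udf = grouped.agg(lambda s: s.max() - s.min())

# Chained built-ins: two optimized reductions, then one vectorized subtraction
range_builtin = grouped.max() - grouped.min()

print(range_builtin)
```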

+
+
+


+
+.. ipython:: python
+
+   animals
+   animals.groupby("kind")[["height"]].agg(lambda x: set(x))
+
+
+
+

The resulting dtype will reflect that of the aggregating function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.

+
+


+
+.. ipython:: python
+
+   animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum())
+
+
+
+
+
+

Applying multiple functions at once

+

With a grouped Series you can also pass a list of functions to aggregate with, which outputs a DataFrame:

+
+


+
+.. ipython:: python
+
+   grouped = df.groupby("A")
+   grouped["C"].agg(["sum", "mean", "std"])
+
+
+
+

On a grouped DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with hierarchical columns:

+
+


+
+.. ipython:: python
+
+   grouped[["C", "D"]].agg(["sum", "mean", "std"])
+
+
+
+
+

The resulting aggregations are named after the functions themselves. If you need to rename them, you can chain a rename operation for a Series like this:

+
+


+
+.. ipython:: python
+
+   (
+       grouped["C"]
+       .agg(["sum", "mean", "std"])
+       .rename(columns={"sum": "foo", "mean": "bar", "std": "baz"})
+   )
+
+
+
+

For a grouped DataFrame, you can rename in a similar manner:

+
+


+
+.. ipython:: python
+
+   (
+       grouped[["C", "D"]].agg(["sum", "mean", "std"]).rename(
+           columns={"sum": "foo", "mean": "bar", "std": "baz"}
+       )
+   )
+
+
+
+
+

Note

+

In general, the output column names should be unique, but pandas will allow you to apply the same function (or two functions with the same name) to the same column.

+
+


+
+.. ipython:: python
+
+   grouped["C"].agg(["sum", "sum"])
+
+
+
+
+

pandas also allows you to provide multiple lambdas. In this case, pandas will mangle the names of the (nameless) lambda functions, appending _<i> to each one (<lambda_0>, <lambda_1>, ...).

+
+


+
+.. ipython:: python
+
+   grouped["C"].agg([lambda x: x.max() - x.min(), lambda x: x.median() - x.mean()])
+
+
+
+
+
+

Named aggregation

+

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg`, known as "named aggregation", where

+
+

+
+

+
  • The keywords are the output column names.
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the :class:`NamedAgg` namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
+
+
+
+

Example:

+

Consider the following DataFrame animals:

+
+


+
+.. ipython:: python
+
+   import pandas as pd
+
+   animals = pd.DataFrame(
+       {
+           "kind": ["cat", "dog", "cat", "dog"],
+           "height": [9.1, 6.0, 9.5, 34.0],
+           "weight": [7.9, 7.5, 9.9, 198.0],
+       }
+   )
+
+
+
+

To demonstrate named aggregation, we group the DataFrame by the kind column and apply different aggregations to the height and weight columns:

+
+


+
+.. ipython:: python
+
+   result = animals.groupby("kind").agg(
+       min_height=pd.NamedAgg(column="height", aggfunc="min"),
+       max_height=pd.NamedAgg(column="height", aggfunc="max"),
+       average_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
+   )
+
+
+
+

In the above example, named aggregation assigns a custom output column name (min_height, max_height, and average_weight) to each aggregation. The result is a new DataFrame indexed by kind, with one column per keyword argument.

+

The resulting DataFrame will look like this:

+
+         min_height  max_height  average_weight
+   kind
+   cat          9.1         9.5            8.90
+   dog          6.0        34.0          102.75
+
+

Here min_height and max_height hold the minimum and maximum height of each kind, and average_weight holds the mean weight of each kind.

+

Named aggregation therefore lets you choose descriptive output column names directly in the groupby.agg call, without a separate rename step.

+

:class:`NamedAgg` is just a namedtuple. Plain tuples are allowed as well.

+
+

+
+


+
+.. ipython:: python
+
+   animals.groupby("kind").agg(
+       min_height=("height", "min"),
+       max_height=("height", "max"),
+       average_weight=("weight", "mean"),
+   )
+
+
+
+
+

If the column names you want are not valid Python identifiers, construct a dictionary and unpack the keyword arguments:

+
+


+
+.. ipython:: python
+
+   animals.groupby("kind").agg(
+       **{
+           "total weight": pd.NamedAgg(column="weight", aggfunc="sum")
+       }
+   )
+
+
+
+

When using named aggregation, additional keyword arguments are not passed through to the aggregation functions; only pairs of (column, aggfunc) should be passed as **kwargs. If your aggregation functions require additional arguments, apply them partially with :func:`functools.partial`.
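The following is a minimal sketch of the functools.partial approach. The helper count_above and its threshold argument are hypothetical, made up for this illustration. The extra argument is bound before the call, so pandas only has to pass each group's Series.

```python
from functools import partial

import pandas as pd

animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)


def count_above(values, threshold):
    # Hypothetical helper: number of values in the group greater than threshold
    return int((values > threshold).sum())


result = animals.groupby("kind").agg(
    # threshold is bound up front; pandas calls the partial with each group's Series
    tall_count=pd.NamedAgg(column="height", aggfunc=partial(count_above, threshold=9.0)),
    mean_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
)
print(result)
```

Passing threshold=9.0 directly to agg would not work, because any extra keyword arguments are interpreted as further named aggregations.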

+
+

+

Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions.

+
+


+
+.. ipython:: python
+
+   animals.groupby("kind").height.agg(
+       min_height="min",
+       max_height="max",
+   )
+
+
+
+
+

Applying different functions to DataFrame columns

+

By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame:

+
+


+
+.. ipython:: python
+
+   grouped.agg({"C": "sum", "D": lambda x: np.std(x, ddof=1)})
+
+
+
+

The functions can also be given as strings. For a string to be valid, it must name a method implemented on GroupBy:

+
+


+
+.. ipython:: python
+
+   grouped.agg({"C": "sum", "D": "std"})
+
+
+
+
+
+
+

Transformation

+

A transformation is a GroupBy operation whose result is indexed the same as the one being grouped. Common examples include :meth:`~.DataFrameGroupBy.cumsum` and :meth:`~.DataFrameGroupBy.diff`.

+
+

+
+

+
+


+
+.. ipython:: python
+
+    speeds
+    grouped = speeds.groupby("class")["max_speed"]
+    grouped.cumsum()
+    grouped.diff()
+
+
+
+

Unlike aggregations, the groupings that are used to split the original object are not included in the result.
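The following self-contained sketch uses a hypothetical key/val frame in place of speeds. The transformed result keeps the input's index, and the grouping column key does not appear among its columns.

```python
import pandas as pd

# Hypothetical example frame; "key" is used only for grouping
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

result = df.groupby("key").cumsum()

# Same index as the input, and only the non-grouping column remains
print(result.index.equals(df.index))
print(list(result.columns))
print(result["val"].tolist())
```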

+
+

Note

+

Since transformations do not include the groupings that are used to split the result, the arguments as_index and sort in :meth:`DataFrame.groupby` and :meth:`Series.groupby` have no effect.

+
+

+
+

+
+

A common use of a transformation is to add the result back into the original DataFrame.

+
+


+
+.. ipython:: python
+
+    result = speeds.copy()
+    result["cumsum"] = grouped.cumsum()
+    result["diff"] = grouped.diff()
+    result
+
+
+
+
+

Built-in transformation methods

+

The following methods on GroupBy act as transformations. Of these methods, only fillna does not have a Cython-optimized implementation.

=====================================  ======================================================================
Method                                 Description
=====================================  ======================================================================
:meth:`~.DataFrameGroupBy.bfill`       Back fill NA values within each group
:meth:`~.DataFrameGroupBy.cumcount`    Compute the cumulative count within each group
:meth:`~.DataFrameGroupBy.cummax`      Compute the cumulative max within each group
:meth:`~.DataFrameGroupBy.cummin`      Compute the cumulative min within each group
:meth:`~.DataFrameGroupBy.cumprod`     Compute the cumulative product within each group
:meth:`~.DataFrameGroupBy.cumsum`      Compute the cumulative sum within each group
:meth:`~.DataFrameGroupBy.diff`        Compute the difference between adjacent values within each group
:meth:`~.DataFrameGroupBy.ffill`       Forward fill NA values within each group
:meth:`~.DataFrameGroupBy.fillna`      Fill NA values within each group
:meth:`~.DataFrameGroupBy.pct_change`  Compute the percent change between adjacent values within each group
:meth:`~.DataFrameGroupBy.rank`        Compute the rank of each value within each group
:meth:`~.DataFrameGroupBy.shift`       Shift values up or down within each group
=====================================  ======================================================================
+
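The following sketch applies a few of these methods to a hypothetical key/val frame. Each output lines up row-for-row with the input, and the values restart within each group.

```python
import pandas as pd

# Hypothetical example frame with interleaved groups
df = pd.DataFrame({"key": ["a", "a", "b", "a", "b"], "val": [10, 20, 5, 30, 15]})
grouped = df.groupby("key")["val"]

out = pd.DataFrame(
    {
        "val": df["val"],
        "cumcount": grouped.cumcount(),  # 0, 1, 2, ... within each group
        "shift": grouped.shift(1),       # previous value in the same group (NaN for the first)
        "rank": grouped.rank(),          # rank of each value within its group
    }
)
print(out)
```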

In addition, passing any built-in aggregation method as a string to :meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result across the group, producing a transformed result. If the aggregation method is Cython-optimized, this will be performant as well.

+
+

+
+
+

The :meth:`~.DataFrameGroupBy.transform` method

+
+

+

Similar to the :ref:`aggregation method <groupby.aggregate.agg>`, the :meth:`~.DataFrameGroupBy.transform` method can accept string aliases to the built-in transformation methods in the previous section. It can also accept string aliases to the built-in aggregation methods. When an aggregation method is provided, the result will be broadcast across the group.

+
+

+
+

+
+


+
+.. ipython:: python
+
+    speeds
+    grouped = speeds.groupby("class")[["max_speed"]]
+    grouped.transform("cumsum")
+    grouped.transform("sum")
+
+
+
+

In addition to string aliases, the :meth:`~.DataFrameGroupBy.transform` method can also accept User-Defined Functions (UDFs). The UDF must:

+
+

+
  • Return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk (e.g., a scalar, grouped.transform(lambda x: x.iloc[-1])).
  • Operate column-by-column on the group chunk. The transform is applied to the first group chunk using chunk.apply.
  • Not perform in-place operations on the group chunk. Group chunks should be treated as immutable, and changes to a group chunk may produce unexpected results. See :ref:`gotchas.udf-mutation` for more information.
  • (Optionally) operate on all columns of the entire group chunk at once. If this is supported, a fast path is used starting from the second chunk.
+
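The following sketch illustrates the first rule with a hypothetical key/val frame. A UDF that returns a scalar per group is broadcast to every row of that group. A UDF that returns a same-length result is placed row-for-row. The built-in "last" alias is shown as the faster equivalent of the scalar UDF. This holds here because the data has no missing values, and GroupBy.last skips NA by default.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
grouped = df.groupby("key")["val"]

# Scalar per group: broadcast back to every row of the group
last_in_group = grouped.transform(lambda x: x.iloc[-1])

# Same-size output: aligned row-for-row with the group chunk
centered = grouped.transform(lambda x: x - x.mean())

# Built-in alternative to the scalar UDF above
last_builtin = grouped.transform("last")

print(last_in_group.tolist())
print(centered.tolist())
```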
+

Note

+

Transforming by supplying transform with a UDF is often less performant than using the built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.

+

All of the examples in this section can be made more performant by calling built-in methods instead of using transform. See :ref:`below for examples <groupby_efficient_transforms>`.

+
+

+
+
+


+
+.. versionchanged:: 2.0.0
+
+    When using ``.transform`` on a grouped DataFrame and the transformation function
+    returns a DataFrame, pandas now aligns the result's index
+    with the input's index. You can call ``.to_numpy()`` within the transformation
+    function to avoid alignment.
+
+
+
+
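The following sketch illustrates the alignment described above, assuming pandas 2.0 or later and a hypothetical key/val frame. The UDF reverses the rows of each group. Because the returned DataFrame is aligned on the index, the reversal has no visible effect. Converting to a NumPy array with .to_numpy() skips alignment, so the values are placed by position.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [3, 1, 4, 2]})
grouped = df.groupby("key")[["val"]]

# Returned DataFrame is aligned back to the input index (pandas >= 2.0)
aligned = grouped.transform(lambda g: g.iloc[::-1])

# A NumPy array carries no index, so values are placed positionally
positional = grouped.transform(lambda g: g.iloc[::-1].to_numpy())

print(aligned["val"].tolist())
print(positional["val"].tolist())
```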

Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the transformation function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.

+
+

+

Suppose we wish to standardize the data within each group:

+
+


+
+.. ipython:: python
+
+   index = pd.date_range("10/1/1999", periods=1100)
+   ts = pd.Series(np.random.normal(0.5, 2, 1100), index)
+   ts = ts.rolling(window=100, min_periods=100).mean().dropna()
+
+   ts.head()
+   ts.tail()
+
+   transformed = ts.groupby(lambda x: x.year).transform(
+       lambda x: (x - x.mean()) / x.std()
+   )
+
+
+
+
+

We would expect the result to now have mean 0 and standard deviation 1 within each group, which we can easily check:

+
+


+
+.. ipython:: python
+
+   # Original Data
+   grouped = ts.groupby(lambda x: x.year)
+   grouped.mean()
+   grouped.std()
+
+   # Transformed Data
+   grouped_trans = transformed.groupby(lambda x: x.year)
+   grouped_trans.mean()
+   grouped_trans.std()
+
+
+
+

We can also visually compare the original and transformed data sets.

+
+


+
+.. ipython:: python
+
+   compare = pd.DataFrame({"Original": ts, "Transformed": transformed})
+
+   @savefig groupby_transform_plot.png
+   compare.plot()
+
+
+
+

Transformation functions that have lower-dimensional outputs are broadcast to match the shape of the input array.

+
+


+
+.. ipython:: python
+
+   ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())
+
+
+
+

Another common data transform is to replace missing data with the group mean.

+
+


+
+.. ipython:: python
+
+   cols = ["A", "B", "C"]
+   values = np.random.randn(1000, 3)
+   values[np.random.randint(0, 1000, 100), 0] = np.nan
+   values[np.random.randint(0, 1000, 50), 1] = np.nan
+   values[np.random.randint(0, 1000, 200), 2] = np.nan
+   data_df = pd.DataFrame(values, columns=cols)
+   data_df
+
+   countries = np.array(["US", "UK", "GR", "JP"])
+   key = countries[np.random.randint(0, 4, 1000)]
+
+   grouped = data_df.groupby(key)
+
+   # Non-NA count in each group
+   grouped.count()
+
+   transformed = grouped.transform(lambda x: x.fillna(x.mean()))
+
+
+
+

We can verify that the group means have not changed in the transformed data, and that the transformed data contains no NAs.

+
+


+
+.. ipython:: python
+
+   grouped_trans = transformed.groupby(key)
+
+   grouped.mean()  # original group means
+   grouped_trans.mean()  # transformation did not change group means
+
+   grouped.count()  # original has some missing data points
+   grouped_trans.count()  # counts after transformation
+   grouped_trans.size()  # Verify non-NA count equals group size
+
+
+
+

As mentioned in the note above, each of the examples in this section can be computed more efficiently using built-in methods. In the code below, each inefficient UDF version is commented out and followed by its faster alternative.

+
+


+
+.. ipython:: python
+
+    # ts.groupby(lambda x: x.year).transform(
+    #     lambda x: (x - x.mean()) / x.std()
+    # )
+    grouped = ts.groupby(lambda x: x.year)
+    result = (ts - grouped.transform("mean")) / grouped.transform("std")
+
+    # ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())
+    grouped = ts.groupby(lambda x: x.year)
+    result = grouped.transform("max") - grouped.transform("min")
+
+    # grouped = data_df.groupby(key)
+    # grouped.transform(lambda x: x.fillna(x.mean()))
+    grouped = data_df.groupby(key)
+    result = data_df.fillna(grouped.transform("mean"))
+
+
+
+
+
+

Window and resample operations

+

It is possible to use resample(), expanding() and rolling() as methods on groupbys.

+

The example below applies the rolling() method to the samples of column B, based on the groups of column A.

+
+


+
+.. ipython:: python
+
+   df_re = pd.DataFrame({"A": [1] * 10 + [5] * 10, "B": np.arange(20)})
+   df_re
+
+   df_re.groupby("A").rolling(4).B.mean()
+
+
+
+
+

The expanding() method will accumulate a given operation (sum() in the example) for all the members of each particular group.

+
+


+
+.. ipython:: python
+
+   df_re.groupby("A").expanding().sum()
+
+
+
+
+

Suppose you want to use the resample() method to get a daily frequency in each group of your DataFrame, and wish to fill in the missing values with the ffill() method.

+
+


+
+.. ipython:: python
+
+   df_re = pd.DataFrame(
+       {
+           "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
+           "group": [1, 1, 2, 2],
+           "val": [5, 6, 7, 8],
+       }
+   ).set_index("date")
+   df_re
+
+   df_re.groupby("group").resample("1D").ffill()
+
+
+
+
+
+
+

Filtration

+

A filtration is a GroupBy operation that subsets the original grouping object. It may filter out entire groups, parts of groups, or both. Filtrations return a filtered version of the calling object, including the grouping columns when provided. In the following example, class is included in the result.

+
+


+
+.. ipython:: python
+
+    speeds
+    speeds.groupby("class").nth(1)
+
+
+
+
+

Note

+

Unlike aggregations, filtrations do not add the group keys to the index of the result. Because of this, passing as_index=False or sort=True will not affect these methods.

+
+

Filtrations will respect subsetting the columns of the GroupBy object.

+
+


+
+.. ipython:: python
+
+    speeds.groupby("class")[["order", "max_speed"]].nth(1)
+
+
+
+
+

Built-in filtrations

+

The following methods on GroupBy act as filtrations. All these methods have a Cython-optimized implementation.

===============================  ======================================
Method                           Description
===============================  ======================================
:meth:`~.DataFrameGroupBy.head`  Select the top row(s) of each group
:meth:`~.DataFrameGroupBy.nth`   Select the nth row(s) of each group
:meth:`~.DataFrameGroupBy.tail`  Select the bottom row(s) of each group
===============================  ======================================
+
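The following sketch applies these filtrations to a hypothetical key/val frame, assuming pandas 2.0 or later, where nth behaves as a filtration. Each result keeps the original index labels and the key column, and rows appear in their original order.

```python
import pandas as pd

# Hypothetical example frame with interleaved groups
df = pd.DataFrame({"key": ["a", "b", "a", "b", "a"], "val": [1, 2, 3, 4, 5]})
grouped = df.groupby("key")

first_rows = grouped.head(1)   # first row of each group
last_rows = grouped.tail(1)    # last row of each group
second_rows = grouped.nth(1)   # row at position 1 within each group

print(first_rows)
print(last_rows)
print(second_rows)
```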

Users can also use transformations along with Boolean indexing to construct complex filtrations within groups. For example, suppose we are given groups of products and their volumes, and we wish to subset the data to only the largest products capturing no more than 90% of the total volume within each group.

+
+


+
+.. ipython:: python
+
+    product_volumes = pd.DataFrame(
+        {
+            "group": list("xxxxyyy"),
+            "product": list("abcdefg"),
+            "volume": [10, 30, 20, 15, 40, 10, 20],
+        }
+    )
+    product_volumes
+
+    # Sort by volume to select the largest products first
+    product_volumes = product_volumes.sort_values("volume", ascending=False)
+    grouped = product_volumes.groupby("group")["volume"]
+    cumpct = grouped.cumsum() / grouped.transform("sum")
+    cumpct
+    significant_products = product_volumes[cumpct <= 0.9]
+    significant_products.sort_values(["group", "product"])
+
+
+
+
+
+

The :meth:`~.DataFrameGroupBy.filter` method

+
+

+
+

Note

+

Filtering by supplying filter with a User-Defined Function (UDF) is often less performant than using the built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.

+
+

The filter method takes a User-Defined Function (UDF) that, when applied to an entire group, returns either True or False. The result of the filter method is then the subset of groups for which the UDF returned True.

+

Suppose we want to take only elements that belong to groups with a group sum greater than 2.

+
+


+
+.. ipython:: python
+
+   sf = pd.Series([1, 1, 2, 3, 3, 3])
+   sf.groupby(sf).filter(lambda x: x.sum() > 2)
+
+
+
+

Another useful operation is filtering out elements that belong to groups with only a couple of members.

+
+


+
+.. ipython:: python
+
+   dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")})
+   dff.groupby("B").filter(lambda x: len(x) > 2)
+
+
+
+

Alternatively, instead of dropping the offending groups, we can return a like-indexed object where the groups that do not pass the filter are filled with NaNs.

+
+


+
+.. ipython:: python
+
+   dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False)
+
+
+
+

For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion.

+
+


+
+.. ipython:: python
+
+   dff["C"] = np.arange(8)
+   dff.groupby("B").filter(lambda x: len(x["C"]) > 2)
+
+
+
+
+
+
+

Flexible apply

+

Some operations on the grouped data might not fit into the aggregation, transformation, or filtration categories. For these, you can use the apply function.

+
+

Warning

+

apply has to try to infer from the result whether it should act as a reducer, transformer, or filter, depending on exactly what is passed to it. Thus the grouped column(s) may be included in the output or not. While it tries to intelligently guess how to behave, it can sometimes guess wrong.

+
+
+

Note

+

All of the examples in this section can be computed more reliably, and more efficiently, using other pandas functionality.

+
+
+
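The note above can be illustrated with a minimal sketch. It assumes a small, made-up frame with columns A and C, which is not the df used in this section. The sketch shows built-in alternatives to the apply-based examples that follow: GroupBy.describe in place of apply(lambda x: x.describe()), and transform("mean") in place of demeaning inside apply.

```python
import pandas as pd

# Hypothetical data; not the df used elsewhere in this section
df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar", "foo"],
                   "C": [1.0, 2.0, 3.0, 4.0, 5.0]})
grouped = df.groupby("A")["C"]

# apply-based summary: a Series indexed by (group, statistic)
via_apply = grouped.apply(lambda x: x.describe())

# Built-in equivalent: one row per group, one column per statistic
via_describe = grouped.describe()

# Demeaning without apply: transform("mean") broadcasts each group's mean
# back onto the original rows, so the subtraction aligns on the row index
demeaned = df["C"] - grouped.transform("mean")
print(demeaned.tolist())  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

The describe version returns a wide frame rather than a stacked Series. The statistics themselves are the same.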


+
+.. ipython:: python
+
+   df
+   grouped = df.groupby("A")
+
+   # could also just call .describe()
+   grouped["C"].apply(lambda x: x.describe())
+
+
+
+

The dimension of the returned result can also change:

+
+


+
+.. ipython:: python
+
+    grouped = df.groupby('A')['C']
+
+    def f(group):
+        return pd.DataFrame({'original': group,
+                             'demeaned': group - group.mean()})
+
+    grouped.apply(f)
+
+
+
+

Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the apply function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.
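The sketch below shows a common dtype being chosen. It uses made-up data in which one group's result is an integer and the other's is a float, so the combined result is upcast to float64, just as it would be when building a Series from [3, 1.5].

```python
import pandas as pd

s = pd.Series([1, 2, 3])
keys = [0, 0, 1]

# Group 0 (two rows) returns an integer sum; group 1 (one row) returns a float
result = s.groupby(keys).apply(lambda x: x.sum() if len(x) > 1 else x.sum() / 2)
print(result.dtype)     # float64
print(result.tolist())  # [3.0, 1.5]
```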

+
+

+
+

Control grouped column(s) placement with group_keys

+

To control whether the grouped column(s) are included in the indices, you can use the argument group_keys, which defaults to True. Compare

+
+


+
+.. ipython:: python
+
+    df.groupby("A", group_keys=True).apply(lambda x: x)
+
+
+
+

with

+
+


+
+.. ipython:: python
+
+    df.groupby("A", group_keys=False).apply(lambda x: x)
+
+
+
+
+
+
+
+

Numba Accelerated Routines

+
+


+
+.. versionadded:: 1.1
+
+
+
+

If Numba is installed as an optional dependency, the transform and aggregate methods support engine='numba' and engine_kwargs arguments. See :ref:`enhancing performance with Numba <enhancingperf.numba>` for general usage of the arguments and performance considerations.

+
+

+

The function signature must start with values, index, in exactly that order: the data belonging to each group will be passed into values, and the group index will be passed into index.

+
+

Warning

+

When using engine='numba', there will be no "fall back" behavior internally. The group data and group index will be passed as NumPy arrays to the JITed user defined function, and no alternative execution attempts will be tried.

+
+
+
+

Other useful features

+
+

Exclusion of "nuisance" columns

+

Again consider the example DataFrame we've been looking at:

+
+


+
+.. ipython:: python
+
+   df
+
+
+
+

Suppose we wish to compute the standard deviation grouped by the A column. There is a slight problem, namely that we don't care about the data in column B because it is not numeric. We refer to these non-numeric columns as "nuisance" columns. You can avoid nuisance columns by specifying numeric_only=True:

+
+


+
+.. ipython:: python
+
+   df.groupby("A").std(numeric_only=True)
+
+
+
+

Note that df.groupby('A').colname.std() is more efficient than df.groupby('A').std().colname. So if the result of an aggregation function is only needed over one column (here colname), that column may be selected before applying the aggregation function.
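The two orderings can be compared with a minimal sketch on made-up data. Both produce the same Series. Selecting the column first only avoids computing the standard deviation of columns that are then discarded.

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "y", "x", "y"],
                   "B": ["p", "q", "r", "s"],  # non-numeric "nuisance" column
                   "C": [1.0, 2.0, 4.0, 8.0]})

# Select the column first: only C is aggregated
selected_first = df.groupby("A")["C"].std()

# Aggregate every numeric column, then keep only C
selected_after = df.groupby("A").std(numeric_only=True)["C"]

# Same values, same index, same name
pd.testing.assert_series_equal(selected_first, selected_after)
print(selected_first.round(4).tolist())  # [2.1213, 4.2426]
```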

+
+


+
+.. ipython:: python
+
+    from decimal import Decimal
+
+    df_dec = pd.DataFrame(
+        {
+            "id": [1, 2, 1, 2],
+            "int_column": [1, 2, 3, 4],
+            "dec_column": [
+                Decimal("0.50"),
+                Decimal("0.15"),
+                Decimal("0.25"),
+                Decimal("0.40"),
+            ],
+        }
+    )
+
+    # Decimal columns can be summed explicitly by themselves...
+    df_dec.groupby(["id"])[["dec_column"]].sum()
+
+    # ...but cannot be combined with standard data types or they will be excluded
+    df_dec.groupby(["id"])[["int_column", "dec_column"]].sum()
+
+    # Use .agg function to aggregate over standard and "nuisance" data types
+    # at the same time
+    df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})
+
+
+
+
+
+

Handling of (un)observed Categorical values

+

When using a Categorical grouper (as a single grouper, or as part of multiple groupers), the observed keyword controls whether to return a cartesian product of all possible grouper values (observed=False) or only the values that actually occur in the data (observed=True).

+

Show all values:

+
+


+
+.. ipython:: python
+
+   pd.Series([1, 1, 1]).groupby(
+       pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False
+   ).count()
+
+
+
+

Show only the observed values:

+
+


+
+.. ipython:: python
+
+   pd.Series([1, 1, 1]).groupby(
+       pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True
+   ).count()
+
+
+
+
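With more than one grouper, observed=False expands the result to every combination of the categorical's categories with the values of the other grouper. The following is a sketch on made-up data with one categorical column and one string column. The rows for the unobserved category "b" appear only in the observed=False result.

```python
import pandas as pd

df = pd.DataFrame({
    "cat": pd.Categorical(["a", "a"], categories=["a", "b"]),
    "key": ["x", "y"],
    "val": [1, 2],
})

# observed=False: cartesian product of all categories with all "key" values
all_combos = df.groupby(["cat", "key"], observed=False)["val"].count()
print(len(all_combos))  # 4 rows: (a, x), (a, y), (b, x), (b, y)

# observed=True: only the combinations present in the data
seen = df.groupby(["cat", "key"], observed=True)["val"].count()
print(len(seen))  # 2 rows: (a, x), (a, y)
```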

The dtype of the returned index will always include all of the categories that were grouped.

+
+


+
+.. ipython:: python
+
+   s = (
+       pd.Series([1, 1, 1])
+       .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False)
+       .count()
+   )
+   s.index.dtype
+
+
+
+
+
+

NA and NaT group handling

+

If there are any NaN or NaT values in the grouping key, these are excluded by default. In other words, unless dropna=False is passed to groupby, there will be no "NA group" or "NaT group". This was not the case in older versions of pandas, but users were generally discarding the NA group anyway (and supporting it was an implementation headache).
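A minimal sketch on made-up data shows both behaviors. The row whose key is NaN is dropped by default, and it forms its own group when dropna=False is passed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"key": ["a", np.nan, "a", "b"], "val": [1, 2, 3, 4]})

# Default: the row whose key is NaN is left out of every group
by_default = df.groupby("key")["val"].sum()
print(by_default.to_dict())  # {'a': 4, 'b': 4}

# dropna=False keeps NaN as a group of its own
with_na = df.groupby("key", dropna=False)["val"].sum()
print(len(with_na))  # 3
```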

+
+
+

Grouping with ordered factors

+

Categorical variables represented as instances of pandas's Categorical class can be used as group keys. If so, the order of the levels will be preserved:

+
+


+
+.. ipython:: python
+
+   data = pd.Series(np.random.randn(100))
+
+   factor = pd.qcut(data, [0, 0.25, 0.5, 0.75, 1.0])
+
+   data.groupby(factor, observed=False).mean()
+
+
+
+
+
+
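Because the qcut example above uses random data, here is a deterministic sketch with made-up values. The categories are deliberately declared in non-alphabetical order ("low" before "high"). The grouped result follows that category order rather than sorting the labels alphabetically.

```python
import pandas as pd

values = pd.Series([10, 1, 20, 2])
size = pd.Categorical(["high", "low", "high", "low"],
                      categories=["low", "high"], ordered=True)

result = values.groupby(size, observed=False).sum()
print(result.index.tolist())  # ['low', 'high'] (category order, not alphabetical)
print(result.tolist())        # [3, 30]
```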

Grouping with a grouper specification

+

You may need to specify a bit more data to properly group. You can use pd.Grouper to provide this local control.

+
+


+
+.. ipython:: python
+
+   import datetime
+
+   df = pd.DataFrame(
+       {
+           "Branch": "A A A A A A A B".split(),
+           "Buyer": "Carl Mark Carl Carl Joe Joe Joe Carl".split(),
+           "Quantity": [1, 3, 5, 1, 8, 1, 9, 3],
+           "Date": [
+               datetime.datetime(2013, 1, 1, 13, 0),
+               datetime.datetime(2013, 1, 1, 13, 5),
+               datetime.datetime(2013, 10, 1, 20, 0),
+               datetime.datetime(2013, 10, 2, 10, 0),
+               datetime.datetime(2013, 10, 1, 20, 0),
+               datetime.datetime(2013, 10, 2, 10, 0),
+               datetime.datetime(2013, 12, 2, 12, 0),
+               datetime.datetime(2013, 12, 2, 14, 0),
+           ],
+       }
+   )
+
+   df
+
+
+
+

Group by a specific column with the desired frequency. This is similar to resampling.

+
+


+
+.. ipython:: python
+
+   df.groupby([pd.Grouper(freq="1M", key="Date"), "Buyer"])[["Quantity"]].sum()
+
+
+
+

When freq is specified, the object returned by pd.Grouper will be an instance of pandas.api.typing.TimeGrouper. In the example below the specification is ambiguous: after set_index, the frame has both an index and a column named "Date" that could serve as groupers. Use key to group by the column and level to group by the index.

+
+


+
+.. ipython:: python
+
+   df = df.set_index("Date")
+   df["Date"] = df.index + pd.offsets.MonthEnd(2)
+   df.groupby([pd.Grouper(freq="6M", key="Date"), "Buyer"])[["Quantity"]].sum()
+
+   df.groupby([pd.Grouper(freq="6M", level="Date"), "Buyer"])[["Quantity"]].sum()
+
+
+
+
+
+
+

Taking the first rows of each group

+

Just like for a DataFrame or Series, you can call head and tail on a groupby:

+
+


+
+.. ipython:: python
+
+   df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
+   df
+
+   g = df.groupby("A")
+   g.head(1)
+
+   g.tail(1)
+
+
+
+

This shows the first or last n rows from each group.

+
+
+

Taking the nth row of each group

+

To select the nth item from each group, use :meth:`.DataFrameGroupBy.nth` or :meth:`.SeriesGroupBy.nth`. Arguments supplied can be any integer, lists of integers, slices, or lists of slices; see below for examples. When the nth element of a group does not exist, an error is not raised; instead no corresponding rows are returned.

+
+

+

In general this operation acts as a filtration. In certain cases it will also return one row per group, making it also a reduction. However, because in general it can return zero or multiple rows per group, pandas treats it as a filtration in all cases.
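One visible consequence of treating nth as a filtration is that it keeps the original row labels, while a reduction such as first is indexed by the group keys. The following is a sketch that assumes pandas 2.0 or later, where nth behaves this way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# nth acts as a filter: it returns rows of df with their original index
print(g.nth(0).index.tolist())  # [0, 2]

# first is a reduction: one row per group, indexed by the group key,
# and it skips the NaN in group 1
print(g.first().index.tolist())  # [1, 5]
```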

+
+


+
+.. ipython:: python
+
+   df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
+   g = df.groupby("A")
+
+   g.nth(0)
+   g.nth(-1)
+   g.nth(1)
+
+
+
+

If the nth element of a group does not exist, then no corresponding row is included in the result. In particular, if the specified n is larger than any group, the result will be an empty DataFrame.

+
+


+
+.. ipython:: python
+
+   g.nth(5)
+
+
+
+

If you want to select the nth non-null item, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all', just as you would pass to dropna:

+
+


+
+.. ipython:: python
+
+   # nth(0) is the same as g.first()
+   g.nth(0, dropna="any")
+   g.first()
+
+   # nth(-1) is the same as g.last()
+   g.nth(-1, dropna="any")
+   g.last()
+
+   g.B.nth(0, dropna="all")
+
+
+
+

You can also select multiple rows from each group by specifying multiple nth values as a list of ints.

+
+


+
+.. ipython:: python
+
+   business_dates = pd.date_range(start="4/1/2014", end="6/30/2014", freq="B")
+   df = pd.DataFrame(1, index=business_dates, columns=["a", "b"])
+   # get the first, 4th, and last date index for each month
+   df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])
+
+
+
+

You may also use slices or lists of slices.

+
+


+
+.. ipython:: python
+
+   df.groupby([df.index.year, df.index.month]).nth[1:]
+   df.groupby([df.index.year, df.index.month]).nth[1:, :-1]
+
+
+
+
+
+

Enumerate group items

+

To see the order in which each row appears within its group, use the cumcount method:

+
+


+
+.. ipython:: python
+
+   dfg = pd.DataFrame(list("aaabba"), columns=["A"])
+   dfg
+
+   dfg.groupby("A").cumcount()
+
+   dfg.groupby("A").cumcount(ascending=False)
+
+
+
+
+
+

Enumerate groups

+

To see the ordering of the groups (as opposed to the order of rows within a group given by cumcount) you can use :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`.

+
+

+

Note that the numbers given to the groups match the order in which the groups would be seen when iterating over the groupby object, not the order they are first observed.
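In the example below the first-observed group is also the first in sorted order, so the distinction is easy to miss. The following is a sketch on made-up data where "b" appears first. With the default sort=True the groups are numbered in sorted order. With sort=False, iteration and therefore numbering follow first appearance.

```python
import pandas as pd

dfg = pd.DataFrame({"A": list("bab")})

# "b" is seen first, but groups are numbered in sorted order: a -> 0, b -> 1
print(dfg.groupby("A").ngroup().tolist())  # [1, 0, 1]

# With sort=False, groups are numbered in order of first appearance
print(dfg.groupby("A", sort=False).ngroup().tolist())  # [0, 1, 0]
```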

+
+


+
+.. ipython:: python
+
+   dfg = pd.DataFrame(list("aaabba"), columns=["A"])
+   dfg
+
+   dfg.groupby("A").ngroup()
+
+   dfg.groupby("A").ngroup(ascending=False)
+
+
+
+
+
+

Plotting

+

Groupby also works with some plotting methods. In this case, suppose we suspect that the values in column 1 are, on average, 3 higher in group "B".

+
+


+
+.. ipython:: python
+
+   np.random.seed(1234)
+   df = pd.DataFrame(np.random.randn(50, 2))
+   df["g"] = np.random.choice(["A", "B"], size=50)
+   df.loc[df["g"] == "B", 1] += 3
+
+
+
+

We can easily visualize this with a boxplot:

+
+


+
+.. ipython:: python
+   :okwarning:
+
+   @savefig groupby_boxplot.png
+   df.groupby("g").boxplot()
+
+
+
+

The result of calling boxplot is a dictionary whose keys are the values of our grouping column g ("A" and "B"). The values of the resulting dictionary can be controlled by the return_type keyword of boxplot. See the :ref:`visualization documentation<visualization.box>` for more.

+
+

+
+

Warning

+

For historical reasons, df.groupby("g").boxplot() is not equivalent to df.boxplot(by="g"). See :ref:`here<visualization.box.return>` for an explanation.

+
+

+
+
+
+

Piping function calls

+

Similar to the functionality provided by DataFrame and Series, functions that take GroupBy objects can be chained together using a pipe method to allow for a cleaner, more readable syntax. To read about .pipe in general terms, see :ref:`here <basics.pipe>`.

+
+

+

Combining .groupby and .pipe is often useful when you need to reuse GroupBy objects.

+

As an example, imagine having a DataFrame with columns for stores, products, revenue and quantity sold. We'd like to do a groupwise calculation of prices (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable. First we set up the data:

+
+


+
+.. ipython:: python
+
+   n = 1000
+   df = pd.DataFrame(
+       {
+           "Store": np.random.choice(["Store_1", "Store_2"], n),
+           "Product": np.random.choice(["Product_1", "Product_2"], n),
+           "Revenue": (np.random.random(n) * 50 + 10).round(2),
+           "Quantity": np.random.randint(1, 10, size=n),
+       }
+   )
+   df.head(2)
+
+
+
+

Now, to find prices per store/product, we can simply do:

+
+


+
+.. ipython:: python
+
+   (
+       df.groupby(["Store", "Product"])
+       .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
+       .unstack()
+       .round(2)
+   )
+
+
+
+

Piping can also be expressive when you want to deliver a grouped object to some arbitrary function, for example:

+
+


+
+.. ipython:: python
+
+   def mean(groupby):
+       return groupby.mean()
+
+
+   df.groupby(["Store", "Product"]).pipe(mean)
+
+
+
+

Here mean takes a GroupBy object and finds the mean of the Revenue and Quantity columns respectively for each Store-Product combination. The mean function can be any function that takes in a GroupBy object; .pipe will pass the GroupBy object as a parameter into the function you specify.
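Any further positional or keyword arguments given to .pipe are forwarded to the function after the GroupBy object. The following is a sketch with a small, made-up frame and a hypothetical helper, column_mean, that takes the column to average as an extra argument.

```python
import pandas as pd

df = pd.DataFrame({
    "Store": ["Store_1", "Store_1", "Store_2"],
    "Product": ["Product_1", "Product_1", "Product_2"],
    "Revenue": [10.0, 20.0, 30.0],
    "Quantity": [1, 3, 5],
})


# Hypothetical helper: the first argument is the GroupBy object supplied by .pipe
def column_mean(groupby, column):
    return groupby[column].mean()


# column="Revenue" is forwarded by .pipe to column_mean
result = df.groupby(["Store", "Product"]).pipe(column_mean, column="Revenue")
print(result.tolist())  # [15.0, 30.0]
```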

+
+
+
+

Examples

+
+

Regrouping by factor

+

Regroup columns of a DataFrame according to their sum, and sum the aggregated ones.

+
+


+
+.. ipython:: python
+
+   df = pd.DataFrame({"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 0], "d": [2, 3, 4]})
+   df
+   dft = df.T
+   dft.groupby(dft.sum()).sum()
+
+
+
+
+
+

Multi-column factorization

+

By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract information about the groups in a way similar to :func:`factorize` (as described further in the :ref:`reshaping API <reshaping.factorize>`) but which applies naturally to multiple columns of mixed type and different sources. This can be useful as an intermediate categorical-like step in processing, when the relationships between the group rows are more important than their content, or as input to an algorithm which only accepts the integer encoding. (For more information about support in pandas for full categorical data, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.arrays.categorical>`.)

+
+


+
+.. ipython:: python
+
+    dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})
+
+    dfg
+
+    dfg.groupby(["A", "B"]).ngroup()
+
+    dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()
+
+
+
+
+
+

Groupby by indexer to 'resample' data

+

Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

+

In order for resample to work on indices that are non-datetimelike, the following procedure can be utilized.

+

In the following example, df.index // 5 returns an integer array of bin labels (0 for the first five rows, 1 for the next five) that determines which rows are grouped together.
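Because the example below uses random data, here is a deterministic sketch on a made-up ten-row frame. It makes the bin labels and the consolidated values easy to check. It uses mean instead of std only because the results are simple to verify by hand.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": np.arange(10.0)})

# Integer division maps row labels 0-4 to bin 0 and 5-9 to bin 1
bins = df.index // 5
print(bins.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Ten rows are consolidated into two, one per bin
print(df.groupby(bins).mean()["x"].tolist())  # [2.0, 7.0]
```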

+
+

Note

+

The example below shows how we can downsample by consolidating samples into fewer ones. Here, by using df.index // 5, we aggregate the samples in bins. By applying the std() function, we condense the information contained in many samples into a small set of values, namely their standard deviations, thereby reducing the number of samples.

+
+
+


+
+.. ipython:: python
+
+   df = pd.DataFrame(np.random.randn(10, 2))
+   df
+   df.index // 5
+   df.groupby(df.index // 5).std()
+
+
+
+
+
+

Returning a Series to propagate names

+

Group DataFrame columns, compute a set of metrics and return a named Series. The Series name is used as the name for the column index. This is especially useful in conjunction with reshaping operations such as stacking, in which the column index name will be used as the name of the inserted column:

+
+


+
+.. ipython:: python
+
+   df = pd.DataFrame(
+       {
+           "a": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
+           "b": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
+           "c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
+           "d": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
+       }
+   )
+
+   def compute_metrics(x):
+       result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
+       return pd.Series(result, name="metrics")
+
+   result = df.groupby("a").apply(compute_metrics)
+
+   result
+
+   result.stack()
+
+
+
+
+
+
+ + diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 482e3fe91ca09..7fcd84e0e1f90 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -722,16 +722,45 @@ accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGr to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias. +Example: +------- + +Consider the following DataFrame `animals`: + .. ipython:: python - animals + import pandas as pd - animals.groupby("kind").agg( - min_height=pd.NamedAgg(column="height", aggfunc="min"), - max_height=pd.NamedAgg(column="height", aggfunc="max"), - average_weight=pd.NamedAgg(column="weight", aggfunc="mean"), - ) + animals = pd.DataFrame({ + 'kind': ['cat', 'dog', 'cat', 'dog'], + 'height': [9.1, 6.0, 9.5, 34.0], + 'weight': [7.9, 7.5, 9.9, 198.0] + }) + +To demonstrate "named aggregation," let's group the DataFrame by the 'kind' column and apply different aggregations to the 'height' and 'weight' columns: + +.. ipython:: python + + result = animals.groupby('kind').agg( + min_height=pd.NamedAgg(column='height', aggfunc='min'), + max_height=pd.NamedAgg(column='height', aggfunc='max'), + average_weight=pd.NamedAgg(column='weight', aggfunc='mean') + ) + +In the above example, we used "named aggregation" to specify custom output column names (`min_height`, `max_height`, and `average_weight`) for each aggregation. The result will be a new DataFrame with the aggregated values, and the output column names will be as specified. + +The resulting DataFrame will look like this: + +.. code-block:: bash + + min_height max_height average_weight + kind + cat 9.1 9.5 8.90 + dog 6.0 34.0 102.75 + +In this example, the 'min_height' column contains the minimum height for each group, the 'max_height' column contains the maximum height, and the 'average_weight' column contains the average weight for each group. 
+By using "named aggregation," you can easily control the output column names and have more descriptive results when performing aggregations with `groupby.agg`. :class:`NamedAgg` is just a ``namedtuple``. Plain tuples are allowed as well. diff --git a/pandas/etc/profile.d/micromamba.sh b/pandas/etc/profile.d/micromamba.sh new file mode 100644 index 0000000000000..c0b515215e338 --- /dev/null +++ b/pandas/etc/profile.d/micromamba.sh @@ -0,0 +1,66 @@ +# Copyright (C) 2012 Anaconda, Inc +# SPDX-License-Identifier: BSD-3-Clause + +__mamba_exe() ( + "$MAMBA_EXE" "${@}" +) + +__mamba_hashr() { + if [ -n "${ZSH_VERSION:+x}" ]; then + \rehash + elif [ -n "${POSH_VERSION:+x}" ]; then + : # pass + else + \hash -r + fi +} + +__mamba_xctivate() { + \local ask_conda + ask_conda="$(PS1="${PS1:-}" __mamba_exe shell "${@}" --shell bash)" || \return + \eval "${ask_conda}" + __mamba_hashr +} + +micromamba() { + \local cmd="${1-__missing__}" + case "${cmd}" in + activate|reactivate|deactivate) + __mamba_xctivate "${@}" + ;; + install|update|upgrade|remove|uninstall) + __mamba_exe "${@}" || \return + __mamba_xctivate reactivate + ;; + self-update) + __mamba_exe "${@}" || \return + + # remove leftover backup file on Windows + if [ -f "$MAMBA_EXE.bkup" ]; then + rm -f "$MAMBA_EXE.bkup" + fi + ;; + *) + __mamba_exe "${@}" + ;; + esac +} + +if [ -z "${CONDA_SHLVL+x}" ]; then + \export CONDA_SHLVL=0 + # In dev-mode MAMBA_EXE is python.exe and on Windows + # it is in a different relative location to condabin. + if [ -n "${_CE_CONDA+x}" ] && [ -n "${WINDIR+x}" ]; then + PATH="${MAMBA_ROOT_PREFIX}/condabin:${PATH}" + else + PATH="${MAMBA_ROOT_PREFIX}/condabin:${PATH}" + fi + \export PATH + + # We're not allowing PS1 to be unbound. It must at least be set. + # However, we're not exporting it, which can cause problems when starting a second shell + # via a first shell (i.e. starting zsh from bash). 
+ if [ -z "${PS1+x}" ]; then + PS1= + fi +fi From df7a8852fc85390bf55dd8f8c5df5278ae1551f4 Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Thu, 10 Aug 2023 21:10:52 -0700 Subject: [PATCH 2/7] Issue #18220 Added a missing documentation on an existing behavior of aggregation. --- doc/source/user_guide/groupby.rst | 54 +++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 7fcd84e0e1f90..e7e14e3855873 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -799,6 +799,60 @@ no column selection, so the values are just the functions. max_height="max", ) + +Passing a List of Tuples +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Instead of a dictionary, you can also pass a list of tuples to the `agg` method to achieve similar results. Each tuple contains the output column name as the first element and the aggregation function as the second element. This approach is particularly useful for applying multiple aggregations on the same column. + +Example: +-------- + +Consider the following DataFrame `df`: + +.. ipython:: python + + import pandas as pd + import numpy as np + + df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'a'], + 'data': np.random.randn(5)}) + +Suppose we want to group the DataFrame by the 'key' column and apply different aggregations to the 'data' column: + +.. ipython:: python + + result = df.groupby('key')['data'].agg([('foo', 'mean')]) + +In this example, the output column 'foo' contains the mean value of the 'data' column for each group. + +To apply multiple aggregations to the same column, you can pass a list of tuples: + +.. ipython:: python + + result = df.groupby('key')['data'].agg([('col1', 'mean'), ('col2', 'std')]) + +In this case, the resulting DataFrame will have two columns: 'col1' containing the mean and 'col2' containing the standard deviation of the 'data' column for each group. 
+ +Similarly, you can extend this approach to include more aggregations: + +.. ipython:: python + + result = df.groupby('key')['data'].agg([('col1', 'mean'), ('col2', 'std'), ('col3', 'min')]) + +Here, the resulting DataFrame will have three columns: 'col1', 'col2', and 'col3', each containing the respective aggregation result for the 'data' column. + +In addition to the examples above, let's consider a scenario where we want to calculate both the mean and the median of the 'data' column for each group: + +.. ipython:: python + + result = df.groupby('key')['data'].agg([('mean_value', 'mean'), ('median_value', 'median')]) + +The resulting DataFrame will have two columns: 'mean_value' and 'median_value', each containing the corresponding aggregation results. + +Using a list of tuples provides a concise way to apply multiple aggregations to the same column while controlling the output column names. This approach is especially handy when you need to calculate various statistics on the same data within each group. + + Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 9d67daaf60820fffa561da62d44fb213dc38c933 Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Thu, 10 Aug 2023 21:27:20 -0700 Subject: [PATCH 3/7] Resolved conflicts in groupby.rst --- doc/source/user_guide/groupby.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index e7e14e3855873..32e43091f61c9 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -799,7 +799,6 @@ no column selection, so the values are just the functions. max_height="max", ) - Passing a List of Tuples ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -852,7 +851,6 @@ The resulting DataFrame will have two columns: 'mean_value' and 'median_value', Using a list of tuples provides a concise way to apply multiple aggregations to the same column while controlling the output column names. 
This approach is especially handy when you need to calculate various statistics on the same data within each group. - Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From aaab736520ff39ebdb8f5b3ff13f8ff2c9ee8b2a Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Thu, 10 Aug 2023 21:33:53 -0700 Subject: [PATCH 4/7] Added a missing documentation on an existing behavior of aggregation. --- doc/source/user_guide/groupby.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 32e43091f61c9..64331d84ce325 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -851,6 +851,24 @@ The resulting DataFrame will have two columns: 'mean_value' and 'median_value', Using a list of tuples provides a concise way to apply multiple aggregations to the same column while controlling the output column names. This approach is especially handy when you need to calculate various statistics on the same data within each group. +For a copy-pastable example, consider the following DataFrame `df`: + +.. ipython:: python + + import pandas as pd + import numpy as np + + df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'a'], + 'data': np.random.randn(5)}) + +You can then use the `agg` function with a list of tuples for aggregations: + +.. ipython:: python + + result = df.groupby('key')['data'].agg([('foo', 'mean')]) + +This will create a DataFrame with the mean values for each group under the 'foo' column. + Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 2dd7adb7437319254e4f6f983d41222dc4cd73c5 Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Fri, 11 Aug 2023 08:01:02 -0700 Subject: [PATCH 5/7] Added a missing documentation on an existing behavior of aggregation. 
--- doc/source/user_guide/groupby.rst | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst index 64331d84ce325..a19920b0943e2 100644 --- a/doc/source/user_guide/groupby.rst +++ b/doc/source/user_guide/groupby.rst @@ -810,48 +810,38 @@ Example: Consider the following DataFrame `df`: .. ipython:: python - import pandas as pd import numpy as np - df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'a'], 'data': np.random.randn(5)}) - Suppose we want to group the DataFrame by the 'key' column and apply different aggregations to the 'data' column: .. ipython:: python - result = df.groupby('key')['data'].agg([('foo', 'mean')]) - In this example, the output column 'foo' contains the mean value of the 'data' column for each group. To apply multiple aggregations to the same column, you can pass a list of tuples: .. ipython:: python - result = df.groupby('key')['data'].agg([('col1', 'mean'), ('col2', 'std')]) - In this case, the resulting DataFrame will have two columns: 'col1' containing the mean and 'col2' containing the standard deviation of the 'data' column for each group. Similarly, you can extend this approach to include more aggregations: .. ipython:: python - result = df.groupby('key')['data'].agg([('col1', 'mean'), ('col2', 'std'), ('col3', 'min')]) - Here, the resulting DataFrame will have three columns: 'col1', 'col2', and 'col3', each containing the respective aggregation result for the 'data' column. In addition to the examples above, let's consider a scenario where we want to calculate both the mean and the median of the 'data' column for each group: .. ipython:: python - result = df.groupby('key')['data'].agg([('mean_value', 'mean'), ('median_value', 'median')]) - The resulting DataFrame will have two columns: 'mean_value' and 'median_value', each containing the corresponding aggregation results. 
Using a list of tuples provides a concise way to apply multiple aggregations to the same column while controlling the output column names. This approach is especially handy when you need to calculate various statistics on the same data within each group. For a copy-pastable example, consider the following DataFrame `df`: +<<<<<<< HEAD .. ipython:: python @@ -869,6 +859,17 @@ You can then use the `agg` function with a list of tuples for aggregations: This will create a DataFrame with the mean values for each group under the 'foo' column. +.. ipython:: python + import pandas as pd + import numpy as np + df = pd.DataFrame({'key': ['a', 'a', 'b', 'b', 'a'], + 'data': np.random.randn(5)}) +You can then use the `agg` function with a list of tuples for aggregations: + +.. ipython:: python + result = df.groupby('key')['data'].agg([('foo', 'mean')]) +This will create a DataFrame with the mean values for each group under the 'foo' column. + Applying different functions to DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From 5a7aa35bb7c2591bac4900fab796e6ae20d5a6b5 Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Fri, 11 Aug 2023 10:18:50 -0500 Subject: [PATCH 6/7] Delete micromamba.sh --- pandas/etc/profile.d/micromamba.sh | 66 ------------------------------ 1 file changed, 66 deletions(-) delete mode 100644 pandas/etc/profile.d/micromamba.sh diff --git a/pandas/etc/profile.d/micromamba.sh b/pandas/etc/profile.d/micromamba.sh deleted file mode 100644 index c0b515215e338..0000000000000 --- a/pandas/etc/profile.d/micromamba.sh +++ /dev/null @@ -1,66 +0,0 @@ -# Copyright (C) 2012 Anaconda, Inc -# SPDX-License-Identifier: BSD-3-Clause - -__mamba_exe() ( - "$MAMBA_EXE" "${@}" -) - -__mamba_hashr() { - if [ -n "${ZSH_VERSION:+x}" ]; then - \rehash - elif [ -n "${POSH_VERSION:+x}" ]; then - : # pass - else - \hash -r - fi -} - -__mamba_xctivate() { - \local ask_conda - ask_conda="$(PS1="${PS1:-}" __mamba_exe shell "${@}" --shell bash)" || \return - \eval 
"${ask_conda}" - __mamba_hashr -} - -micromamba() { - \local cmd="${1-__missing__}" - case "${cmd}" in - activate|reactivate|deactivate) - __mamba_xctivate "${@}" - ;; - install|update|upgrade|remove|uninstall) - __mamba_exe "${@}" || \return - __mamba_xctivate reactivate - ;; - self-update) - __mamba_exe "${@}" || \return - - # remove leftover backup file on Windows - if [ -f "$MAMBA_EXE.bkup" ]; then - rm -f "$MAMBA_EXE.bkup" - fi - ;; - *) - __mamba_exe "${@}" - ;; - esac -} - -if [ -z "${CONDA_SHLVL+x}" ]; then - \export CONDA_SHLVL=0 - # In dev-mode MAMBA_EXE is python.exe and on Windows - # it is in a different relative location to condabin. - if [ -n "${_CE_CONDA+x}" ] && [ -n "${WINDIR+x}" ]; then - PATH="${MAMBA_ROOT_PREFIX}/condabin:${PATH}" - else - PATH="${MAMBA_ROOT_PREFIX}/condabin:${PATH}" - fi - \export PATH - - # We're not allowing PS1 to be unbound. It must at least be set. - # However, we're not exporting it, which can cause problems when starting a second shell - # via a first shell (i.e. starting zsh from bash). - if [ -z "${PS1+x}" ]; then - PS1= - fi -fi From 4890b72544b885cbb1e4f1d1b2080b9d291d8496 Mon Sep 17 00:00:00 2001 From: Immanuella Umoren Date: Fri, 11 Aug 2023 10:19:21 -0500 Subject: [PATCH 7/7] Delete groupby.html --- doc/source/user_guide/groupby.html | 2945 ---------------------------- 1 file changed, 2945 deletions(-) delete mode 100644 doc/source/user_guide/groupby.html diff --git a/doc/source/user_guide/groupby.html b/doc/source/user_guide/groupby.html deleted file mode 100644 index 4630138cad90d..0000000000000 --- a/doc/source/user_guide/groupby.html +++ /dev/null @@ -1,2945 +0,0 @@ - - - - - - -groupby.rst - - - -
- - -

{{ header }}


Group by: split-apply-combine

By "group by" we are referring to a process involving one or more of the following steps:

  • Splitting the data into groups based on some criteria.
  • Applying a function to each group independently.
  • Combining the results into a data structure.

Out of these, the split step is the most straightforward. In fact, in many situations we may wish to split the data set into groups and do something with those groups. In the apply step, we might wish to do one of the following:

  • Aggregation: compute a summary statistic (or statistics) for each group. Some examples:
    • Compute group sums or means.
    • Compute group sizes / counts.
  • Transformation: perform some group-specific computations and return a like-indexed object. Some examples:
    • Standardize data (z-score) within a group.
    • Fill NAs within groups with a value derived from each group.
  • Filtration: discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:
    • Discard data that belong to groups with only a few members.
    • Filter out data based on the group sum or mean.
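
The three kinds of apply step can be sketched on a tiny, made-up DataFrame. This is an illustrative sketch only, assuming a recent pandas release. The column names key and value are hypothetical. The transformation centers each value on its group mean rather than computing a full z-score, which keeps the arithmetic easy to follow.

```python
import pandas as pd

# Made-up data: one grouping key and one numeric column (names are hypothetical).
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b", "b", "c"], "value": [1.0, 3.0, 2.0, 4.0, 6.0, 10.0]}
)
grouped = df.groupby("key")["value"]

# Aggregation: one summary value per group.
sums = grouped.sum()

# Transformation: a like-indexed result (same index as df);
# here each value is centered on its own group's mean.
centered = grouped.transform(lambda s: s - s.mean())

# Filtration: keep only the rows of groups that have at least two members.
filtered = df.groupby("key").filter(lambda g: len(g) >= 2)

print(sums)
print(centered)
print(filtered)
```

Group "c" has a single row, so the filtration step drops it. The transformation keeps all six rows in their original positions.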

Many of these operations are defined on GroupBy objects. These operations are similar to those of the :ref:`aggregating API <basics.aggregate>`, :ref:`window API <window.overview>`, and :ref:`resample API <timeseries.aggregate>`.


It is possible that a given operation does not fall into one of these categories or is some combination of them. In such a case, it may be possible to compute the operation using GroupBy's apply method. This method will examine the results of the apply step and try to sensibly combine them into a single result if it doesn't fit into any of the three categories above.

Note

An operation that is split into multiple steps using built-in GroupBy operations will be more efficient than using the apply method with a user-defined Python function.

Since the set of object instance methods on pandas data structures is generally rich and expressive, we often simply want to invoke, say, a DataFrame function on each group. The name GroupBy should be quite familiar to those who have used a SQL-based tool (or itertools), in which you can write code like:

-
-SELECT Column1, Column2, mean(Column3), sum(Column4)
-FROM SomeTable
-GROUP BY Column1, Column2
-
-

We aim to make operations like this natural and easy to express using pandas. We'll address each area of GroupBy functionality, then provide some non-trivial examples and use cases.
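
As a rough pandas counterpart to the SQL query above, the sketch below uses named aggregation. The table contents and the output names mean_column3 and sum_column4 are made up for illustration. Passing as_index=False returns the group keys as ordinary columns, which loosely mirrors a SQL result set.

```python
import pandas as pd

# Hypothetical stand-in for SomeTable from the SQL snippet.
some_table = pd.DataFrame(
    {
        "Column1": ["x", "x", "y", "y"],
        "Column2": ["p", "p", "p", "q"],
        "Column3": [1.0, 3.0, 5.0, 7.0],
        "Column4": [10, 20, 30, 40],
    }
)

# GROUP BY Column1, Column2 with mean(Column3) and sum(Column4);
# each keyword maps an output name to (input column, aggregation).
result = some_table.groupby(["Column1", "Column2"], as_index=False).agg(
    mean_column3=("Column3", "mean"),
    sum_column4=("Column4", "sum"),
)
print(result)
```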

-

See the :ref:`cookbook<cookbook.grouping>` for some advanced strategies.


Splitting an object into groups

-

The abstract definition of grouping is to provide a mapping of labels to group names. To create a GroupBy object (more on what the GroupBy object is later), you may do the following:

-
-.. ipython:: python
-
-    speeds = pd.DataFrame(
-        [
-            ("bird", "Falconiformes", 389.0),
-            ("bird", "Psittaciformes", 24.0),
-            ("mammal", "Carnivora", 80.2),
-            ("mammal", "Primates", np.nan),
-            ("mammal", "Carnivora", 58),
-        ],
-        index=["falcon", "parrot", "lion", "monkey", "leopard"],
-        columns=("class", "order", "max_speed"),
-    )
-    speeds
-
-    grouped = speeds.groupby("class")
-    grouped = speeds.groupby(["class", "order"])
-
-
-
-

The mapping can be specified many different ways:

-
    -
  • A Python function, to be called on each of the index labels.
  • -
  • A list or NumPy array of the same length as the index.
  • -
  • A dict or Series, providing a label -> group name mapping.
  • -
  • For DataFrame objects, a string indicating either a column name or an index level name to be used to group.
  • -
  • A list of any of the above things.
  • -
-
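
Each mapping type listed above is shown in the compact sketch below. It uses a hypothetical fruit-name index, and the data and the kind column are invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=["apple", "avocado", "banana", "blueberry"],
)

# 1. A function, called on each index label (here: the label's first letter).
by_func = df.groupby(lambda label: label[0]).sum()

# 2. A list or NumPy array with the same length as the index.
by_array = df.groupby(np.array(["even", "odd", "even", "odd"])).sum()

# 3. A dict (or Series) mapping index labels to group names.
mapping = {"apple": "a_fruit", "avocado": "a_fruit", "banana": "b_fruit", "blueberry": "b_fruit"}
by_dict = df.groupby(mapping).sum()

# 4. A column name (a string may also name an index level).
df["kind"] = ["small", "large", "small", "small"]
by_column = df.groupby("kind")["value"].sum()

# 5. A list combining any of the above: here a function plus a column name.
by_mixed = df.groupby([lambda label: label[0], "kind"])["value"].sum()
```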

Collectively we refer to the grouping objects as the keys. For example, consider the following DataFrame:

Note

A string passed to groupby may refer to either a column or an index level. If a string matches both a column name and an index level name, a ValueError will be raised.

-
-.. ipython:: python
-
-   df = pd.DataFrame(
-       {
-           "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
-           "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
-           "C": np.random.randn(8),
-           "D": np.random.randn(8),
-       }
-   )
-   df
-
-
-
-

On a DataFrame, we obtain a GroupBy object by calling :meth:`~DataFrame.groupby`. This method returns a pandas.api.typing.DataFrameGroupBy instance. We could naturally group by either the A or B columns, or both:

-
-.. ipython:: python
-
-   grouped = df.groupby("A")
-   grouped = df.groupby(["A", "B"])
-

Note

df.groupby('A') is just syntactic sugar for df.groupby(df['A']).

If we also have a MultiIndex on columns A and B, we can group by all the columns except the one we specify:

-
-.. ipython:: python
-
-   df2 = df.set_index(["A", "B"])
-   grouped = df2.groupby(level=df2.index.names.difference(["B"]))
-   grouped.sum()
-
-
-
-

The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do a transpose:

-
-.. ipython::
-
-    In [4]: def get_letter_type(letter):
-       ...:     if letter.lower() in 'aeiou':
-       ...:         return 'vowel'
-       ...:     else:
-       ...:         return 'consonant'
-       ...:
-
-    In [5]: grouped = df.T.groupby(get_letter_type)
-
-
-
-

pandas :class:`~pandas.Index` objects support duplicate values. If a non-unique index is used as the group key in a groupby operation, all values for the same index value will be considered to be in one group, and thus the output of aggregation functions will only contain unique index values:

-
-.. ipython:: python
-
-   lst = [1, 2, 3, 1, 2, 3]
-
-   s = pd.Series([1, 2, 3, 10, 20, 30], lst)
-
-   grouped = s.groupby(level=0)
-
-   grouped.first()
-
-   grouped.last()
-
-   grouped.sum()
-
-
-
-

Note that no splitting occurs until it's needed. Creating the GroupBy object only verifies that you've passed a valid mapping.

Note

Many kinds of complicated data manipulations can be expressed in terms of GroupBy operations (though it can't be guaranteed to be the most efficient implementation). You can get quite creative with the label mapping functions.

GroupBy sorting

-

By default, the group keys are sorted during the groupby operation. You may, however, pass sort=False for potential speedups. With sort=False, the group keys appear in the order in which they first occur in the original DataFrame:

-
-.. ipython:: python
-
-   df2 = pd.DataFrame({"X": ["B", "B", "A", "A"], "Y": [1, 2, 3, 4]})
-   df2.groupby(["X"]).sum()
-   df2.groupby(["X"], sort=False).sum()
-
-
-
-
-

Note that groupby preserves the order of observations within each group. For example, the rows of each group created by groupby() below appear in the same order as in the original DataFrame:

-
-.. ipython:: python
-
-   df3 = pd.DataFrame({"X": ["A", "B", "A", "B"], "Y": [1, 4, 3, 2]})
-   df3.groupby(["X"]).get_group("A")
-
-   df3.groupby(["X"]).get_group("B")
-
-
-
-
-
-

GroupBy dropna

-

By default, NA values are excluded from group keys during the groupby operation. To include NA values in the group keys, pass dropna=False.

-
-.. ipython:: python
-
-    df_list = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
-    df_dropna = pd.DataFrame(df_list, columns=["a", "b", "c"])
-
-    df_dropna
-
-
-
-
-.. ipython:: python
-
-    # Default ``dropna`` is set to True, which will exclude NaNs in keys
-    df_dropna.groupby(by=["b"], dropna=True).sum()
-
-    # In order to allow NaN in keys, set ``dropna`` to False
-    df_dropna.groupby(by=["b"], dropna=False).sum()
-
-
-
-

The default setting of the dropna argument is True, which means NA values are not included in group keys.

-
-
-
-
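
The sketch below shows how dropna changes the number of groups. It uses made-up data in which two rows have a missing key. With dropna=False those rows form their own group, so no values are lost from the overall total.

```python
import numpy as np
import pandas as pd

# Made-up data: two rows have a missing (NaN) key.
df = pd.DataFrame({"key": ["x", np.nan, "x", np.nan, "y"], "value": [1, 2, 3, 4, 5]})

# Default dropna=True: rows with a NaN key are left out of every group.
default = df.groupby("key")["value"].sum()

# dropna=False: NaN becomes a group key of its own.
with_na = df.groupby("key", dropna=False)["value"].sum()

print(default)
print(with_na)
```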

GroupBy object attributes

-

The groups attribute is a dictionary whose keys are the computed unique groups and whose values are the axis labels belonging to each group. In the above example we have:

-
-.. ipython:: python
-
-   df.groupby("A").groups
-   df.T.groupby(get_letter_type).groups
-
-
-
-

Calling the standard Python len function on the GroupBy object just returns the length of the groups dict, so it is largely just a convenience:

-
-.. ipython:: python
-
-   grouped = df.groupby(["A", "B"])
-   grouped.groups
-   len(grouped)
-
-
-
-
-
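
The next sketch uses made-up data with string row labels. It contrasts groups, which maps each group to index labels, with indices, which maps each group to integer positions, and checks len against ngroups. It assumes a recent pandas release.

```python
import pandas as pd

# String row labels make the label/position difference visible.
df = pd.DataFrame(
    {"A": ["foo", "bar", "foo", "bar", "foo"], "C": [1, 2, 3, 4, 5]},
    index=["r0", "r1", "r2", "r3", "r4"],
)
grouped = df.groupby("A")

groups = grouped.groups    # group name -> index labels of its rows
indices = grouped.indices  # group name -> integer positions of its rows
n_groups = len(grouped)    # number of groups, the same as grouped.ngroups

print(groups)
print(indices)
```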

GroupBy will tab complete column names (and other attributes):

-
-.. ipython:: python
-
-   n = 10
-   weight = np.random.normal(166, 20, size=n)
-   height = np.random.normal(60, 10, size=n)
-   time = pd.date_range("1/1/2000", periods=n)
-   gender = np.random.choice(["male", "female"], size=n)
-   df = pd.DataFrame(
-       {"height": height, "weight": weight, "gender": gender}, index=time
-   )
-   df
-   gb = df.groupby("gender")
-
-
-
-
-
-
-.. ipython::
-
-   @verbatim
-   In [1]: gb.<TAB>  # noqa: E225, E999
-   gb.agg        gb.boxplot    gb.cummin     gb.describe   gb.filter     gb.get_group  gb.height     gb.last       gb.median     gb.ngroups    gb.plot       gb.rank       gb.std        gb.transform
-   gb.aggregate  gb.count      gb.cumprod    gb.dtype      gb.first      gb.groups     gb.hist       gb.max        gb.min        gb.nth        gb.prod       gb.resample   gb.sum        gb.var
-   gb.apply      gb.cummax     gb.cumsum     gb.fillna     gb.gender     gb.head       gb.indices    gb.mean       gb.name       gb.ohlc       gb.quantile   gb.size       gb.tail       gb.weight
-
-
-
-
-
-

GroupBy with MultiIndex

-

With :ref:`hierarchically-indexed data <advanced.hierarchical>`, it's quite natural to group by one of the levels of the hierarchy.


Let's create a Series with a two-level MultiIndex.

-
-.. ipython:: python
-
-
-   arrays = [
-       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
-       ["one", "two", "one", "two", "one", "two", "one", "two"],
-   ]
-   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
-   s = pd.Series(np.random.randn(8), index=index)
-   s
-
-
-
-

We can then group by one of the levels in s.

-
-.. ipython:: python
-
-   grouped = s.groupby(level=0)
-   grouped.sum()
-
-
-
-

If the MultiIndex has names specified, these can be passed instead of the level number:

-
-.. ipython:: python
-
-   s.groupby(level="second").sum()
-
-
-
-

Grouping with multiple levels is supported.

-
-.. ipython:: python
-
-   arrays = [
-       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
-       ["doo", "doo", "bee", "bee", "bop", "bop", "bop", "bop"],
-       ["one", "two", "one", "two", "one", "two", "one", "two"],
-   ]
-   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second", "third"])
-   s = pd.Series(np.random.randn(8), index=index)
-   s
-   s.groupby(level=["first", "second"]).sum()
-
-
-
-

Index level names may be supplied as keys.

-
-.. ipython:: python
-
-   s.groupby(["first", "second"]).sum()
-
-
-
-

More on the sum function and aggregation later.

-
-
-

Grouping DataFrame with Index levels and columns

-

A DataFrame may be grouped by a combination of columns and index levels. You can specify both column and index names, or use a :class:`Grouper`.


Let's first create a DataFrame with a MultiIndex:

-
-.. ipython:: python
-
-   arrays = [
-       ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
-       ["one", "two", "one", "two", "one", "two", "one", "two"],
-   ]
-
-   index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
-
-   df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)
-
-   df
-
-
-
-

Then we group df by the second index level and the A column.

-
-.. ipython:: python
-
-   df.groupby([pd.Grouper(level=1), "A"]).sum()
-
-
-
-

Index levels may also be specified by name.

-
-.. ipython:: python
-
-   df.groupby([pd.Grouper(level="second"), "A"]).sum()
-
-
-
-

Index level names may be specified as keys directly to groupby.

-
-.. ipython:: python
-
-   df.groupby(["second", "A"]).sum()
-
-
-
-
-
-
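
The three spellings for grouping by the second index level and the A column are meant to be interchangeable. A quick way to check this is to compare their results directly. The sketch below rebuilds the same DataFrame as above and uses pandas.testing.assert_frame_equal, which raises an AssertionError if the frames differ.

```python
import numpy as np
import pandas as pd

arrays = [
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
]
index = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 3, 3], "B": np.arange(8)}, index=index)

# Index level by position, by name inside a Grouper, and by bare name.
by_position = df.groupby([pd.Grouper(level=1), "A"]).sum()
by_level_name = df.groupby([pd.Grouper(level="second"), "A"]).sum()
by_plain_name = df.groupby(["second", "A"]).sum()

# All three spellings should produce identical frames.
pd.testing.assert_frame_equal(by_position, by_level_name)
pd.testing.assert_frame_equal(by_position, by_plain_name)
```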

DataFrame column selection in GroupBy

-

Once you have created the GroupBy object from a DataFrame, you might want to do something different for each of the columns. To do so, use [] on the GroupBy object, just as you would to get a column from a DataFrame:

-
-.. ipython:: python
-
-   df = pd.DataFrame(
-       {
-           "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
-           "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
-           "C": np.random.randn(8),
-           "D": np.random.randn(8),
-       }
-   )
-
-   df
-
-   grouped = df.groupby(["A"])
-   grouped_C = grouped["C"]
-   grouped_D = grouped["D"]
-
-
-
-

This is mainly syntactic sugar for the alternative, which is much more verbose:

-
-.. ipython:: python
-
-   df["C"].groupby(df["A"])
-
-
-
-

Additionally, this method avoids recomputing the internal grouping information derived from the passed key.

-
-
-
-
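
Selecting a column on the GroupBy object should give the same result as grouping the column Series by the key Series. The sketch below checks this on invented data. It also shows that selecting a list of columns keeps a DataFrame-shaped result.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo"],
        "C": [1.0, 2.0, 3.0, 4.0, 8.0],
        "D": [10.0, 20.0, 30.0, 40.0, 80.0],
    }
)

grouped = df.groupby("A")
via_selection = grouped["C"].mean()           # column selection on the GroupBy
via_series = df["C"].groupby(df["A"]).mean()  # the more verbose spelling

# Same values, index, and name.
pd.testing.assert_series_equal(via_selection, via_series)

# Selecting a list of columns gives a DataFrameGroupBy, so the result is a DataFrame.
both = grouped[["C", "D"]].mean()
print(both)
```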

Iterating through groups

-

With the GroupBy object in hand, iterating through the grouped data is very natural and functions similarly to :py:func:`itertools.groupby`:

-
-.. ipython::
-
-   In [4]: grouped = df.groupby('A')
-
-   In [5]: for name, group in grouped:
-      ...:     print(name)
-      ...:     print(group)
-      ...:
-
-
-
-

In the case of grouping by multiple keys, the group name will be a tuple:

-
-.. ipython::
-
-   In [5]: for name, group in df.groupby(['A', 'B']):
-      ...:     print(name)
-      ...:     print(group)
-      ...:
-
-
-
-

See :ref:`timeseries.iterating-label`.

-
-
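
Iteration is handy for building ordinary Python containers from groups. The sketch below uses made-up data, and the dictionary names totals and sizes are purely illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "two"],
        "C": [1, 2, 3, 4],
    }
)

# Single key: each name is the scalar group value.
totals = {name: group["C"].sum() for name, group in df.groupby("A")}

# Multiple keys: each name is a tuple of the key values.
sizes = {name: len(group) for name, group in df.groupby(["A", "B"])}

print(totals)
print(sizes)
```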

Selecting a group

-

A single group can be selected using :meth:`~pandas.core.groupby.DataFrameGroupBy.get_group`:

-
-.. ipython:: python
-
-   grouped.get_group("bar")
-
-
-
-

Or for an object grouped on multiple columns:

-
-.. ipython:: python
-
-   df.groupby(["A", "B"]).get_group(("bar", "one"))
-
-
-
-
-
-
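
Conceptually, get_group returns the same rows as a boolean mask on the key columns, and it keeps the original index labels. The sketch below checks this on a small invented DataFrame with pandas.testing.assert_frame_equal.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "B": ["one", "one", "two", "one"],
        "C": [1, 2, 3, 4],
    }
)

single = df.groupby("A").get_group("bar")
multi = df.groupby(["A", "B"]).get_group(("bar", "one"))

# Same rows, same columns, same original index labels as an explicit mask.
pd.testing.assert_frame_equal(single, df[df["A"] == "bar"])
pd.testing.assert_frame_equal(multi, df[(df["A"] == "bar") & (df["B"] == "one")])
```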

Aggregation

-

An aggregation is a GroupBy operation that reduces the dimension of the grouping object. The result of an aggregation is, or at least is treated as, a scalar value for each column in a group. An example is computing the sum of each column within each group.

-
-.. ipython:: python
-
-   animals = pd.DataFrame(
-       {
-           "kind": ["cat", "dog", "cat", "dog"],
-           "height": [9.1, 6.0, 9.5, 34.0],
-           "weight": [7.9, 7.5, 9.9, 198.0],
-       }
-   )
-   animals
-   animals.groupby("kind").sum()
-
-
-
-

In the result, the keys of the groups appear in the index by default. They can instead be included in the columns by passing as_index=False.

-
-.. ipython:: python
-
-   animals.groupby("kind", as_index=False).sum()
-
-
-
-
-
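
For a simple reduction like sum, passing as_index=False should give the same frame as aggregating with the default settings and then calling reset_index. The minimal sketch below reuses the animals data from above.

```python
import pandas as pd

animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

keys_in_index = animals.groupby("kind").sum()
keys_in_columns = animals.groupby("kind", as_index=False).sum()

# Moving the keys out of the index afterwards yields the same frame.
pd.testing.assert_frame_equal(keys_in_columns, keys_in_index.reset_index())
print(keys_in_columns)
```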

Built-in aggregation methods

-

Many common aggregations are built into GroupBy objects as methods. Of the methods listed below, those marked with * do not have a Cython-optimized implementation.

=====================================  ====================================================================
Method                                 Description
=====================================  ====================================================================
:meth:`~.DataFrameGroupBy.any`         Compute whether any of the values in the groups are truthy
:meth:`~.DataFrameGroupBy.all`         Compute whether all of the values in the groups are truthy
:meth:`~.DataFrameGroupBy.count`       Compute the number of non-NA values in the groups
:meth:`~.DataFrameGroupBy.cov` *       Compute the covariance of the groups
:meth:`~.DataFrameGroupBy.first`       Compute the first occurring value in each group
:meth:`~.DataFrameGroupBy.idxmax` *    Compute the index of the maximum value in each group
:meth:`~.DataFrameGroupBy.idxmin` *    Compute the index of the minimum value in each group
:meth:`~.DataFrameGroupBy.last`        Compute the last occurring value in each group
:meth:`~.DataFrameGroupBy.max`         Compute the maximum value in each group
:meth:`~.DataFrameGroupBy.mean`        Compute the mean of each group
:meth:`~.DataFrameGroupBy.median`      Compute the median of each group
:meth:`~.DataFrameGroupBy.min`         Compute the minimum value in each group
:meth:`~.DataFrameGroupBy.nunique`     Compute the number of unique values in each group
:meth:`~.DataFrameGroupBy.prod`        Compute the product of the values in each group
:meth:`~.DataFrameGroupBy.quantile`    Compute a given quantile of the values in each group
:meth:`~.DataFrameGroupBy.sem`         Compute the standard error of the mean of the values in each group
:meth:`~.DataFrameGroupBy.size`        Compute the number of values in each group
:meth:`~.DataFrameGroupBy.skew` *      Compute the skew of the values in each group
:meth:`~.DataFrameGroupBy.std`         Compute the standard deviation of the values in each group
:meth:`~.DataFrameGroupBy.sum`         Compute the sum of the values in each group
:meth:`~.DataFrameGroupBy.var`         Compute the variance of the values in each group
=====================================  ====================================================================

Some examples:

.. ipython:: python

   df.groupby("A")[["C", "D"]].max()
   df.groupby(["A", "B"]).mean()

Another simple aggregation example is to compute the size of each group. This is included in GroupBy as the size method. It returns a Series whose index consists of the group names and whose values are the sizes of each group.

.. ipython:: python

   grouped = df.groupby(["A", "B"])
   grouped.size()
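
As a minimal sketch of how size differs from count, the example below uses a small, hypothetical DataFrame (df_sizes and its values are illustrative, not the df used elsewhere in this guide). size counts every row in a group, including rows with missing values, while count only counts non-NA values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "foo" has three rows, one of which is missing a value in "B".
df_sizes = pd.DataFrame(
    {"A": ["foo", "bar", "foo", "foo"], "B": [1.0, 2.0, np.nan, 4.0]}
)
grouped_sizes = df_sizes.groupby("A")

# size(): number of rows per group (NaN rows included), indexed by group name.
sizes = grouped_sizes.size()

# count(): number of non-NA values per group in the selected column.
non_na = grouped_sizes["B"].count()

print(sizes)
print(non_na)
```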

While the :meth:`~.DataFrameGroupBy.describe` method is not itself a reducer, it can be used to conveniently produce a collection of summary statistics about each of the groups.

.. ipython:: python

   grouped.describe()

Another aggregation example is to compute the number of unique values of each group. This is similar to the value_counts function, except that it returns only the number of distinct values per group rather than the frequency of each value.

.. ipython:: python

   ll = [['foo', 1], ['foo', 2], ['foo', 2], ['bar', 1], ['bar', 1]]
   df4 = pd.DataFrame(ll, columns=["A", "B"])
   df4
   df4.groupby("A")["B"].nunique()
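
To contrast nunique with value_counts on the df4 data above, the sketch below rebuilds that frame so it runs on its own: nunique gives one number per group, while value_counts gives the frequency of each distinct value within each group.

```python
import pandas as pd

# Same rows as the df4 example above.
ll = [["foo", 1], ["foo", 2], ["foo", 2], ["bar", 1], ["bar", 1]]
df4 = pd.DataFrame(ll, columns=["A", "B"])
grouped_b = df4.groupby("A")["B"]

# One value per group: how many distinct values of "B" it contains.
unique_counts = grouped_b.nunique()

# One row per (group, value) pair: how often that value occurs in the group.
frequencies = grouped_b.value_counts()

print(unique_counts)
print(frequencies)
```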

Note

Aggregation functions will not return the groups that you are aggregating over as named columns when as_index=True, the default. Instead, the grouped columns become the index of the returned object.

Passing as_index=False will return the groups that you are aggregating over as columns, if they are named indices or columns.

The :meth:`~.DataFrameGroupBy.aggregate` method


Note

The :meth:`~.DataFrameGroupBy.aggregate` method can accept many different types of inputs. This section details using string aliases for various GroupBy methods; other inputs are detailed in the sections below.


Any reduction method that pandas implements can be passed as a string to :meth:`~.DataFrameGroupBy.aggregate`. Users are encouraged to use the shorthand, agg. It will operate as if the corresponding method had been called.

.. ipython:: python

   grouped = df.groupby("A")
   grouped[["C", "D"]].aggregate("sum")

   grouped = df.groupby(["A", "B"])
   grouped.agg("sum")
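
As a quick check that a string alias behaves like the corresponding method, the sketch below compares agg("sum") with .sum() on a hypothetical df_alias frame (its columns mirror the A/C/D naming used in this guide, but the values are illustrative).

```python
import pandas as pd

# Hypothetical data: grouping column "A" and two numeric columns.
df_alias = pd.DataFrame(
    {"A": ["x", "y", "x", "y"], "C": [1, 2, 3, 4], "D": [0.5, 1.5, 2.5, 3.5]}
)
grouped_alias = df_alias.groupby("A")[["C", "D"]]

via_alias = grouped_alias.agg("sum")  # string alias passed to agg
via_method = grouped_alias.sum()      # the corresponding GroupBy method

# Raises AssertionError if the two results differ in values, dtypes or labels.
pd.testing.assert_frame_equal(via_alias, via_method)
print(via_alias)
```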

The result of the aggregation will have the group names as the new index along the grouped axis. In the case of multiple keys, the result is a :ref:`MultiIndex <advanced.hierarchical>` by default. As mentioned above, this can be changed by using the as_index option:

.. ipython:: python

   grouped = df.groupby(["A", "B"], as_index=False)
   grouped.agg("sum")

   df.groupby("A", as_index=False)[["C", "D"]].agg("sum")

You can also use :meth:`DataFrame.reset_index` to achieve the same result, because the grouping keys are stored in the resulting MultiIndex, although this makes an extra copy.

.. ipython:: python

   df.groupby(["A", "B"]).agg("sum").reset_index()
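
The two routes can be compared directly. In the sketch below, the small df_keys frame is hypothetical; grouping with as_index=False and calling reset_index on the default result produce the same DataFrame, with the keys as ordinary columns and a fresh integer index.

```python
import pandas as pd

# Hypothetical data with two key columns and one value column.
df_keys = pd.DataFrame(
    {"A": ["x", "x", "y"], "B": ["p", "q", "p"], "C": [1, 2, 3]}
)

# Keys kept as columns directly.
flat = df_keys.groupby(["A", "B"], as_index=False).agg("sum")

# Keys placed in a MultiIndex first, then moved back into columns.
via_reset = df_keys.groupby(["A", "B"]).agg("sum").reset_index()

pd.testing.assert_frame_equal(flat, via_reset)
print(flat)
```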

Aggregation with User-Defined Functions


Users can also provide their own User-Defined Functions (UDFs) for custom aggregations.


Warning

When aggregating with a UDF, the UDF should not mutate the provided Series. See :ref:`gotchas.udf-mutation` for more information.


Note

Aggregating with a UDF is often less performant than using the pandas built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.

.. ipython:: python

   animals
   animals.groupby("kind")[["height"]].agg(lambda x: set(x))

The resulting dtype will reflect that of the aggregating function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.

.. ipython:: python

   animals.groupby("kind")[["height"]].agg(lambda x: x.astype(int).sum())
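
To illustrate the performance note above, the sketch below computes a per-group range both with a UDF and with a chain of built-in reductions. It assumes the same four-row animals data that is defined in the named aggregation example; both routes give the same values, but the built-in chain uses the optimized max and min methods instead of calling a Python function once per group.

```python
import pandas as pd

# Same values as the animals frame used in this guide.
animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)
grouped_height = animals.groupby("kind")["height"]

# UDF route: the lambda is called once per group.
udf_range = grouped_height.agg(lambda x: x.max() - x.min())

# Built-in route: two reductions, then subtraction aligned on the group index.
builtin_range = grouped_height.max() - grouped_height.min()

pd.testing.assert_series_equal(udf_range, builtin_range)
print(builtin_range)
```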

Applying multiple functions at once

With a grouped Series, you can also pass a list of functions to aggregate with, which outputs a DataFrame:

.. ipython:: python

   grouped = df.groupby("A")
   grouped["C"].agg(["sum", "mean", "std"])

On a grouped DataFrame, you can pass a list of functions to apply to each column, which produces an aggregated result with a hierarchical index:

.. ipython:: python

   grouped[["C", "D"]].agg(["sum", "mean", "std"])
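
One way to see the hierarchical column index is to inspect the columns directly. The sketch below uses a hypothetical df_multi frame; each output column is labelled by an (input column, function name) pair, with the functions repeated in order for each input column.

```python
import pandas as pd

# Hypothetical data: two groups ("x", "y") and two value columns.
df_multi = pd.DataFrame(
    {"A": ["x", "y", "x"], "C": [1, 2, 3], "D": [10.0, 20.0, 40.0]}
)
result = df_multi.groupby("A")[["C", "D"]].agg(["sum", "mean"])

# Two-level column index: (input column, function name).
print(result.columns.tolist())
print(result)
```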

The resulting aggregations are named after the functions themselves. If you need to rename them, you can chain a rename operation for a Series like this:

.. ipython:: python

   (
       grouped["C"]
       .agg(["sum", "mean", "std"])
       .rename(columns={"sum": "foo", "mean": "bar", "std": "baz"})
   )

For a grouped DataFrame, you can rename in a similar manner:

.. ipython:: python

   (
       grouped[["C", "D"]].agg(["sum", "mean", "std"]).rename(
           columns={"sum": "foo", "mean": "bar", "std": "baz"}
       )
   )

Note

In general, the output column names should be unique, but pandas will allow you to apply the same function (or two functions with the same name) to the same column.

.. ipython:: python

   grouped["C"].agg(["sum", "sum"])

pandas also allows you to provide multiple lambdas. In this case, pandas will mangle the names of the (nameless) lambda functions by appending _<i> to each one, so the resulting columns are named <lambda_0>, <lambda_1>, and so on.

.. ipython:: python

   grouped["C"].agg([lambda x: x.max() - x.min(), lambda x: x.median() - x.mean()])
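
The sketch below, on a hypothetical two-group frame (df_lam and its values are illustrative), checks the resulting column names: duplicated function names are kept as they are, and the two lambdas are labelled <lambda_0> and <lambda_1>.

```python
import pandas as pd

# Hypothetical data: group "x" has values 1 and 4, group "y" has a single value.
df_lam = pd.DataFrame({"A": ["x", "x", "y"], "C": [1.0, 4.0, 2.0]})
grouped_c = df_lam.groupby("A")["C"]

# The same function twice: both output columns are named "sum".
dupes = grouped_c.agg(["sum", "sum"])

# Two nameless lambdas: pandas mangles their names to keep them distinct.
lambdas = grouped_c.agg(
    [lambda x: x.max() - x.min(), lambda x: x.median() - x.mean()]
)

print(dupes.columns.tolist())
print(lambdas.columns.tolist())
```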

Named aggregation

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg`, known as "named aggregation", where

  • The keywords are the output column names.
  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the :class:`NamedAgg` namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

Example:


Consider the following DataFrame animals:

.. ipython:: python

   import pandas as pd

   animals = pd.DataFrame(
       {
           "kind": ["cat", "dog", "cat", "dog"],
           "height": [9.1, 6.0, 9.5, 34.0],
           "weight": [7.9, 7.5, 9.9, 198.0],
       }
   )

To demonstrate "named aggregation," let's group the DataFrame by the 'kind' column and apply different aggregations to the 'height' and 'weight' columns:

.. ipython:: python

   result = animals.groupby("kind").agg(
       min_height=pd.NamedAgg(column="height", aggfunc="min"),
       max_height=pd.NamedAgg(column="height", aggfunc="max"),
       average_weight=pd.NamedAgg(column="weight", aggfunc="mean"),
   )

In the above example, we used "named aggregation" to specify custom output column names (min_height, max_height, and average_weight) for each aggregation. The result will be a new DataFrame with the aggregated values, and the output column names will be as specified.

The resulting DataFrame will look like this::

          min_height  max_height  average_weight
    kind
    cat          9.1         9.5            8.90
    dog          6.0        34.0          102.75

In this example, the 'min_height' column contains the minimum height for each group, the 'max_height' column contains the maximum height, and the 'average_weight' column contains the average weight for each group.


By using "named aggregation," you can easily control the output column names and have more descriptive results when performing aggregations with groupby.agg.


:class:`NamedAgg` is just a namedtuple. Plain tuples are allowed as well.

.. ipython:: python

   animals.groupby("kind").agg(
       min_height=("height", "min"),
       max_height=("height", "max"),
       average_weight=("weight", "mean"),
   )

If the column names you want are not valid Python identifiers, construct a dictionary and unpack it as keyword arguments:

.. ipython:: python

   animals.groupby("kind").agg(
       **{
           "total weight": pd.NamedAgg(column="weight", aggfunc="sum")
       }
   )

When using named aggregation, additional keyword arguments are not passed through to the aggregation functions; only pairs of (column, aggfunc) should be passed as **kwargs. If your aggregation functions require additional arguments, apply them partially with :meth:`functools.partial`.
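
As a sketch of the functools.partial approach, suppose an aggregation needs an extra argument. The helper scaled_sum and its factor parameter below are hypothetical, chosen only to show how the extra argument is bound before the function is handed to named aggregation.

```python
from functools import partial

import pandas as pd

animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)


def scaled_sum(values, factor):
    """Hypothetical helper that needs an argument besides the group's values."""
    return values.sum() * factor


# Bind factor up front; agg then calls the partial with only the group's values.
result = animals.groupby("kind").agg(
    double_weight=("weight", partial(scaled_sum, factor=2)),
    min_height=("height", "min"),
)
print(result)
```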


Named aggregation is also valid for Series groupby aggregations. In this case there's no column selection, so the values are just the functions.

.. ipython:: python

   animals.groupby("kind").height.agg(
       min_height="min",
       max_height="max",
   )

Applying different functions to DataFrame columns

By passing a dict to aggregate you can apply a different aggregation to each column of a DataFrame:

.. ipython:: python

   grouped.agg({"C": "sum", "D": lambda x: np.std(x, ddof=1)})

The function names can also be strings. For a string to be valid, it must name a method implemented on GroupBy:

.. ipython:: python

   grouped.agg({"C": "sum", "D": "std"})
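
To relate this dict form to named aggregation, the sketch below (on a hypothetical df_dict frame) computes the same result both ways. With a dict, the output columns keep the input column names; with named aggregation, the keywords choose them, so reusing the same names should give an identical DataFrame.

```python
import pandas as pd

# Hypothetical data following the A/C/D naming used in this guide.
df_dict = pd.DataFrame({"A": ["x", "y", "x"], "C": [1, 2, 3], "D": [1.0, 2.0, 4.0]})
grouped_dict = df_dict.groupby("A")

# Dict form: keys are input columns; output columns keep those names.
by_dict = grouped_dict.agg({"C": "sum", "D": "max"})

# Named aggregation: keywords set the output names explicitly.
by_name = grouped_dict.agg(C=("C", "sum"), D=("D", "max"))

pd.testing.assert_frame_equal(by_dict, by_name)
print(by_dict)
```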

Transformation

A transformation is a GroupBy operation whose result is indexed the same as the one being grouped. Common examples include :meth:`~.DataFrameGroupBy.cumsum` and :meth:`~.DataFrameGroupBy.diff`.

.. ipython:: python

    speeds
    grouped = speeds.groupby("class")["max_speed"]
    grouped.cumsum()
    grouped.diff()

Unlike aggregations, the groupings that are used to split the original object are not included in the result.
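
Both properties can be checked on a small, hypothetical stand-in for the speeds DataFrame (speeds_demo and its values are illustrative; the guide's own speeds frame is defined earlier). The transformed results share the input's index, and transforming through the DataFrame drops the class column from the output.

```python
import pandas as pd

# Hypothetical stand-in for the guide's speeds DataFrame.
speeds_demo = pd.DataFrame(
    {
        "class": ["bird", "bird", "mammal", "mammal"],
        "max_speed": [389.0, 24.0, 80.0, 58.0],
    }
)

grouped_speed = speeds_demo.groupby("class")["max_speed"]
running_total = grouped_speed.cumsum()  # restarts at each new group
step = grouped_speed.diff()             # first row of each group has no predecessor, so NaN

# Transforming the whole grouped DataFrame: the grouping column is not in the output.
frame_cumsum = speeds_demo.groupby("class").cumsum()

print(running_total)
print(frame_cumsum.columns.tolist())
```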


Note

Since transformations do not include the groupings that are used to split the original object, the arguments as_index and sort in :meth:`DataFrame.groupby` and :meth:`Series.groupby` have no effect.


A common use of a transformation is to add the result back into the original DataFrame.

.. ipython:: python

    result = speeds.copy()
    result["cumsum"] = grouped.cumsum()
    result["diff"] = grouped.diff()
    result

Built-in transformation methods

The following methods on GroupBy act as transformations. Of these methods, only fillna does not have a Cython-optimized implementation.

Method                                   Description
:meth:`~.DataFrameGroupBy.bfill`         Back fill NA values within each group
:meth:`~.DataFrameGroupBy.cumcount`      Compute the cumulative count within each group
:meth:`~.DataFrameGroupBy.cummax`        Compute the cumulative max within each group
:meth:`~.DataFrameGroupBy.cummin`        Compute the cumulative min within each group
:meth:`~.DataFrameGroupBy.cumprod`       Compute the cumulative product within each group
:meth:`~.DataFrameGroupBy.cumsum`        Compute the cumulative sum within each group
:meth:`~.DataFrameGroupBy.diff`          Compute the difference between adjacent values within each group
:meth:`~.DataFrameGroupBy.ffill`         Forward fill NA values within each group
:meth:`~.DataFrameGroupBy.fillna`        Fill NA values within each group
:meth:`~.DataFrameGroupBy.pct_change`    Compute the percent change between adjacent values within each group
:meth:`~.DataFrameGroupBy.rank`          Compute the rank of each value within each group
:meth:`~.DataFrameGroupBy.shift`         Shift values up or down within each group

In addition, passing any built-in aggregation method as a string to :meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result across the group, producing a transformed result. If the aggregation method is Cython-optimized, this will be performant as well.


The :meth:`~.DataFrameGroupBy.transform` method


Similar to the :ref:`aggregation method <groupby.aggregate.agg>`, the :meth:`~.DataFrameGroupBy.transform` method can accept string aliases to the built-in transformation methods in the previous section. It can also accept string aliases to the built-in aggregation methods. When an aggregation method is provided, the result will be broadcast across the group.

.. ipython:: python

    speeds
    grouped = speeds.groupby("class")[["max_speed"]]
    grouped.transform("cumsum")
    grouped.transform("sum")
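
A minimal sketch of the broadcasting behaviour, on a hypothetical df_t frame: passing the aggregation alias "sum" repeats each group's total on every row of that group, whereas the transformation alias "cumsum" produces one value per row. Because both outputs keep the original index, they can be combined directly with the input, for example to compute each row's share of its group total.

```python
import pandas as pd

# Hypothetical data: group "a" has two rows, group "b" has one.
df_t = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 4]})
grouped_v = df_t.groupby("g")["v"]

totals = grouped_v.transform("sum")      # aggregation, broadcast back to every row
running = grouped_v.transform("cumsum")  # transformation, one value per row

# Aligned on the original index, so plain arithmetic works row by row.
share = df_t["v"] / totals
print(pd.DataFrame({"v": df_t["v"], "total": totals, "running": running, "share": share}))
```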

In addition to string aliases, the :meth:`~.DataFrameGroupBy.transform` method can also accept User-Defined Functions (UDFs). The UDF must:

  • Return a result that is either the same size as the group chunk or broadcastable to the size of the group chunk (e.g., a scalar, grouped.transform(lambda x: x.iloc[-1])).
  • Operate column-by-column on the group chunk. The transform is applied to the first group chunk using chunk.apply.
  • Not perform in-place operations on the group chunk. Group chunks should be treated as immutable, and changes to a group chunk may produce unexpected results. See :ref:`gotchas.udf-mutation` for more information.
  • (Optionally) operate on all columns of the entire group chunk at once. If this is supported, a fast path is used starting from the second chunk.
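
The first and third requirements can be sketched on a hypothetical df_u frame: a UDF that returns a scalar is broadcast across its group, and a UDF that returns a same-sized result builds a new Series instead of modifying x in place.

```python
import pandas as pd

# Hypothetical data: two groups of two rows each.
df_u = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [1.0, 2.0, 5.0, 7.0]})
grouped_u = df_u.groupby("g")["v"]

# Scalar per group (the last value), broadcast to every row of that group.
last_value = grouped_u.transform(lambda x: x.iloc[-1])

# Same-sized result; x - x.mean() creates a new Series, so x itself is untouched.
centered = grouped_u.transform(lambda x: x - x.mean())

print(last_value.tolist())
print(centered.tolist())
```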

Note

Transforming by supplying transform with a UDF is often less performant than using the built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.

All of the examples in this section can be made more performant by calling built-in methods instead of using transform. See :ref:`below for examples <groupby_efficient_transforms>`.


.. versionchanged:: 2.0.0

    When using ``.transform`` on a grouped DataFrame and the transformation function
    returns a DataFrame, pandas now aligns the result's index
    with the input's index. You can call ``.to_numpy()`` within the transformation
    function to avoid alignment.

Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the transformation function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.


Suppose we wish to standardize the data within each group:

.. ipython:: python

   index = pd.date_range("10/1/1999", periods=1100)
   ts = pd.Series(np.random.normal(0.5, 2, 1100), index)
   ts = ts.rolling(window=100, min_periods=100).mean().dropna()

   ts.head()
   ts.tail()

   transformed = ts.groupby(lambda x: x.year).transform(
       lambda x: (x - x.mean()) / x.std()
   )

We would expect the result to now have mean 0 and standard deviation 1 within each group, which we can easily check:

.. ipython:: python

   # Original Data
   grouped = ts.groupby(lambda x: x.year)
   grouped.mean()
   grouped.std()

   # Transformed Data
   grouped_trans = transformed.groupby(lambda x: x.year)
   grouped_trans.mean()
   grouped_trans.std()

We can also visually compare the original and transformed data sets.

.. ipython:: python

   compare = pd.DataFrame({"Original": ts, "Transformed": transformed})

   @savefig groupby_transform_plot.png
   compare.plot()

Transformation functions that have lower-dimensional outputs are broadcast to match the shape of the input array.

.. ipython:: python

   ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())

Another common data transform is to replace missing data with the group mean.

.. ipython:: python

   cols = ["A", "B", "C"]
   values = np.random.randn(1000, 3)
   values[np.random.randint(0, 1000, 100), 0] = np.nan
   values[np.random.randint(0, 1000, 50), 1] = np.nan
   values[np.random.randint(0, 1000, 200), 2] = np.nan
   data_df = pd.DataFrame(values, columns=cols)
   data_df

   countries = np.array(["US", "UK", "GR", "JP"])
   key = countries[np.random.randint(0, 4, 1000)]

   grouped = data_df.groupby(key)

   # Non-NA count in each group
   grouped.count()

   transformed = grouped.transform(lambda x: x.fillna(x.mean()))

We can verify that the group means have not changed in the transformed data, and that the transformed data contains no NAs.

.. ipython:: python

   grouped_trans = transformed.groupby(key)

   grouped.mean()  # original group means
   grouped_trans.mean()  # transformation did not change group means

   grouped.count()  # original has some missing data points
   grouped_trans.count()  # counts after transformation
   grouped_trans.size()  # Verify non-NA count equals group size

As mentioned in the note above, each of the examples in this section can be computed more efficiently using built-in methods. In the code below, each inefficient UDF version is commented out and the faster alternative follows it.

.. ipython:: python

    # ts.groupby(lambda x: x.year).transform(
    #     lambda x: (x - x.mean()) / x.std()
    # )
    grouped = ts.groupby(lambda x: x.year)
    result = (ts - grouped.transform("mean")) / grouped.transform("std")

    # ts.groupby(lambda x: x.year).transform(lambda x: x.max() - x.min())
    grouped = ts.groupby(lambda x: x.year)
    result = grouped.transform("max") - grouped.transform("min")

    # grouped = data_df.groupby(key)
    # grouped.transform(lambda x: x.fillna(x.mean()))
    grouped = data_df.groupby(key)
    result = data_df.fillna(grouped.transform("mean"))

Window and resample operations

It is possible to use resample(), expanding() and rolling() as methods on groupbys.

The example below will apply the rolling() method on the samples of the column B, based on the groups of column A.

.. ipython:: python

   df_re = pd.DataFrame({"A": [1] * 10 + [5] * 10, "B": np.arange(20)})
   df_re

   df_re.groupby("A").rolling(4).B.mean()
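
A variation on the example above, selecting B before calling rolling (a choice made here for readability, not the form used in the guide), makes the grouped index easy to inspect. The window never crosses a group boundary, so the first three rows of the second group are NaN even though earlier rows from the first group exist.

```python
import numpy as np
import pandas as pd

df_re = pd.DataFrame({"A": [1] * 10 + [5] * 10, "B": np.arange(20)})

# Result is indexed by (group key, original row label).
rolled = df_re.groupby("A")["B"].rolling(4).mean()

print(rolled.head(6))
print(rolled.loc[5].head(4))
```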

The expanding() method will accumulate a given operation (sum() in the example) for all the members of each particular group.

.. ipython:: python

   df_re.groupby("A").expanding().sum()
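
For a sum, an expanding window over each group yields the same numbers as the cumsum transformation. The sketch below rebuilds the df_re data from the rolling example and compares the two; the visible differences are that the expanding result carries the group key as an extra index level and holds floating-point values.

```python
import numpy as np
import pandas as pd

df_re = pd.DataFrame({"A": [1] * 10 + [5] * 10, "B": np.arange(20)})

expanding_sum = df_re.groupby("A")["B"].expanding().sum()  # MultiIndex (A, row)
running_total = df_re.groupby("A")["B"].cumsum()           # original index

# The rows are already ordered by group here, so the values line up positionally.
print(np.allclose(expanding_sum.to_numpy(), running_total.to_numpy()))
```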

Suppose you want to use the resample() method to get a daily frequency in each group of your dataframe, and wish to complete the missing values with the ffill() method.

.. ipython:: python

   df_re = pd.DataFrame(
       {
           "date": pd.date_range(start="2016-01-01", periods=4, freq="W"),
           "group": [1, 1, 2, 2],
           "val": [5, 6, 7, 8],
       }
   ).set_index("date")
   df_re

   df_re.groupby("group").resample("1D").ffill()

Filtration

A filtration is a GroupBy operation that subsets the original grouping object. It may filter out entire groups, parts of groups, or both. Filtrations return a filtered version of the calling object, including the grouping columns when provided. In the following example, class is included in the result.

.. ipython:: python

    speeds
    speeds.groupby("class").nth(1)

Note

Unlike aggregations, filtrations do not add the group keys to the index of the result. Because of this, passing as_index=False or sort=True will not affect these methods.


Filtrations will respect subsetting the columns of the GroupBy object.

.. ipython:: python

    speeds.groupby("class")[["order", "max_speed"]].nth(1)

Built-in filtrations

The following methods on GroupBy act as filtrations. All these methods have a Cython-optimized implementation.

=================================  ========================================
Method                             Description
=================================  ========================================
:meth:`~.DataFrameGroupBy.head`    Select the top row(s) of each group
:meth:`~.DataFrameGroupBy.nth`     Select the nth row(s) of each group
:meth:`~.DataFrameGroupBy.tail`    Select the bottom row(s) of each group
=================================  ========================================
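To make the filtration behavior concrete, here is a small sketch with made-up data. The speeds_demo frame is hypothetical and is not the speeds frame used above. head() keeps the original row labels and the grouping column, rather than indexing the result by group key as an aggregation would.

```python
import pandas as pd

# Hypothetical data, loosely modelled on the speeds example
speeds_demo = pd.DataFrame(
    {
        "class": ["bird", "bird", "mammal", "mammal"],
        "max_speed": [389.0, 24.0, 80.2, 21.5],
    }
)

# head(1) is a filtration: it returns a subset of the original rows
first_rows = speeds_demo.groupby("class").head(1)
print(first_rows)

# The original index labels and the "class" column are preserved
print(first_rows.index.tolist())  # [0, 2]
```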

Users can also use transformations along with Boolean indexing to construct complex filtrations within groups. For example, suppose we are given groups of products and their volumes, and we wish to subset the data to only the largest products capturing no more than 90% of the total volume within each group.

.. ipython:: python

    product_volumes = pd.DataFrame(
        {
            "group": list("xxxxyyy"),
            "product": list("abcdefg"),
            "volume": [10, 30, 20, 15, 40, 10, 20],
        }
    )
    product_volumes

    # Sort by volume to select the largest products first
    product_volumes = product_volumes.sort_values("volume", ascending=False)
    grouped = product_volumes.groupby("group")["volume"]
    cumpct = grouped.cumsum() / grouped.transform("sum")
    cumpct
    significant_products = product_volumes[cumpct <= 0.9]
    significant_products.sort_values(["group", "product"])

The :meth:`~.DataFrameGroupBy.filter` method

Note

Filtering by supplying filter with a User-Defined Function (UDF) is often less performant than using the built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that use the built-in methods.

The filter method takes a UDF that, when applied to an entire group, returns either True or False. The result of the filter method is then the subset of groups for which the UDF returned True.

Suppose we want to take only elements that belong to groups with a group sum greater than 2.

.. ipython:: python

   sf = pd.Series([1, 1, 2, 3, 3, 3])
   sf.groupby(sf).filter(lambda x: x.sum() > 2)

Another useful operation is filtering out elements that belong to groups with only a couple of members.

.. ipython:: python

   dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")})
   dff.groupby("B").filter(lambda x: len(x) > 2)

Alternatively, instead of dropping the offending groups, we can return a like-indexed object where the groups that do not pass the filter are filled with NaNs.

.. ipython:: python

   dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False)

For DataFrames with multiple columns, filters should explicitly specify a column as the filter criterion.

.. ipython:: python

   dff["C"] = np.arange(8)
   dff.groupby("B").filter(lambda x: len(x["C"]) > 2)
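The note above recommends built-in methods over UDFs. In that spirit, the len(x) > 2 filter can also be expressed with transform and Boolean indexing. This minimal sketch rebuilds the dff frame from the earlier example so that it runs on its own. transform("size") broadcasts each group's size back to its rows, so the mask should keep exactly the rows that filter keeps.

```python
import numpy as np
import pandas as pd

dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")})

# UDF-based filtration
via_filter = dff.groupby("B").filter(lambda x: len(x) > 2)

# Built-in alternative: broadcast group sizes, then apply a Boolean mask
group_sizes = dff.groupby("B")["A"].transform("size")
via_mask = dff[group_sizes > 2]

print(via_mask)
```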

Flexible apply

Some operations on the grouped data might not fit into the aggregation, transformation, or filtration categories. For these, you can use the apply function.

Warning

apply has to infer from the result whether it should act as a reducer, transformer, or filter, depending on exactly what is passed to it. As a result, the grouped column(s) may or may not be included in the output. While it tries to intelligently guess how to behave, it can sometimes guess wrong.

Note

All of the examples in this section can be computed more reliably, and more efficiently, using other pandas functionality.

.. ipython:: python

   df
   grouped = df.groupby("A")

   # could also just call .describe()
   grouped["C"].apply(lambda x: x.describe())

The dimension of the returned result can also change:

.. ipython:: python

    grouped = df.groupby('A')['C']

    def f(group):
        return pd.DataFrame({'original': group,
                             'demeaned': group - group.mean()})

    grouped.apply(f)
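In line with the note at the start of this section, the demeaned column from the example above can also be computed without apply. One way is to use transform to broadcast each group's mean back to the original rows. This sketch uses a small hypothetical frame, df_demo.

```python
import pandas as pd

# Hypothetical data with a key column "A" and a value column "C"
df_demo = pd.DataFrame({"A": ["x", "x", "y", "y"], "C": [1.0, 3.0, 10.0, 20.0]})

# transform("mean") returns a Series aligned with the original rows
group_means = df_demo.groupby("A")["C"].transform("mean")
demeaned = df_demo["C"] - group_means

print(demeaned.tolist())  # [-1.0, 1.0, -5.0, 5.0]
```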

Similar to :ref:`groupby.aggregate.agg`, the resulting dtype will reflect that of the apply function. If the results from different groups have different dtypes, then a common dtype will be determined in the same way as DataFrame construction.

Control grouped column(s) placement with group_keys

To control whether the grouped column(s) are included in the indices, you can use the argument group_keys, which defaults to True. Compare

.. ipython:: python

    df.groupby("A", group_keys=True).apply(lambda x: x)

with

.. ipython:: python

    df.groupby("A", group_keys=False).apply(lambda x: x)
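The effect of group_keys shows up in the index of the result. The sketch below uses hypothetical data and an illustrative UDF that keeps the largest value of each group, then counts the index levels in each case.

```python
import pandas as pd

df_demo = pd.DataFrame({"A": ["x", "x", "y", "y"], "C": [1, 3, 10, 20]})

# Keep only the largest value in each group
with_keys = df_demo.groupby("A", group_keys=True)["C"].apply(lambda s: s.nlargest(1))
without_keys = df_demo.groupby("A", group_keys=False)["C"].apply(lambda s: s.nlargest(1))

print(with_keys.index.nlevels)  # 2: group key plus original row label
print(without_keys.index.nlevels)  # 1: original row label only
```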

Numba Accelerated Routines

.. versionadded:: 1.1

If Numba is installed as an optional dependency, the transform and aggregate methods support engine='numba' and engine_kwargs arguments. See :ref:`enhancing performance with Numba <enhancingperf.numba>` for general usage of the arguments and performance considerations.

The function signature must start with values, index exactly, as the data belonging to each group will be passed into values and the group index will be passed into index.

Warning

When using engine='numba', there will be no "fall back" behavior internally. The group data and group index will be passed as NumPy arrays to the JITed user-defined function, and no alternative execution attempts will be tried.

Other useful features

Exclusion of "nuisance" columns

Again consider the example DataFrame we've been looking at:

.. ipython:: python

   df

Suppose we wish to compute the standard deviation grouped by the A column. There is a slight problem, namely that we don't care about the data in column B because it is not numeric. We refer to these non-numeric columns as "nuisance" columns. You can avoid nuisance columns by specifying numeric_only=True:

.. ipython:: python

   df.groupby("A").std(numeric_only=True)

Note that df.groupby('A').colname.std() is more efficient than df.groupby('A').std().colname. So if the result of an aggregation function is only needed over one column (here colname), select that column before applying the aggregation function.
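Here is a minimal sketch of this column-selection advice, using hypothetical data with a non-numeric column B. Selecting C first gives the same result as computing the standard deviation of every numeric column and then picking C.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "B": ["a", "b", "c", "d"],  # non-numeric "nuisance" column
        "C": [1.0, 3.0, 10.0, 20.0],
    }
)

# Select the column first, then aggregate: only "C" is processed
selected_first = df_demo.groupby("A")["C"].std()

# Aggregate all numeric columns, then select "C"
selected_after = df_demo.groupby("A").std(numeric_only=True)["C"]

pd.testing.assert_series_equal(selected_first, selected_after)
print(selected_first)
```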

.. ipython:: python

    from decimal import Decimal

    df_dec = pd.DataFrame(
        {
            "id": [1, 2, 1, 2],
            "int_column": [1, 2, 3, 4],
            "dec_column": [
                Decimal("0.50"),
                Decimal("0.15"),
                Decimal("0.25"),
                Decimal("0.40"),
            ],
        }
    )

    # Decimal columns can be sum'd explicitly by themselves...
    df_dec.groupby(["id"])[["dec_column"]].sum()

    # ...but cannot be combined with standard data types or they will be excluded
    df_dec.groupby(["id"])[["int_column", "dec_column"]].sum()

    # Use .agg function to aggregate over standard and "nuisance" data types
    # at the same time
    df_dec.groupby(["id"]).agg({"int_column": "sum", "dec_column": "sum"})

Handling of (un)observed Categorical values

When using a Categorical grouper (as a single grouper, or as part of multiple groupers), the observed keyword controls whether to return a Cartesian product of all possible grouper values (observed=False) or only the combinations that are actually observed (observed=True).

Show all values:

.. ipython:: python

   pd.Series([1, 1, 1]).groupby(
       pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False
   ).count()

Show only the observed values:

.. ipython:: python

   pd.Series([1, 1, 1]).groupby(
       pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=True
   ).count()

The index dtype of the grouped result will always include all of the categories that were grouped.

.. ipython:: python

   s = (
       pd.Series([1, 1, 1])
       .groupby(pd.Categorical(["a", "a", "a"], categories=["a", "b"]), observed=False)
       .count()
   )
   s.index.dtype

NA and NaT group handling

By default, if there are any NaN or NaT values in the grouping key, these will be automatically excluded. In other words, there will never be an "NA group" or "NaT group" unless you pass dropna=False to groupby. This was not the case in older versions of pandas, but users were generally discarding the NA group anyway (and supporting it was an implementation headache).
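A short sketch with hypothetical data: the row whose key is missing is dropped from the default result, while passing dropna=False to groupby keeps it as its own group.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({"key": ["a", np.nan, "a", "b"], "val": [1, 2, 3, 4]})

# Default behavior: the NaN key is excluded, leaving groups "a" and "b"
default_sums = df_demo.groupby("key")["val"].sum()

# dropna=False keeps the NaN key as a separate group
with_na = df_demo.groupby("key", dropna=False)["val"].sum()

print(default_sums)
print(with_na)
```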

Grouping with ordered factors

Categorical variables represented as instances of pandas's Categorical class can be used as group keys. If so, the order of the levels will be preserved:

.. ipython:: python

   data = pd.Series(np.random.randn(100))

   factor = pd.qcut(data, [0, 0.25, 0.5, 0.75, 1.0])

   data.groupby(factor, observed=False).mean()

Grouping with a grouper specification

You may need to specify a bit more data to group properly. You can use pd.Grouper to provide this local control.

.. ipython:: python

   import datetime

   df = pd.DataFrame(
       {
           "Branch": "A A A A A A A B".split(),
           "Buyer": "Carl Mark Carl Carl Joe Joe Joe Carl".split(),
           "Quantity": [1, 3, 5, 1, 8, 1, 9, 3],
           "Date": [
               datetime.datetime(2013, 1, 1, 13, 0),
               datetime.datetime(2013, 1, 1, 13, 5),
               datetime.datetime(2013, 10, 1, 20, 0),
               datetime.datetime(2013, 10, 2, 10, 0),
               datetime.datetime(2013, 10, 1, 20, 0),
               datetime.datetime(2013, 10, 2, 10, 0),
               datetime.datetime(2013, 12, 2, 12, 0),
               datetime.datetime(2013, 12, 2, 14, 0),
           ],
       }
   )

   df

Group by a specific column with the desired frequency. This is similar to resampling.

.. ipython:: python

   df.groupby([pd.Grouper(freq="1M", key="Date"), "Buyer"])[["Quantity"]].sum()

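When freq is specified, the object returned by pd.Grouper will be an instance of pandas.api.typing.TimeGrouper. In the following example the specification is ambiguous, because the DataFrame has both a named index and a column (both called Date) that could be potential groupers. Use key to group by the column and level to group by the index.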

.. ipython:: python

   df = df.set_index("Date")
   df["Date"] = df.index + pd.offsets.MonthEnd(2)
   df.groupby([pd.Grouper(freq="6M", key="Date"), "Buyer"])[["Quantity"]].sum()

   df.groupby([pd.Grouper(freq="6M", level="Date"), "Buyer"])[["Quantity"]].sum()

Taking the first rows of each group

Just like for a DataFrame or Series, you can call head and tail on a groupby:

.. ipython:: python

   df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
   df

   g = df.groupby("A")
   g.head(1)

   g.tail(1)

This shows the first or last n rows from each group.

Taking the nth row of each group

To select the nth item from each group, use :meth:`.DataFrameGroupBy.nth` or :meth:`.SeriesGroupBy.nth`. Arguments supplied can be any integer, lists of integers, slices, or lists of slices; see below for examples. When the nth element of a group does not exist, no error is raised; instead, no corresponding rows are returned.

In general this operation acts as a filtration. In certain cases it will also return one row per group, making it also a reduction. However, because in general it can return zero or multiple rows per group, pandas treats it as a filtration in all cases.

.. ipython:: python

   df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
   g = df.groupby("A")

   g.nth(0)
   g.nth(-1)
   g.nth(1)

If the nth element of a group does not exist, then no corresponding row is included in the result. In particular, if the specified n is larger than any group, the result will be an empty DataFrame.

.. ipython:: python

   g.nth(5)

If you want to select the nth non-null item, use the dropna kwarg. For a DataFrame this should be either 'any' or 'all', just like you would pass to dropna:

.. ipython:: python

   # nth(0) is the same as g.first()
   g.nth(0, dropna="any")
   g.first()

   # nth(-1) is the same as g.last()
   g.nth(-1, dropna="any")
   g.last()

   g.B.nth(0, dropna="all")

You can also select multiple rows from each group by specifying multiple nth values as a list of ints.

.. ipython:: python

   business_dates = pd.date_range(start="4/1/2014", end="6/30/2014", freq="B")
   df = pd.DataFrame(1, index=business_dates, columns=["a", "b"])
   # get the first, 4th, and last date index for each month
   df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])

You may also use slices or lists of slices.

.. ipython:: python

   df.groupby([df.index.year, df.index.month]).nth[1:]
   df.groupby([df.index.year, df.index.month]).nth[1:, :-1]

Enumerate group items

To see the order in which each row appears within its group, use the cumcount method:

.. ipython:: python

   dfg = pd.DataFrame(list("aaabba"), columns=["A"])
   dfg

   dfg.groupby("A").cumcount()

   dfg.groupby("A").cumcount(ascending=False)
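Because cumcount numbers the rows within each group starting from 0, its result can also serve as a Boolean mask. The sketch below rebuilds the dfg frame shown above so that it runs on its own. It keeps the first two rows of each group, which matches head(2).

```python
import pandas as pd

dfg = pd.DataFrame(list("aaabba"), columns=["A"])

# Position of each row within its group: 0, 1, 2, 0, 1, 3
positions = dfg.groupby("A").cumcount()

# Keep rows whose within-group position is less than 2
first_two = dfg[positions < 2]

print(first_two.equals(dfg.groupby("A").head(2)))  # True
```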

Enumerate groups

To see the ordering of the groups (as opposed to the order of rows within a group given by cumcount) you can use :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`.

Note that the numbers given to the groups match the order in which the groups would be seen when iterating over the groupby object, not the order in which they are first observed.

.. ipython:: python

   dfg = pd.DataFrame(list("aaabba"), columns=["A"])
   dfg

   dfg.groupby("A").ngroup()

   dfg.groupby("A").ngroup(ascending=False)
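In the example above, the groups happen to be first observed in sorted order, so the two orderings coincide. The sketch below uses hypothetical keys where they differ. By default the numbering follows the sorted group order; passing sort=False to groupby numbers the groups in order of first appearance instead.

```python
import pandas as pd

dfk = pd.DataFrame({"A": list("bab")})

# Groups are sorted by default, so "a" is group 0 even though "b" appears first
print(dfk.groupby("A").ngroup().tolist())  # [1, 0, 1]

# With sort=False, groups are numbered in order of first appearance
print(dfk.groupby("A", sort=False).ngroup().tolist())  # [0, 1, 0]
```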

Plotting

Groupby also works with some plotting methods. In this case, suppose we suspect that the values in column 1 are, on average, 3 units higher in group "B".

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.randn(50, 2))
   df["g"] = np.random.choice(["A", "B"], size=50)
   df.loc[df["g"] == "B", 1] += 3

We can easily visualize this with a boxplot:

.. ipython:: python
   :okwarning:

   @savefig groupby_boxplot.png
   df.groupby("g").boxplot()

The result of calling boxplot is a dictionary whose keys are the values of our grouping column g ("A" and "B"). The values of the resulting dictionary can be controlled by the return_type keyword of boxplot. See the :ref:`visualization documentation<visualization.box>` for more.

Warning

For historical reasons, df.groupby("g").boxplot() is not equivalent to df.boxplot(by="g"). See :ref:`here<visualization.box.return>` for an explanation.

Piping function calls

Similar to the functionality provided by DataFrame and Series, functions that take GroupBy objects can be chained together using a pipe method to allow for a cleaner, more readable syntax. To read about .pipe in general terms, see :ref:`here <basics.pipe>`.

Combining .groupby and .pipe is often useful when you need to reuse GroupBy objects.

As an example, imagine having a DataFrame with columns for stores, products, revenue and quantity sold. We'd like to do a groupwise calculation of prices (i.e. revenue/quantity) per store and per product. We could do this in a multi-step operation, but expressing it in terms of piping can make the code more readable. First we set the data:

.. ipython:: python

   n = 1000
   df = pd.DataFrame(
       {
           "Store": np.random.choice(["Store_1", "Store_2"], n),
           "Product": np.random.choice(["Product_1", "Product_2"], n),
           "Revenue": (np.random.random(n) * 50 + 10).round(2),
           "Quantity": np.random.randint(1, 10, size=n),
       }
   )
   df.head(2)

Now, to find prices per store/product, we can simply do:

.. ipython:: python

   (
       df.groupby(["Store", "Product"])
       .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum())
       .unstack()
       .round(2)
   )

Piping can also be expressive when you want to deliver a grouped object to some arbitrary function, for example:

.. ipython:: python

   def mean(groupby):
       return groupby.mean()


   df.groupby(["Store", "Product"]).pipe(mean)

Here mean takes a GroupBy object and finds the mean of the Revenue and Quantity columns, respectively, for each Store-Product combination. The mean function can be any function that takes in a GroupBy object; .pipe will pass the GroupBy object as a parameter into the function you specify.
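pipe also forwards extra positional and keyword arguments to the function. The sketch below uses a small hypothetical frame, sales, and price, an illustrative helper (not a pandas function) that takes the column names as arguments. It should give 7.5 for Store_1 (30 / 4) and 6.0 for Store_2 (30 / 5).

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "Store": ["Store_1", "Store_1", "Store_2"],
        "Revenue": [10.0, 20.0, 30.0],
        "Quantity": [1, 3, 5],
    }
)


def price(grouped, revenue_col, quantity_col):
    # Total revenue divided by total quantity for each group
    return grouped[revenue_col].sum() / grouped[quantity_col].sum()


# "Revenue" is passed positionally and quantity_col by keyword
prices = sales.groupby("Store").pipe(price, "Revenue", quantity_col="Quantity")
print(prices)
```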

Examples

Regrouping by factor

Regroup columns of a DataFrame according to their sum, and sum the aggregated ones.

.. ipython:: python

   df = pd.DataFrame({"a": [1, 0, 0], "b": [0, 1, 0], "c": [1, 0, 0], "d": [2, 3, 4]})
   df
   dft = df.T
   dft.groupby(dft.sum()).sum()

Multi-column factorization

By using :meth:`~pandas.core.groupby.DataFrameGroupBy.ngroup`, we can extract information about the groups in a way similar to :func:`factorize` (as described further in the :ref:`reshaping API <reshaping.factorize>`) but which applies naturally to multiple columns of mixed type and different sources. This can be useful as an intermediate categorical-like step in processing, when the relationships between the group rows are more important than their content, or as input to an algorithm which only accepts the integer encoding. (For more information about support in pandas for full categorical data, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.arrays.categorical>`.)

.. ipython:: python

    dfg = pd.DataFrame({"A": [1, 1, 2, 3, 2], "B": list("aaaba")})

    dfg

    dfg.groupby(["A", "B"]).ngroup()

    dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()

Groupby by indexer to 'resample' data

Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.

In order for resample to work on indices that are not datetime-like, the following procedure can be used.

In the following example, df.index // 5 returns an integer array of bin labels (0 for the first five rows, 1 for the next five), which determines how the rows are grouped.

Note

The example below shows how we can downsample by consolidating samples into fewer ones. Here, by using df.index // 5, we aggregate the samples in bins. By applying the std() function, we aggregate the information contained in many samples into a small subset of values, namely their standard deviation, thereby reducing the number of samples.

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10, 2))
   df
   df.index // 5
   df.groupby(df.index // 5).std()
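With deterministic data the binning is easier to follow. In this sketch, which uses hypothetical data, rows 0 to 4 get label 0 and rows 5 to 9 get label 1. Summing per label therefore collapses ten rows into two.

```python
import pandas as pd

df_demo = pd.DataFrame({"x": range(10)})

# Integer bin labels: 0 for rows 0-4, 1 for rows 5-9
bins = df_demo.index // 5
print(bins.tolist())  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# One output row per bin
binned = df_demo.groupby(bins)["x"].sum()
print(binned.tolist())  # [10, 35]
```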

Returning a Series to propagate names

Group DataFrame columns, compute a set of metrics and return a named Series. The Series name is used as the name for the column index. This is especially useful in conjunction with reshaping operations such as stacking, in which the column index name will be used as the name of the inserted column:

.. ipython:: python

   df = pd.DataFrame(
       {
           "a": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
           "b": [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
           "c": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
           "d": [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
       }
   )

   def compute_metrics(x):
       result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
       return pd.Series(result, name="metrics")

   result = df.groupby("a").apply(compute_metrics)

   result

   result.stack()
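Here is a reduced sketch of the same idea with hypothetical data. Selecting the value columns before apply keeps the grouping column out of the UDF's input. The Series name ends up as the name of the result's column index.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "a": [0, 0, 1, 1],
        "b": [0, 1, 1, 1],
        "c": [1, 0, 1, 1],
    }
)


def compute_metrics(x):
    result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
    return pd.Series(result, name="metrics")


result = df_demo.groupby("a")[["b", "c"]].apply(compute_metrics)

print(result.columns.name)  # metrics
print(result.columns.tolist())  # ['b_sum', 'c_mean']
```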