DOC: Add SeriesGroupBy and DataFrameGroupBy reference pages #48500

Merged 7 commits on Sep 15, 2022
156 changes: 94 additions & 62 deletions doc/source/reference/groupby.rst
@@ -14,10 +14,14 @@ Indexing, iteration
.. autosummary::
:toctree: api/

GroupBy.__iter__
GroupBy.groups
GroupBy.indices
GroupBy.get_group
DataFrameGroupBy.__iter__
SeriesGroupBy.__iter__
DataFrameGroupBy.groups
SeriesGroupBy.groups
DataFrameGroupBy.indices
SeriesGroupBy.indices
DataFrameGroupBy.get_group
SeriesGroupBy.get_group

.. currentmodule:: pandas

@@ -41,57 +45,21 @@ Function application
.. autosummary::
:toctree: api/

GroupBy.apply
GroupBy.agg
SeriesGroupBy.apply
DataFrameGroupBy.apply
SeriesGroupBy.agg
DataFrameGroupBy.agg
SeriesGroupBy.aggregate
DataFrameGroupBy.aggregate
SeriesGroupBy.transform
DataFrameGroupBy.transform
GroupBy.pipe

Computations / descriptive stats
--------------------------------
.. autosummary::
:toctree: api/

GroupBy.all
GroupBy.any
GroupBy.bfill
GroupBy.backfill
GroupBy.count
GroupBy.cumcount
GroupBy.cummax
GroupBy.cummin
GroupBy.cumprod
GroupBy.cumsum
GroupBy.ffill
GroupBy.first
GroupBy.head
GroupBy.last
GroupBy.max
GroupBy.mean
GroupBy.median
GroupBy.min
GroupBy.ngroup
GroupBy.nth
GroupBy.ohlc
GroupBy.pad
GroupBy.prod
GroupBy.rank
GroupBy.pct_change
GroupBy.size
GroupBy.sem
GroupBy.std
GroupBy.sum
GroupBy.var
GroupBy.tail

The following methods are available in both ``SeriesGroupBy`` and
``DataFrameGroupBy`` objects, but may differ slightly, usually in that
the ``DataFrameGroupBy`` version usually permits the specification of an
axis argument, and often an argument indicating whether to restrict
application to columns of a specific data type.
SeriesGroupBy.pipe
DataFrameGroupBy.pipe
DataFrameGroupBy.filter
SeriesGroupBy.filter

``DataFrameGroupBy`` computations / descriptive stats
-----------------------------------------------------
.. autosummary::
:toctree: api/

@@ -100,6 +68,7 @@ application to columns of a specific data type.
DataFrameGroupBy.backfill
DataFrameGroupBy.bfill
DataFrameGroupBy.corr
DataFrameGroupBy.corrwith
DataFrameGroupBy.count
DataFrameGroupBy.cov
DataFrameGroupBy.cumcount
@@ -111,42 +80,105 @@ application to columns of a specific data type.
DataFrameGroupBy.diff
DataFrameGroupBy.ffill
DataFrameGroupBy.fillna
DataFrameGroupBy.filter
DataFrameGroupBy.hist
DataFrameGroupBy.first
DataFrameGroupBy.head
DataFrameGroupBy.idxmax
DataFrameGroupBy.idxmin
DataFrameGroupBy.last
DataFrameGroupBy.mad
DataFrameGroupBy.max
DataFrameGroupBy.mean
DataFrameGroupBy.median
DataFrameGroupBy.min
DataFrameGroupBy.ngroup
DataFrameGroupBy.nth
DataFrameGroupBy.nunique
DataFrameGroupBy.ohlc
DataFrameGroupBy.pad
DataFrameGroupBy.pct_change
DataFrameGroupBy.plot
DataFrameGroupBy.prod
DataFrameGroupBy.quantile
DataFrameGroupBy.rank
DataFrameGroupBy.resample
DataFrameGroupBy.sample
DataFrameGroupBy.sem
DataFrameGroupBy.shift
DataFrameGroupBy.size
DataFrameGroupBy.skew
DataFrameGroupBy.std
DataFrameGroupBy.sum
DataFrameGroupBy.var
DataFrameGroupBy.tail
DataFrameGroupBy.take
DataFrameGroupBy.tshift
DataFrameGroupBy.value_counts

The following methods are available only for ``SeriesGroupBy`` objects.

``SeriesGroupBy`` computations / descriptive stats
--------------------------------------------------
.. autosummary::
:toctree: api/

SeriesGroupBy.hist
SeriesGroupBy.all
SeriesGroupBy.any
SeriesGroupBy.backfill
SeriesGroupBy.bfill
SeriesGroupBy.corr
SeriesGroupBy.count
SeriesGroupBy.cov
SeriesGroupBy.cumcount
SeriesGroupBy.cummax
SeriesGroupBy.cummin
SeriesGroupBy.cumprod
SeriesGroupBy.cumsum
SeriesGroupBy.describe
SeriesGroupBy.diff
SeriesGroupBy.ffill
SeriesGroupBy.fillna
SeriesGroupBy.first
SeriesGroupBy.head
SeriesGroupBy.last
SeriesGroupBy.idxmax
SeriesGroupBy.idxmin
SeriesGroupBy.is_monotonic_increasing
SeriesGroupBy.is_monotonic_decreasing
SeriesGroupBy.mad
SeriesGroupBy.max
SeriesGroupBy.mean
SeriesGroupBy.median
SeriesGroupBy.min
SeriesGroupBy.ngroup
SeriesGroupBy.nlargest
SeriesGroupBy.nsmallest
SeriesGroupBy.nth
SeriesGroupBy.nunique
SeriesGroupBy.unique
SeriesGroupBy.is_monotonic_increasing
SeriesGroupBy.is_monotonic_decreasing

The following methods are available only for ``DataFrameGroupBy`` objects.

SeriesGroupBy.ohlc
SeriesGroupBy.pad
SeriesGroupBy.pct_change
SeriesGroupBy.prod
SeriesGroupBy.quantile
SeriesGroupBy.rank
SeriesGroupBy.resample
SeriesGroupBy.sample
SeriesGroupBy.sem
SeriesGroupBy.shift
SeriesGroupBy.size
SeriesGroupBy.skew
SeriesGroupBy.std
SeriesGroupBy.sum
SeriesGroupBy.var
SeriesGroupBy.tail
SeriesGroupBy.take
SeriesGroupBy.tshift
SeriesGroupBy.value_counts

Plotting and visualization
--------------------------
.. autosummary::
:toctree: api/

DataFrameGroupBy.corrwith
DataFrameGroupBy.boxplot
DataFrameGroupBy.hist
SeriesGroupBy.hist
DataFrameGroupBy.plot
SeriesGroupBy.plot
4 changes: 2 additions & 2 deletions doc/source/user_guide/10min.rst
@@ -528,15 +528,15 @@ See the :ref:`Grouping section <groupby>`.
)
df

Grouping and then applying the :meth:`~pandas.core.groupby.GroupBy.sum` function to the resulting
Grouping and then applying the :meth:`~pandas.core.groupby.DataFrameGroupBy.sum` function to the resulting
groups:

.. ipython:: python

df.groupby("A")[["C", "D"]].sum()
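
The reference now points at ``DataFrameGroupBy.sum`` because selecting a list of columns from a grouped frame yields a ``DataFrameGroupBy``, while selecting a single column yields a ``SeriesGroupBy``. The following is a minimal sketch of that distinction; the small frame and its values are hypothetical and only mirror the column names used above.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy

# Hypothetical data; only the column names follow the example above.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar"],
        "C": [1.0, 2.0, 3.0, 4.0],
        "D": [10.0, 20.0, 30.0, 40.0],
    }
)

# A list of columns keeps a frame-like group object.
frame_groups = df.groupby("A")[["C", "D"]]
# A single column label gives a series-like group object.
series_groups = df.groupby("A")["C"]

is_frame_groupby = isinstance(frame_groups, DataFrameGroupBy)
is_series_groupby = isinstance(series_groups, SeriesGroupBy)

# Each class has its own documented ``sum`` method.
frame_sums = frame_groups.sum()    # a DataFrame indexed by "A"
series_sums = series_groups.sum()  # a Series indexed by "A"
```

Both classes share most of their methods, which is why the reference page lists each method once per class.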

Grouping by multiple columns forms a hierarchical index, and again we can
apply the :meth:`~pandas.core.groupby.GroupBy.sum` function:
apply the :meth:`~pandas.core.groupby.DataFrameGroupBy.sum` function:

.. ipython:: python

2 changes: 1 addition & 1 deletion doc/source/user_guide/groupby.rst
@@ -632,7 +632,7 @@ Named aggregation
.. versionadded:: 0.25.0

To support column-specific aggregation *with control over the output column names*, pandas
accepts the special syntax in :meth:`GroupBy.agg`, known as "named aggregation", where
accepts the special syntax in :meth:`DataFrameGroupBy.agg` and :meth:`SeriesGroupBy.agg`, known as "named aggregation", where

- The keywords are the *output* column names
- The values are tuples whose first element is the column to select
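
As a rough illustration of the named-aggregation syntax described above, the sketch below applies it to both group object types. The ``animals`` frame and its values are hypothetical. For a ``SeriesGroupBy`` there is no column to select, so each keyword maps directly to an aggregation.

```python
import pandas as pd

# Hypothetical data used only for illustration.
animals = pd.DataFrame(
    {
        "kind": ["cat", "dog", "cat", "dog"],
        "height": [9.1, 6.0, 9.5, 34.0],
        "weight": [7.9, 7.5, 9.9, 198.0],
    }
)

# DataFrameGroupBy.agg: keyword = output column name,
# value = (column to select, aggregation to apply).
frame_result = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_weight=("weight", "max"),
)

# SeriesGroupBy.agg: keyword = output column name,
# value = the aggregation itself.
series_result = animals.groupby("kind")["height"].agg(
    min_height="min",
    max_height="max",
)
```

In both cases the result is a ``DataFrame`` whose columns are named after the keywords.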
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.0.rst
@@ -774,7 +774,7 @@ source, you should no longer need to install Cython into your build environment
Other API changes
^^^^^^^^^^^^^^^^^

- :class:`core.groupby.GroupBy.transform` now raises on invalid operation names (:issue:`27489`)
- :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` now raise on invalid operation names (:issue:`27489`)
- :meth:`pandas.api.types.infer_dtype` will now return "integer-na" for integer and ``np.nan`` mix (:issue:`27283`)
- :meth:`MultiIndex.from_arrays` will no longer infer names from arrays if ``names=None`` is explicitly provided (:issue:`27292`)
- In order to improve tab-completion, pandas does not include most deprecated attributes when introspecting a pandas object using ``dir`` (e.g. ``dir(df)``).
@@ -1232,8 +1232,8 @@ GroupBy/resample/rolling
- Bug in :meth:`core.groupby.DataFrameGroupBy.agg` with timezone-aware datetime64 column incorrectly casting results to the original dtype (:issue:`29641`)
- Bug in :meth:`DataFrame.groupby` when using axis=1 and having a single level columns index (:issue:`30208`)
- Bug in :meth:`DataFrame.groupby` when using nunique on axis=1 (:issue:`30253`)
- Bug in :meth:`GroupBy.quantile` with multiple list-like q value and integer column names (:issue:`30289`)
- Bug in :meth:`GroupBy.pct_change` and :meth:`core.groupby.SeriesGroupBy.pct_change` causes ``TypeError`` when ``fill_method`` is ``None`` (:issue:`30463`)
- Bug in :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` with multiple list-like q value and integer column names (:issue:`30289`)
- Bug in :meth:`.DataFrameGroupBy.pct_change` and :meth:`.SeriesGroupBy.pct_change` causes ``TypeError`` when ``fill_method`` is ``None`` (:issue:`30463`)
- Bug in :meth:`Rolling.count` and :meth:`Expanding.count` argument where ``min_periods`` was ignored (:issue:`26996`)

Reshaping
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.1.rst
@@ -21,7 +21,7 @@ Fixed regressions
- Fixed regression in :class:`Series` multiplication when multiplying a numeric :class:`Series` with >10000 elements with a timedelta-like scalar (:issue:`31457`)
- Fixed regression in ``.groupby().agg()`` raising an ``AssertionError`` for some reductions like ``min`` on object-dtype columns (:issue:`31522`)
- Fixed regression in ``.groupby()`` aggregations with categorical dtype using Cythonized reduction functions (e.g. ``first``) (:issue:`31450`)
- Fixed regression in :meth:`GroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
- Fixed regression in :meth:`.DataFrameGroupBy.apply` and :meth:`.SeriesGroupBy.apply` if called with a function which returned a non-pandas non-scalar object (e.g. a list or numpy array) (:issue:`31441`)
- Fixed regression in :meth:`DataFrame.groupby` whereby taking the minimum or maximum of a column with period dtype would raise a ``TypeError``. (:issue:`31471`)
- Fixed regression in :meth:`DataFrame.groupby` with an empty DataFrame grouping by a level of a MultiIndex (:issue:`31670`).
- Fixed regression in :meth:`DataFrame.apply` with object dtype and non-reducing function (:issue:`31505`)
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.2.rst
@@ -17,12 +17,12 @@ Fixed regressions

**Groupby**

- Fixed regression in :meth:`groupby(..).agg() <pandas.core.groupby.GroupBy.agg>` which was failing on frames with :class:`MultiIndex` columns and a custom function (:issue:`31777`)
- Fixed regression in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg` which were failing on frames with :class:`MultiIndex` columns and a custom function (:issue:`31777`)
- Fixed regression in ``groupby(..).rolling(..).apply()`` (``RollingGroupby``) where the ``raw`` parameter was ignored (:issue:`31754`)
- Fixed regression in :meth:`rolling(..).corr() <pandas.core.window.rolling.Rolling.corr>` when using a time offset (:issue:`31789`)
- Fixed regression in :meth:`groupby(..).nunique() <pandas.core.groupby.DataFrameGroupBy.nunique>` which was modifying the original values if ``NaN`` values were present (:issue:`31950`)
- Fixed regression in ``DataFrame.groupby`` raising a ``ValueError`` from an internal operation (:issue:`31802`)
- Fixed regression in :meth:`groupby(..).agg() <pandas.core.groupby.GroupBy.agg>` calling a user-provided function an extra time on an empty input (:issue:`31760`)
- Fixed regression in :meth:`.DataFrameGroupBy.agg` and :meth:`.SeriesGroupBy.agg` calling a user-provided function an extra time on an empty input (:issue:`31760`)

**I/O**

@@ -104,7 +104,7 @@ Bug fixes
- Fixed bug in :meth:`DataFrame.convert_dtypes` for series with mix of integers and strings (:issue:`32117`)
- Fixed bug in :meth:`DataFrame.convert_dtypes` where ``BooleanDtype`` columns were converted to ``Int64`` (:issue:`32287`)
- Fixed bug in setting values using a slice indexer with string dtype (:issue:`31772`)
- Fixed bug where :meth:`pandas.core.groupby.GroupBy.first` and :meth:`pandas.core.groupby.GroupBy.last` would raise a ``TypeError`` when groups contained ``pd.NA`` in a column of object dtype (:issue:`32123`)
- Fixed bug where :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` would raise a ``TypeError`` when groups contained ``pd.NA`` in a column of object dtype (:issue:`32123`)
- Fixed bug where :meth:`DataFrameGroupBy.mean`, :meth:`DataFrameGroupBy.median`, :meth:`DataFrameGroupBy.var`, and :meth:`DataFrameGroupBy.std` would raise a ``TypeError`` on ``Int64`` dtype columns (:issue:`32219`)

**Strings**
6 changes: 3 additions & 3 deletions doc/source/whatsnew/v1.0.4.rst
@@ -16,7 +16,7 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- Fix regression where :meth:`Series.isna` and :meth:`DataFrame.isna` would raise for categorical dtype when ``pandas.options.mode.use_inf_as_na`` was set to ``True`` (:issue:`33594`)
- Fix regression in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Fix regression in :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` where None is not preserved in object dtype (:issue:`32800`)
- Fix regression in DataFrame reductions using ``numeric_only=True`` and ExtensionArrays (:issue:`33256`).
- Fix performance regression in ``memory_usage(deep=True)`` for object dtype (:issue:`33012`)
- Fix regression where :meth:`Categorical.replace` would replace with ``NaN`` whenever the new value and replacement value were equal (:issue:`33288`)
@@ -26,7 +26,7 @@ Fixed regressions
- Fix regression in :meth:`DataFrame.describe` raising ``TypeError: unhashable type: 'dict'`` (:issue:`32409`)
- Fix regression in :meth:`DataFrame.replace` casts columns to ``object`` dtype if items in ``to_replace`` not in values (:issue:`32988`)
- Fix regression in :meth:`Series.groupby` would raise ``ValueError`` when grouping by :class:`PeriodIndex` level (:issue:`34010`)
- Fix regression in :meth:`GroupBy.rolling.apply` ignores args and kwargs parameters (:issue:`33433`)
- Fix regression in :meth:`DataFrameGroupBy.rolling.apply` and :meth:`SeriesGroupBy.rolling.apply` ignoring args and kwargs parameters (:issue:`33433`)
- Fix regression in error message with ``np.min`` or ``np.max`` on unordered :class:`Categorical` (:issue:`33115`)
- Fix regression in :meth:`DataFrame.loc` and :meth:`Series.loc` throwing an error when a ``datetime64[ns, tz]`` value is provided (:issue:`32395`)

@@ -40,7 +40,7 @@ Bug fixes
- Bug in :meth:`~DataFrame.to_csv` was silently failing when writing to an invalid s3 bucket. (:issue:`32486`)
- Bug in :meth:`read_parquet` was raising a ``FileNotFoundError`` when passed an s3 directory path. (:issue:`26388`)
- Bug in :meth:`~DataFrame.to_parquet` was throwing an ``AttributeError`` when writing a partitioned parquet file to s3 (:issue:`27596`)
- Bug in :meth:`GroupBy.quantile` causes the quantiles to be shifted when the ``by`` axis contains ``NaN`` (:issue:`33200`, :issue:`33569`)
- Bug in :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` causes the quantiles to be shifted when the ``by`` axis contains ``NaN`` (:issue:`33200`, :issue:`33569`)

Contributors
~~~~~~~~~~~~
8 changes: 4 additions & 4 deletions doc/source/whatsnew/v1.1.0.rst
@@ -1126,16 +1126,16 @@ GroupBy/resample/rolling

- Using a :class:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max``, ``median``, ``skew``, ``cov``, ``corr`` will now return correct results for any monotonic :class:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
- :meth:`DataFrameGroupby.mean` and :meth:`SeriesGroupby.mean` (and similarly for :meth:`~DataFrameGroupby.median`, :meth:`~DataFrameGroupby.std` and :meth:`~DataFrameGroupby.var`) now raise a ``TypeError`` if a non-accepted keyword argument is passed into it. Previously an ``UnsupportedFunctionCall`` was raised (``AssertionError`` if ``min_count`` passed into :meth:`~DataFrameGroupby.median`) (:issue:`31485`)
- Bug in :meth:`GroupBy.apply` raises ``ValueError`` when the ``by`` axis is not sorted, has duplicates, and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
- Bug in :meth:`.DataFrameGroupBy.apply` and :meth:`.SeriesGroupBy.apply` raising ``ValueError`` when the ``by`` axis is not sorted, has duplicates, and the applied ``func`` does not mutate passed in objects (:issue:`30667`)
- Bug in :meth:`DataFrameGroupBy.transform` produces an incorrect result with transformation functions (:issue:`30918`)
- Bug in :meth:`Groupby.transform` was returning the wrong result when grouping by multiple keys of which some were categorical and others not (:issue:`32494`)
- Bug in :meth:`GroupBy.count` causes segmentation fault when grouped-by columns contain NaNs (:issue:`32841`)
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` returning the wrong result when grouping by multiple keys of which some were categorical and others not (:issue:`32494`)
- Bug in :meth:`.DataFrameGroupBy.count` and :meth:`.SeriesGroupBy.count` causing segmentation fault when grouped-by columns contain NaNs (:issue:`32841`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` produces inconsistent type when aggregating Boolean :class:`Series` (:issue:`32894`)
- Bug in :meth:`DataFrameGroupBy.sum` and :meth:`SeriesGroupBy.sum` where a large negative number would be returned when the number of non-null values was below ``min_count`` for nullable integer dtypes (:issue:`32861`)
- Bug in :meth:`SeriesGroupBy.quantile` was raising on nullable integers (:issue:`33136`)
- Bug in :meth:`DataFrame.resample` where an ``AmbiguousTimeError`` would be raised when the resulting timezone aware :class:`DatetimeIndex` had a DST transition at midnight (:issue:`25758`)
- Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`)
- Bug in :meth:`GroupBy.agg`, :meth:`GroupBy.transform`, and :meth:`GroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`.DataFrameGroupBy.agg`, :meth:`.SeriesGroupBy.agg`, :meth:`.DataFrameGroupBy.transform`, :meth:`.SeriesGroupBy.transform`, :meth:`.DataFrameGroupBy.resample`, and :meth:`.SeriesGroupBy.resample` where subclasses are not preserved (:issue:`28330`)
- Bug in :meth:`SeriesGroupBy.agg` where any column name was accepted in the named aggregation of :class:`SeriesGroupBy` previously. The behaviour now allows only ``str`` and callables else would raise ``TypeError``. (:issue:`34422`)
- Bug in :meth:`DataFrame.groupby` lost the name of the :class:`Index` when one of the ``agg`` keys referenced an empty list (:issue:`32580`)
- Bug in :meth:`Rolling.apply` where ``center=True`` was ignored when ``engine='numba'`` was specified (:issue:`34784`)
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.4.rst
@@ -41,7 +41,7 @@ Bug fixes
~~~~~~~~~
- Bug causing ``groupby(...).sum()`` and similar to not preserve metadata (:issue:`29442`)
- Bug in :meth:`Series.isin` and :meth:`DataFrame.isin` raising a ``ValueError`` when the target was read-only (:issue:`37174`)
- Bug in :meth:`GroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
- Bug in :meth:`.DataFrameGroupBy.fillna` and :meth:`.SeriesGroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
- Bug in :meth:`DataFrame.info` was raising a ``KeyError`` when the DataFrame has integer column names (:issue:`37245`)
- Bug in :meth:`DataFrameGroupby.apply` would drop a :class:`CategoricalIndex` when grouped on (:issue:`35792`)

2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.5.rst
@@ -28,7 +28,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.fillna` not filling ``NaN`` after other operations such as :meth:`DataFrame.pivot` (:issue:`36495`).
- Fixed performance regression in ``df.groupby(..).rolling(..)`` (:issue:`38038`)
- Fixed regression in :meth:`MultiIndex.intersection` returning duplicates when at least one of the indexes had duplicates (:issue:`36915`)
- Fixed regression in :meth:`.GroupBy.first` and :meth:`.GroupBy.last` where ``None`` was considered a non-NA value (:issue:`38286`)
- Fixed regression in :meth:`.DataFrameGroupBy.first`, :meth:`.SeriesGroupBy.first`, :meth:`.DataFrameGroupBy.last`, and :meth:`.SeriesGroupBy.last` where ``None`` was considered a non-NA value (:issue:`38286`)

.. ---------------------------------------------------------------------------
