DOC: Add details to DataFrame groupby transform #14388

Closed
38 changes: 33 additions & 5 deletions doc/source/groupby.rst
@@ -580,9 +580,21 @@ Transformation
--------------

The ``transform`` method returns an object that is indexed the same (same size)
as the one being grouped. The transform function must:

* Return a result that is either the same size as the group chunk or
  broadcastable to the size of the group chunk (e.g., a scalar,
  ``grouped.transform(lambda x: x.iloc[-1])``).
* Operate column-by-column on the group chunk. The transform is applied to
  the first group chunk using ``chunk.apply``.
* Not perform in-place operations on the group chunk. Group chunks should
  be treated as immutable, and changes to a group chunk may produce unexpected
  results. For example, when using ``fillna``, ``inplace`` must be ``False``
  (``grouped.transform(lambda x: x.fillna(x.mean(), inplace=False))``).
* (Optionally) operate on the entire group chunk. If this is supported, a
  fast path is used starting from the *second* chunk.
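
The size and immutability requirements can be illustrated with a minimal sketch. The frame ``df`` and its column names ``key`` and ``val`` are hypothetical, and a single column is selected so that each chunk passed to the function is a Series; this is an illustration under those assumptions, not part of the pandas documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two groups, with one missing value in group "b".
df = pd.DataFrame(
    {"key": ["a", "a", "b", "b"], "val": [1.0, 3.0, np.nan, 4.0]}
)
grouped = df.groupby("key")["val"]

# Same-size result: subtract the group mean from every element.
demeaned = grouped.transform(lambda x: x - x.mean())

# Scalar result: the last value of each group is broadcast to every row.
last = grouped.transform(lambda x: x.iloc[-1])
print(last.tolist())  # [3.0, 3.0, 4.0, 4.0]

# Non-mutating fill: fillna returns a new object, so the chunk is untouched.
filled = grouped.transform(lambda x: x.fillna(x.mean()))
print(filled.tolist())  # [1.0, 3.0, 4.0, 4.0]
```

In all three cases the result has the same index as ``df``, and ``df`` itself still contains its original missing value.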

For example, suppose we wished to standardize the data within each group:

.. ipython:: python

@@ -620,6 +632,21 @@ We can also visually compare the original and transformed data sets.
   @savefig groupby_transform_plot.png
   compare.plot()

Transformation functions that have lower dimension outputs are broadcast to
match the shape of the input array.

.. ipython:: python

   data_range = lambda x: x.max() - x.min()
   ts.groupby(key).transform(data_range)

Alternatively, the built-in methods can be used to produce the same
outputs:

.. ipython:: python

   ts.groupby(key).transform('max') - ts.groupby(key).transform('min')
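
The equivalence of the two approaches can be checked with a short, self-contained sketch. The ``ts`` and ``key`` objects here are small hypothetical stand-ins, since the ones used in the docs are defined outside this diff.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the ``ts`` and ``key`` used in the docs.
ts = pd.Series([1.0, 5.0, 2.0, 8.0, 3.0, 4.0])
key = np.array(["x", "x", "y", "y", "y", "x"])

# Scalar-returning function: the per-group range is broadcast to each row.
by_lambda = ts.groupby(key).transform(lambda x: x.max() - x.min())

# Same result built from the string aliases of the built-in methods.
by_builtin = ts.groupby(key).transform("max") - ts.groupby(key).transform("min")

print(by_lambda.tolist())  # [4.0, 4.0, 6.0, 6.0, 6.0, 4.0]
print(by_lambda.equals(by_builtin))  # True
```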

Another common data transform is to replace missing data with the group mean.

.. ipython:: python
@@ -664,8 +691,9 @@ and that the transformed data contains no NAs.

.. note::

   Some functions, when applied to a groupby object, automatically transform
   the input, returning an object of the same shape as the original. Passing
   ``as_index=False`` will not affect these transformation methods.

   For example: ``fillna``, ``ffill``, ``bfill``, ``shift``.
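
As a rough illustration of the note, the sketch below applies ``shift`` to a hypothetical frame (the names ``df``, ``k`` and ``v`` are assumptions) with and without ``as_index=False``. Exact column handling may differ between pandas versions, so only the shifted column is compared.

```python
import pandas as pd

# Hypothetical frame with two interleaved groups.
df = pd.DataFrame({"k": ["a", "b", "a", "b"], "v": [1.0, 2.0, 3.0, 4.0]})

# shift operates within each group but keeps the original row index.
plain = df.groupby("k").shift()
no_index = df.groupby("k", as_index=False).shift()

# Both results are aligned with the original rows.
print(plain.index.equals(df.index))        # True
print(plain["v"].equals(no_index["v"]))    # True
```

The first row of each group becomes missing after the shift, while the row count and index are unchanged.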

15 changes: 15 additions & 0 deletions pandas/core/groupby.py
@@ -3674,10 +3674,25 @@ def transform(self, func, *args, **kwargs):
Each subframe is endowed with the attribute 'name' in case you need to know
which group you are working on.

The current implementation imposes three requirements on f:

* f must return a value that either has the same shape as the input
subframe or can be broadcast to the shape of the input subframe.

Contributor (review comment): can you give an example here (and in the docs), for each of these? and example function that is

Contributor (review comment): ahh I just added that point above!

For example, if f returns a scalar, it will be broadcast to have the
same shape as the input subframe.
* f must support application column-by-column in the subframe. If f
also supports application to the entire subframe, then a fast path
is used starting from the second chunk.

Contributor (review comment): can you add the same to SeriesGroupBy.transform? or even better is to rip out the doc-strings to have a common one :> and use Appender.

* f must not mutate subframes. Mutation is not supported and may
produce unexpected results.

Examples
--------
>>> grouped = df.groupby(lambda x: mapping[x])
# Same shape
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
# Broadcastable
>>> grouped.transform(lambda x: x.max() - x.min())
"""

# optimized transforms