
Commit 9b33584

DOC: Add details to DataFrame groupby transform
Add requirements for user function in groupby transform closes #13543 [skip ci]
1 parent 0252385 commit 9b33584

2 files changed

+37
-5
lines changed


doc/source/groupby.rst

+22-5
@@ -580,9 +580,17 @@ Transformation
580580
--------------
581581

582582
The ``transform`` method returns an object that is indexed the same (same size)
583-
as the one being grouped. Thus, the passed transform function should return a
584-
result that is the same size as the group chunk. For example, suppose we wished
585-
to standardize the data within each group:
583+
as the one being grouped. The transform function must:
584+
585+
* Return a result that is either the same size as the group chunk or
586+
broadcastable to the size of the group chunk.
587+
* Operate column-by-column on the group chunk. A fast path is used if the
588+
transform function also operates on the entire group chunk as a DataFrame.
589+
* Not perform in-place operations on the group chunk. Group chunks should
590+
be treated as immutable, and changes to a group chunk may produce unexpected
591+
results.
592+
593+
For example, suppose we wished to standardize the data within each group:
586594

587595
.. ipython:: python
588596
@@ -620,6 +628,14 @@ We can also visually compare the original and transformed data sets.
620628
@savefig groupby_transform_plot.png
621629
compare.plot()
622630
631+
Transformation functions that have lower dimension outputs are broadcast to
632+
match the shape of the input array.
633+
634+
.. ipython:: python
635+
636+
extremum = lambda x: x.max() - x.min()
637+
transformed = ts.groupby(key).transform(extremum)
638+
623639
Another common data transform is to replace missing data with the group mean.
624640

625641
.. ipython:: python
@@ -664,8 +680,9 @@ and that the transformed data contains no NAs.
664680
665681
.. note::
666682

667-
Some functions when applied to a groupby object will automatically transform the input, returning
668-
an object of the same shape as the original. Passing ``as_index=False`` will not affect these transformation methods.
683+
Some functions when applied to a groupby object will automatically transform
684+
the input, returning an object of the same shape as the original. Passing
685+
``as_index=False`` will not affect these transformation methods.
669686

670687
For example: ``fillna, ffill, bfill, shift``.
671688
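The size requirement described in this file's changes can be sketched as follows. This is a minimal illustration, not part of the commit: the frame `df` and its column names `key` and `value` are hypothetical. It contrasts a function that returns a same-size result with one that returns a scalar, which is broadcast across the group.

```python
import pandas as pd

# Hypothetical data (not from the commit): two groups of three values each.
df = pd.DataFrame(
    {
        "key": ["a", "a", "a", "b", "b", "b"],
        "value": [1.0, 2.0, 4.0, 10.0, 20.0, 40.0],
    }
)
grouped = df.groupby("key")["value"]

# Same-size result: each group chunk is demeaned element by element.
demeaned = grouped.transform(lambda x: x - x.mean())

# Scalar result: the per-group range is broadcast to every row of its group.
spread = grouped.transform(lambda x: x.max() - x.min())

print(pd.DataFrame({"key": df["key"], "demeaned": demeaned, "spread": spread}))
```

In both cases the result is indexed like the original object, so it can be assigned back as a new column without realignment.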

pandas/core/groupby.py

+15
@@ -3548,10 +3548,25 @@ def transform(self, func, *args, **kwargs):
35483548
Each subframe is endowed the attribute 'name' in case you need to know
35493549
which group you are working on.
35503550
3551+
The current implementation imposes three requirements on f:
3552+
3553+
* f must return a value that either has the same shape as the input
3554+
subframe or can be broadcast to the shape of the input subframe.
3555+
For example, if f returns a scalar it will be broadcast to have the
3556+
same shape as the input subframe.
3557+
* f must support application column-by-column in the subframe. An
3558+
optional fast path is used if f also supports application to the
3559+
entire subframe.
3560+
* f must not mutate subframes. Mutation is not supported and may
3561+
produce unexpected results.
3562+
35513563
Examples
35523564
--------
35533565
>>> grouped = df.groupby(lambda x: mapping[x])
3566+
# Same shape
35543567
>>> grouped.transform(lambda x: (x - x.mean()) / x.std())
3568+
# Broadcastable
3569+
>>> grouped.transform(lambda x: x.max() - x.min())
35553570
"""
35563571

35573572
# optimized transforms
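As a hedged illustration of the "must not mutate" requirement in the docstring above, the sketch below fills missing values with the group mean by returning a new object instead of modifying the chunk in place. The frame `df` and the helper `fill_with_group_mean` are hypothetical names chosen for this example, not part of pandas or of the commit.

```python
import pandas as pd

# Hypothetical data (not from the commit): each group has one missing value.
nan = float("nan")
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "value": [1.0, nan, 3.0, nan],
    }
)


def fill_with_group_mean(chunk):
    """Return a new Series with NaNs replaced by the chunk's mean.

    fillna without inplace=True returns a new object, so the group
    chunk passed in by transform is left untouched.
    """
    return chunk.fillna(chunk.mean())


filled = df.groupby("key")["value"].transform(fill_with_group_mean)
print(filled)
```

The original `df` still contains its missing values afterwards. Only the returned object is filled, which keeps the function within the behavior the docstring describes as supported.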
