Commit 95e345b

bashtage authored and mattip committed
DOC: Add details to DataFrame groupby transform
closes pandas-dev#13543

Author: Kevin Sheppard <[email protected]>

Closes pandas-dev#14388 from bashtage/groupby-transform-doc-string and squashes the following commits:

ef1ff13 [Kevin Sheppard] DOC: Add details to DataFrame groupby transform
1 parent 6c3d1f5 commit 95e345b

2 files changed, +48 -5 lines changed

doc/source/groupby.rst

+33 -5
@@ -580,9 +580,21 @@ Transformation
 --------------
 
 The ``transform`` method returns an object that is indexed the same (same size)
-as the one being grouped. Thus, the passed transform function should return a
-result that is the same size as the group chunk. For example, suppose we wished
-to standardize the data within each group:
+as the one being grouped. The transform function must:
+
+* Return a result that is either the same size as the group chunk or
+  broadcastable to the size of the group chunk (e.g., a scalar,
+  ``grouped.transform(lambda x: x.iloc[-1])``).
+* Operate column-by-column on the group chunk. The transform is applied to
+  the first group chunk using ``chunk.apply``.
+* Not perform in-place operations on the group chunk. Group chunks should
+  be treated as immutable, and changes to a group chunk may produce unexpected
+  results. For example, when using ``fillna``, ``inplace`` must be ``False``
+  (``grouped.transform(lambda x: x.fillna(inplace=False))``).
+* (Optionally) operate on the entire group chunk. If this is supported, a
+  fast path is used starting from the *second* chunk.
+
+For example, suppose we wished to standardize the data within each group:
 
 .. ipython:: python
 
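The requirements listed in this hunk can be sketched on a toy example. This is only an illustration, not part of the commit: the frame `df`, its column names, and the choice to select a single column are assumptions made to keep the sketch small.

```python
import pandas as pd

# Hypothetical toy data; the names "key" and "val" are illustrative only.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "val": [1.0, 3.0, float("nan"), 20.0],
})
grouped = df.groupby("key")["val"]

# Scalar result: the last value of each group is broadcast to every row
# of that group, so the output has the same length as the input.
last = grouped.transform(lambda x: x.iloc[-1])

# Non-mutating fill: fillna returns a new object (inplace defaults to False),
# so the group chunk itself is left untouched.
filled = grouped.transform(lambda x: x.fillna(x.mean()))

print(last.tolist())
print(filled.tolist())
```

Both results are indexed like `df`, and the missing value in `df` is still present afterwards because the function never modified its input.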
@@ -620,6 +632,21 @@ We can also visually compare the original and transformed data sets.
    @savefig groupby_transform_plot.png
    compare.plot()
 
+Transformation functions that have lower dimension outputs are broadcast to
+match the shape of the input array.
+
+.. ipython:: python
+
+   data_range = lambda x: x.max() - x.min()
+   ts.groupby(key).transform(data_range)
+
+Alternatively, the built-in methods could be used to produce the same
+outputs:
+
+.. ipython:: python
+
+   ts.groupby(key).transform('max') - ts.groupby(key).transform('min')
+
 Another common data transform is to replace missing data with the group mean.
 
 .. ipython:: python
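The equivalence claimed in this hunk can be checked outside the documentation build with a small self-contained sketch. The `ts` and `key` objects below are hypothetical stand-ins for the ones defined earlier in `groupby.rst`, which are not part of this diff.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the `ts` and `key` objects used in the docs.
ts = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0, 9.0])
key = np.array([0, 0, 0, 1, 1, 1])

# A reducing function: its scalar result is broadcast across each group.
data_range = lambda x: x.max() - x.min()
via_callable = ts.groupby(key).transform(data_range)

# The same quantity built from two built-in transforms.
via_builtins = ts.groupby(key).transform("max") - ts.groupby(key).transform("min")

print(via_callable.tolist())
print(via_builtins.tolist())
```

On this toy input the two approaches agree element by element, which is the behavior the added paragraph describes.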
@@ -664,8 +691,9 @@ and that the transformed data contains no NAs.
 
 .. note::
 
-   Some functions when applied to a groupby object will automatically transform the input, returning
-   an object of the same shape as the original. Passing ``as_index=False`` will not affect these transformation methods.
+   Some functions when applied to a groupby object will automatically transform
+   the input, returning an object of the same shape as the original. Passing
+   ``as_index=False`` will not affect these transformation methods.
 
    For example: ``fillna, ffill, bfill, shift``.
 
pandas/core/groupby.py

+15
@@ -3649,10 +3649,25 @@ def transform(self, func, *args, **kwargs):
         Each subframe is endowed with the attribute 'name' in case you need to know
         which group you are working on.
 
+        The current implementation imposes three requirements on f:
+
+        * f must return a value that either has the same shape as the input
+          subframe or can be broadcast to the shape of the input subframe.
+          For example, if f returns a scalar, it will be broadcast to have the
+          same shape as the input subframe.
+        * f must support application column-by-column in the subframe. If f
+          also supports application to the entire subframe, then a fast path
+          is used starting from the second chunk.
+        * f must not mutate subframes. Mutation is not supported and may
+          produce unexpected results.
+
         Examples
         --------
         >>> grouped = df.groupby(lambda x: mapping[x])
+        # Same shape
         >>> grouped.transform(lambda x: (x - x.mean()) / x.std())
+        # Broadcastable
+        >>> grouped.transform(lambda x: x.max() - x.min())
         """
 
         # optimized transforms
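The two docstring examples refer to `df` and `mapping` without defining them. A runnable sketch, assuming a small hypothetical frame and an index-to-group dictionary (neither of which appears in the commit), might look like this:

```python
import pandas as pd

# Hypothetical frame and index-to-group mapping, for illustration only.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 20.0, 30.0, 50.0]},
    index=["r1", "r2", "r3", "r4"],
)
mapping = {"r1": "g1", "r2": "g1", "r3": "g2", "r4": "g2"}
grouped = df.groupby(lambda label: mapping[label])

# Same shape: subtract each column's group mean from every row.
same_shape = grouped.transform(lambda x: x - x.mean())

# Broadcastable: one value per column per group, repeated down the rows.
spread = grouped.transform(lambda x: x.max() - x.min())

print(same_shape)
print(spread)
```

In both cases the result has the shape and index of `df`. The function returns new objects rather than modifying its argument, in line with the third requirement above.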
