Commit 4060fde

rhshadrach authored and meeseeksmachine committed
Backport PR pandas-dev#45393: DOC: whatsnew note for groupby.apply bugfix
1 parent 638a42f commit 4060fde

1 file changed: +59 -0 lines changed

doc/source/whatsnew/v1.4.0.rst

@@ -379,6 +379,65 @@ instead (:issue:`26314`).

.. ---------------------------------------------------------------------------

.. _whatsnew_140.notable_bug_fixes.groupby_apply_mutation:

groupby.apply consistent transform detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`.GroupBy.apply` is designed to be flexible, allowing users to perform
aggregations, transformations, and filters, and to use it with user-defined
functions that might not fall into any of these categories. As part of this,
``apply`` will attempt to detect when an operation is a transform, and in such a
case, the result will have the same index as the input. In order to
determine if the operation is a transform, pandas compares the
input's index to the result's and determines if it has been mutated.
Previously in pandas 1.3, different code paths used different definitions
of "mutated": some would use Python's ``is`` whereas others would test
only up to equality.

This inconsistency has been removed; pandas now tests up to equality.
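
As a rough illustration of the two notions of "mutated" described above, the sketch below compares an index with a copy of itself. It is not the check pandas performs internally; the variable names are chosen for this example only, and ``Index.copy()`` merely stands in for the index of a result produced by something like ``x.copy()``.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

input_index = df.index
# Index.copy() returns a new Index object holding the same labels.
# Here it plays the role of the index on a copied result (an assumption
# made for illustration, not pandas' internal code path).
result_index = input_index.copy()

# Identity test: are both names bound to the very same Python object?
same_object = result_index is input_index

# Equality test: do both indexes hold the same labels in the same order?
same_labels = result_index.equals(input_index)

print(same_object, same_labels)  # False True
```

Under an identity test the copied index counts as mutated, while under an equality test it does not, which is why the two code paths could disagree about whether a function is a transform.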

.. ipython:: python

    def func(x):
        return x.copy()

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
    df

*Previous behavior*:

.. code-block:: ipython

    In [3]: df.groupby(['a']).apply(func)
    Out[3]:
         a  b  c
    a
    1 0  1  3  5
    2 1  2  4  6

    In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
    Out[4]:
         c
    a b
    1 3  5
    2 4  6

In the examples above, the first uses a code path where pandas uses
``is`` and determines that ``func`` is not a transform, whereas the second
tests up to equality and determines that ``func`` is a transform. In the
first case, the result's index is not the same as the input's.

*New behavior*:

.. ipython:: python

    df.groupby(['a']).apply(func)
    df.set_index(['a', 'b']).groupby(['a']).apply(func)

Now in both cases it is determined that ``func`` is a transform. In each case, the
result has the same index as the input.

.. _whatsnew_140.api_breaking:

Backwards incompatible API changes
