DOC: whatsnew note for groupby.apply bugfix #45393

Merged Jan 16, 2022 (1 commit).

doc/source/whatsnew/v1.4.0.rst: 59 additions, 0 deletions

.. ---------------------------------------------------------------------------

.. _whatsnew_140.notable_bug_fixes.groupby_apply_mutation:

groupby.apply consistent transform detection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`.GroupBy.apply` is designed to be flexible, allowing users to perform
aggregations, transformations, and filters, and to use it with user-defined
functions that might not fall into any of these categories. As part of this,
``apply`` will attempt to detect when an operation is a transform, and in such
a case, the result will have the same index as the input. To determine whether
the operation is a transform, pandas compares the input's index to the
result's and checks whether it has been mutated. In pandas 1.3, different code
paths used different definitions of "mutated": some used Python's ``is``
(object identity), whereas others tested only up to equality.

This inconsistency has been removed; pandas now tests up to equality.
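
The difference between the two definitions can be sketched with two small
helpers applied to a single group, the way ``apply`` would see it. Both
``mutated_by_identity`` and ``mutated_by_equality`` are hypothetical
illustrations written for this note, not pandas internals: the first mirrors
the old ``is`` check, the second the equality check.

.. code-block:: python

    import pandas as pd


    def mutated_by_identity(before: pd.Index, after: pd.Index) -> bool:
        # Hypothetical helper: "mutated" unless the result reuses the same Index object.
        return after is not before


    def mutated_by_equality(before: pd.Index, after: pd.Index) -> bool:
        # Hypothetical helper: "mutated" only if the index labels differ.
        return not after.equals(before)


    def func(x):
        return x.copy()


    df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
    group = df[df["a"] == 1]  # the rows of a single group
    result = func(group)

    print(mutated_by_identity(group.index, result.index))  # True: copy() builds a new Index object
    print(mutated_by_equality(group.index, result.index))  # False: the labels are unchanged

Under the identity check, ``func`` looks like it changed the index, even
though every label is the same.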

.. ipython:: python

    def func(x):
        return x.copy()

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
    df

*Previous behavior*:

.. code-block:: ipython

    In [3]: df.groupby(['a']).apply(func)
    Out[3]:
         a  b  c
    a
    1 0  1  3  5
    2 1  2  4  6

    In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
    Out[4]:
         c
    a b
    1 3  5
    2 4  6

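In the examples above, the first uses a code path where pandas compares the
indexes with ``is`` and determines that ``func`` is not a transform, because
``x.copy()`` returns an index that is equal to, but not the same object as,
the input's. The second tests up to equality and determines that ``func`` is
a transform. As a result, in the first case the result's index is not the
same as the input's: the group keys are prepended as an extra level.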
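
Roughly speaking, when an operation is not detected as a transform, the
per-group results are concatenated with the group keys as an outer index
level. That is where the extra ``a`` level in ``Out[3]`` comes from. The
sketch below reproduces that shape with :func:`pandas.concat`. It is an
approximation assumed for illustration, not the code path pandas actually runs.

.. code-block:: python

    import pandas as pd


    def func(x):
        return x.copy()


    df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

    # Apply func to each group, remembering the group key alongside the result.
    keys, pieces = [], []
    for key, group in df.groupby("a"):
        keys.append(key)
        pieces.append(func(group))

    # Non-transform path (approximation): stack the pieces under their group keys.
    result = pd.concat(pieces, keys=keys, names=["a", None])
    print(result)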

*New behavior*:

.. ipython:: python

    df.groupby(['a']).apply(func)
    df.set_index(['a', 'b']).groupby(['a']).apply(func)

In both cases, pandas now determines that ``func`` is a transform, so each
result has the same index as its input.
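
When an operation is detected as a transform, the per-group results can be
thought of as being placed back on the input's index. A minimal sketch,
assuming this simplified model rather than pandas' internal implementation,
concatenates the pieces without group keys and reindexes to the original
index.

.. code-block:: python

    import pandas as pd


    def func(x):
        return x.copy()


    df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

    # Apply func to each group; the group keys are not kept.
    pieces = [func(group) for _, group in df.groupby("a")]

    # Transform path (approximation): reassemble the pieces in the input's row order.
    result = pd.concat(pieces).reindex(df.index)
    print(result)

Reindexing to ``df.index`` is what makes the result line up row for row with
the input instead of carrying the group keys as an extra level.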

.. _whatsnew_140.api_breaking:

Backwards incompatible API changes