@@ -379,6 +379,65 @@ instead (:issue:`26314`).
.. ---------------------------------------------------------------------------

+ .. _whatsnew_140.notable_bug_fixes.groupby_apply_mutation:
+
+ groupby.apply consistent transform detection
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ :meth:`.GroupBy.apply` is designed to be flexible, allowing users to perform
+ aggregations, transformations, and filters, and to apply user-defined functions
+ that might not fall into any of these categories. As part of this, ``apply``
+ attempts to detect when an operation is a transform, and in such a
+ case, the result will have the same index as the input. In order to
+ determine if the operation is a transform, pandas compares the
+ input's index to the result's and determines if it has been mutated.
+ Previously in pandas 1.3, different code paths used different definitions
+ of "mutated": some would use Python's ``is`` whereas others would test
+ only up to equality.
+
+ This inconsistency has been removed; pandas now tests up to equality.
+
+ .. ipython:: python
+
+     def func(x):
+         return x.copy()
+
+     df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
+     df
+
+ *Previous behavior*:
+
+ .. code-block:: ipython
+
+     In [3]: df.groupby(['a']).apply(func)
+     Out[3]:
+          a  b  c
+     a
+     1 0  1  3  5
+     2 1  2  4  6
+
+     In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
+     Out[4]:
+          c
+     a b
+     1 3  5
+     2 4  6
+
+ In the examples above, the first uses a code path where pandas uses
+ ``is`` and determines that ``func`` is not a transform, whereas the second
+ tests up to equality and determines that ``func`` is a transform. In the
+ first case, the result's index is not the same as the input's.
+
+ *New behavior*:
+
+ .. ipython:: python
+
+     df.groupby(['a']).apply(func)
+     df.set_index(['a', 'b']).groupby(['a']).apply(func)
+
+ Now pandas determines in both cases that ``func`` is a transform, so in each
+ case the result has the same index as the input.
+
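The distinction between the two definitions of "mutated" described in this hunk can be illustrated with a small standalone sketch. It is not part of the patch and does not reproduce pandas' internal code path; the helper name ``index_unchanged`` is hypothetical and only mirrors the equality-based comparison the note describes.

```python
import pandas as pd

# A copy of an index is a distinct Python object with the same contents.
index = pd.Index([0, 1], name="rows")
index_copy = index.copy()

# Identity check: is it the very same object?
is_identical = index_copy is index

# Equality check: do the two indexes hold the same elements?
is_equal = bool(index_copy.equals(index))


def index_unchanged(input_index, result_index):
    """Hypothetical helper (an assumption, not pandas internals):
    treat the index as unmutated when it compares equal to the input."""
    return bool(result_index.equals(input_index))


print(is_identical, is_equal, index_unchanged(index, index_copy))
```

Under an identity test, a function that returns ``x.copy()`` appears to have mutated the index, while under an equality test it does not, which is why the two pandas 1.3 code paths could disagree about ``func``.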
.. _whatsnew_140.api_breaking:

Backwards incompatible API changes