Commit 6d8b38b

BACKPORT: groupby.apply incorrectly dropping nan #43236 (#43426)
1 parent 81f956e commit 6d8b38b

3 files changed: +11 -12 lines changed
doc/source/whatsnew/v1.3.3.rst

+1
@@ -17,6 +17,7 @@ Fixed regressions
 - Fixed regression in :class:`DataFrame` constructor failing to broadcast for defined :class:`Index` and len one list of :class:`Timestamp` (:issue:`42810`)
 - Performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
 - Fixed regression in :meth:`.GroupBy.agg` incorrectly raising in some cases (:issue:`42390`)
+- Fixed regression in :meth:`.GroupBy.apply` where ``nan`` values were dropped even with ``dropna=False`` (:issue:`43205`)
 - Fixed regression in :meth:`merge` where ``on`` columns with ``ExtensionDtype`` or ``bool`` data types were cast to ``object`` in ``right`` and ``outer`` merge (:issue:`40073`)
 - Fixed regression in :meth:`RangeIndex.where` and :meth:`RangeIndex.putmask` raising ``AssertionError`` when result did not represent a :class:`RangeIndex` (:issue:`43240`)
 - Fixed regression in :meth:`read_parquet` where the ``fastparquet`` engine would not work properly with fastparquet 0.7.0 (:issue:`43075`)

pandas/core/groupby/groupby.py

+5 -1
@@ -1012,7 +1012,11 @@ def reset_identity(values):

         if not not_indexed_same:
             result = concat(values, axis=self.axis)
-            ax = self.filter(lambda x: True).axes[self.axis]
+            ax = (
+                self.filter(lambda x: True).axes[self.axis]
+                if self.dropna
+                else self._selected_obj._get_axis(self.axis)
+            )

             # this is a very unfortunate situation
             # we can't use reindex to restore the original order
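
The conditional introduced in this hunk can be read as "keep only the labels of rows with a non-missing group key when dropna is set, otherwise keep the whole original axis". The following is a minimal pandas-free sketch of that selection, not the library's implementation; `is_missing` and `target_axis` are hypothetical names chosen for illustration, and the sample data mirrors the test in this commit.

```python
import math


def is_missing(key):
    """Treat None and float NaN as missing group keys (illustrative helper)."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def target_axis(index_labels, group_keys, dropna):
    """Hypothetical stand-in for the patched ``ax = (...)`` expression."""
    if dropna:
        # Rows with a missing key belong to no group, so their labels are left out.
        return [
            label
            for label, key in zip(index_labels, group_keys)
            if not is_missing(key)
        ]
    # With dropna=False every original label is kept, missing keys included.
    return list(index_labels)


index_labels = list("xxyxz")
group_keys = ["a", float("nan"), float("nan"), "b", "b"]

print(target_axis(index_labels, group_keys, dropna=True))   # ['x', 'x', 'z']
print(target_axis(index_labels, group_keys, dropna=False))  # ['x', 'x', 'y', 'x', 'z']
```

The `dropna=True` result matches the index of `df.dropna()` that the updated test expects for that case.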

pandas/tests/groupby/test_apply.py

+5 -11
@@ -1102,25 +1102,19 @@ def test_apply_by_cols_equals_apply_by_rows_transposed():
     tm.assert_frame_equal(by_cols, df)


-def test_apply_dropna_with_indexed_same():
+@pytest.mark.parametrize("dropna", [True, False])
+def test_apply_dropna_with_indexed_same(dropna):
     # GH 38227
-
+    # GH#43205
     df = DataFrame(
         {
             "col": [1, 2, 3, 4, 5],
             "group": ["a", np.nan, np.nan, "b", "b"],
         },
         index=list("xxyxz"),
     )
-    result = df.groupby("group").apply(lambda x: x)
-    expected = DataFrame(
-        {
-            "col": [1, 4, 5],
-            "group": ["a", "b", "b"],
-        },
-        index=list("xxz"),
-    )
-
+    result = df.groupby("group", dropna=dropna).apply(lambda x: x)
+    expected = df.dropna() if dropna else df.iloc[[0, 3, 1, 2, 4]]
     tm.assert_frame_equal(result, expected)
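
The behaviour covered by the parametrized test can also be checked outside the test suite. The script below is a minimal sketch that assumes a pandas release containing this fix is installed. It compares row counts only, because the exact index and column layout of an identity `apply` differs between pandas versions (newer releases may also emit a deprecation warning here).

```python
import numpy as np
import pandas as pd

# Same frame as in test_apply_dropna_with_indexed_same.
df = pd.DataFrame(
    {"col": [1, 2, 3, 4, 5], "group": ["a", np.nan, np.nan, "b", "b"]},
    index=list("xxyxz"),
)

# dropna=False should keep the two rows whose group key is NaN.
kept = df.groupby("group", dropna=False).apply(lambda x: x)

# dropna=True should leave only the rows with a non-missing key.
dropped = df.groupby("group", dropna=True).apply(lambda x: x)

print(len(kept), len(dropped))  # expected on a fixed release: 5 3
```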