BUG Fixing columns dropped from multi index in group by transform GH4… #47840

Merged: 13 commits, Aug 17, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -1010,6 +1010,7 @@ Groupby/resample/rolling
- Bug in :meth:`DataFrame.resample` reduction methods when used with ``on`` would attempt to aggregate the provided column (:issue:`47079`)
- Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` would not respect ``dropna=False`` when the input DataFrame/Series had NaN values in a :class:`MultiIndex` (:issue:`46783`)
- Bug in :meth:`DataFrameGroupBy.resample` raises ``KeyError`` when getting the result from a key list which misses the resample key (:issue:`47362`)
- Bug in :meth:`DataFrame.groupby` would lose index levels when a transform such as ``fillna`` was applied to an empty DataFrame (:issue:`47787`)
-

Reshaping
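
The scenario behind the new whatsnew entry can be reproduced with a short script. This is a minimal sketch, not code from the PR: it uses `cumsum` as the transform instead of `fillna` (an assumption, chosen so no extra arguments are needed), and the column names simply mirror the PR's test. On a pandas build that includes this fix, the result is expected to keep both index levels.

```python
import pandas as pd

# Empty frame whose first two columns become a two-level MultiIndex.
df = pd.DataFrame(data=[], columns=["col_1", "col_2", "col_3", "col_4"], dtype=int)
df = df.set_index(["col_1", "col_2"])

# Group by one index level and apply a transformation kernel.
result = df.groupby("col_1").transform("cumsum")

# A transform should hand back the caller's schema, even with zero rows.
print(list(df.index.names))
print(list(result.index.names))
```

Comparing the two printed lists is the quickest way to see whether an installed pandas version is affected by the issue.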
5 changes: 5 additions & 0 deletions pandas/core/groupby/groupby.py
@@ -1020,6 +1020,11 @@ def curried(x):
            return self.apply(curried)

        is_transform = name in base.transformation_kernels

        # Transform needs to keep the same schema, including when empty
        if is_transform and self._obj_with_exclusions.empty:
            return self._obj_with_exclusions

        result = self._python_apply_general(
            curried, self._obj_with_exclusions, is_transform=is_transform
        )
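
The guard added in this hunk can be illustrated outside of pandas internals. The sketch below is an assumption-laden stand-in, not the library's implementation: `transform_keep_schema` is a hypothetical helper name, it groups by an index level only (so there are no grouping columns to exclude, unlike `_obj_with_exclusions`), and it returns a copy where the PR returns the object itself.

```python
import pandas as pd


def transform_keep_schema(frame: pd.DataFrame, level: str, func: str) -> pd.DataFrame:
    """Hypothetical helper (not pandas API) mirroring the early return in the diff."""
    # Nothing to transform: return an empty copy with the same index and columns.
    if frame.empty:
        return frame.copy()
    # Otherwise defer to the regular groupby transform.
    return frame.groupby(level=level).transform(func)


empty = pd.DataFrame(
    data=[], columns=["col_1", "col_2", "col_3", "col_4"], dtype=int
).set_index(["col_1", "col_2"])
out = transform_keep_schema(empty, "col_1", "cumsum")
print(list(out.index.names), list(out.columns))
```

Short-circuiting on `empty` avoids running the kernel at all, which is why the index levels and columns cannot be altered along the way.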
35 changes: 35 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -2348,6 +2348,41 @@ def test_groupby_duplicate_index():
    tm.assert_series_equal(result, expected)


def test_group_on_empty_multiindex(transformation_func, request):
    # GH 47787
    # With one row, those are transforms so the schema should be the same
    if transformation_func == "tshift":
        mark = pytest.mark.xfail(raises=NotImplementedError)
        request.node.add_marker(mark)
    df = DataFrame(
        data=[[1, Timestamp("today"), 3, 4]],
        columns=["col_1", "col_2", "col_3", "col_4"],
    )
    df = df.set_index(["col_1", "col_2"])
    if transformation_func == "fillna":
        args = ("ffill",)
    elif transformation_func == "tshift":
        args = (1, "D")
    else:
        args = ()
    result = df.groupby(["col_1"]).transform(transformation_func, *args)
    tm.assert_index_equal(df.index, result.index)

    col_3 = df["col_3"]
    result = col_3.groupby(["col_1"]).transform(transformation_func, *args)
    tm.assert_index_equal(col_3.index, result.index)

    # When empty, expect the same schema as well
    df = DataFrame(data=[], columns=["col_1", "col_2", "col_3", "col_4"], dtype=int)
    df = df.set_index(["col_1", "col_2"])
    result = df.groupby(["col_1"]).transform(transformation_func, *args)
    assert df.index.names == result.index.names

    col_3 = df["col_3"]
    result = col_3.groupby(["col_1"]).transform(transformation_func, *args)
    assert col_3.index.names == result.index.names
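
The test above relies on the `transformation_func` fixture from the pandas test suite, so it cannot be run on its own. A rough standalone approximation of the empty-input half is sketched below. The `KERNELS` list is an assumption (a small hand-picked subset that needs no extra arguments, not the full fixture), and grouping uses `level="col_1"` rather than the `["col_1"]` key from the test. The checks are expected to pass on a pandas version that includes this fix.

```python
import pandas as pd

# Assumption: a few argument-free kernels standing in for the fixture.
KERNELS = ["cumsum", "cummax", "ffill"]

df = pd.DataFrame(
    data=[], columns=["col_1", "col_2", "col_3", "col_4"], dtype=int
).set_index(["col_1", "col_2"])
col_3 = df["col_3"]

checked = []
for name in KERNELS:
    frame_result = df.groupby(level="col_1").transform(name)
    series_result = col_3.groupby(level="col_1").transform(name)
    # Both the DataFrame and the Series path should keep every index level.
    assert frame_result.index.names == df.index.names, name
    assert series_result.index.names == col_3.index.names, name
    checked.append(name)

print(checked)
```

Covering the Series path as well as the DataFrame path mirrors the structure of the PR's test, which checks `col_3` separately.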


@pytest.mark.parametrize(
    "idx",
    [