Commit f977dd0

mattB1989 and matt authored and committed
BUG Fixing columns dropped from multi index in group by transform GH4… (pandas-dev#47840)
* BUG Fixing columns dropped from multi index in group by transform GH47787
* fixing pep8 issues
* testing series as well as dataframe
* fixing typo
* adding a timestamp in the index so tshift fails with the right error
* fixing formatting
* using the module assert
* adding a test on the dataframe
* improve test post review
* typo fix
* explicitly casting to int

Co-authored-by: matt <[email protected]>
1 parent 46d1bf7 commit f977dd0

3 files changed, +42 -0 lines changed


doc/source/whatsnew/v1.5.0.rst (+1)

@@ -1096,6 +1096,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrame.resample` reduction methods when used with ``on`` would attempt to aggregate the provided column (:issue:`47079`)
 - Bug in :meth:`DataFrame.groupby` and :meth:`Series.groupby` would not respect ``dropna=False`` when the input DataFrame/Series had a NaN values in a :class:`MultiIndex` (:issue:`46783`)
 - Bug in :meth:`DataFrameGroupBy.resample` raises ``KeyError`` when getting the result from a key list which misses the resample key (:issue:`47362`)
+- Bug in :meth:`DataFrame.groupby` would lose index columns when the DataFrame is empty for transforms, like fillna (:issue:`47787`)
 -

 Reshaping

pandas/core/groupby/groupby.py (+5)

@@ -1034,6 +1034,11 @@ def curried(x):
                 return self.apply(curried)

             is_transform = name in base.transformation_kernels
+
+            # Transform needs to keep the same schema, including when empty
+            if is_transform and self._obj_with_exclusions.empty:
+                return self._obj_with_exclusions
+
             result = self._python_apply_general(
                 curried, self._obj_with_exclusions, is_transform=is_transform
             )
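
The guard above returns the empty input unchanged, so a transform on an empty frame keeps both index levels and all value columns. Below is a minimal sketch of that expectation, assuming pandas 1.5 or later. The frame is a simplified, hypothetical stand-in for the one in the test (a string replaces the timestamp), and `cumsum` is used only as one example of a transformation kernel; it may not pass through the wrapper patched here.

```python
import pandas as pd

# Hypothetical one-row frame with a two-level MultiIndex (col_1, col_2).
df = pd.DataFrame(
    {"col_1": [1], "col_2": ["a"], "col_3": [3], "col_4": [4]}
).set_index(["col_1", "col_2"])

# Slice down to zero rows; the index levels and columns are still defined.
empty = df.iloc[:0]

# Group by one index level and apply a transform to the empty frame.
result = empty.groupby("col_1").transform("cumsum")

# A transform should keep the input's schema even when there are no rows.
print(list(result.index.names))
print(list(result.columns))
print(len(result))
```

If the schema is preserved, the printed index names include both `col_1` and `col_2`, which is the property reported as broken in GH47787.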

pandas/tests/groupby/test_groupby.py (+36)

@@ -2349,6 +2349,42 @@ def test_groupby_duplicate_index():
     tm.assert_series_equal(result, expected)


+def test_group_on_empty_multiindex(transformation_func, request):
+    # GH 47787
+    # With one row, those are transforms so the schema should be the same
+    if transformation_func == "tshift":
+        mark = pytest.mark.xfail(raises=NotImplementedError)
+        request.node.add_marker(mark)
+    df = DataFrame(
+        data=[[1, Timestamp("today"), 3, 4]],
+        columns=["col_1", "col_2", "col_3", "col_4"],
+    )
+    df["col_3"] = df["col_3"].astype(int)
+    df["col_4"] = df["col_4"].astype(int)
+    df = df.set_index(["col_1", "col_2"])
+    if transformation_func == "fillna":
+        args = ("ffill",)
+    elif transformation_func == "tshift":
+        args = (1, "D")
+    else:
+        args = ()
+    result = df.iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
+    expected = df.groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
+    if transformation_func in ("diff", "shift"):
+        expected = expected.astype(int)
+    tm.assert_equal(result, expected)
+
+    result = (
+        df["col_3"].iloc[:0].groupby(["col_1"]).transform(transformation_func, *args)
+    )
+    expected = (
+        df["col_3"].groupby(["col_1"]).transform(transformation_func, *args).iloc[:0]
+    )
+    if transformation_func in ("diff", "shift"):
+        expected = expected.astype(int)
+    tm.assert_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "idx",
     [
