Commit 039eaed

JanVHII and noatamir authored and committed
BUG: Bug fix for GH48567 (pandas-dev#48772)
* BUG: Bug fix for GH48567

  Groupby for Series fails if an entry of the index of the Series is equal to the name of the index. Analyzing the bug, I think the following small change would solve the issue:

      diff --git a/pandas/core/groupby/grouper.py b/pandas/core/groupby/grouper.py
      index 5fc713d84e..9281040a79 100644
      --- a/pandas/core/groupby/grouper.py
      +++ b/pandas/core/groupby/grouper.py
      @@ -875,7 +875,7 @@ def get_grouper(
                       exclusions.add(gpr.name)

               elif is_in_axis(gpr):  # df.groupby('name')
      -            if gpr in obj:
      +            if not isinstance(obj, Series) and gpr in obj:
                       if validate:
                           obj._check_label_or_level_ambiguity(gpr, axis=axis)
                       in_axis, name, gpr = True, gpr, obj[gpr]

  From my point of view, "gpr in obj" does not make sense for a Series at this point. I think the if-statement was written for DataFrames, to check whether gpr is one of the columns. Skipping the if-statement for Series gives the correct results for the examples above. The test suite for Series and groupby runs without errors after the change.

  I added a test at the end of pandas/tests/groupby/test_groupby.py and a note in the bug fixes section of doc/source/whatsnew/v1.6.0.rst.

* BUG: Bug fix for GH48567

  Used pytest.mark.parametrize to simplify similar tests. Corrected formatting issues.

* BUG: Bug fix for GH48567

  Changed the isinstance check to obj.ndim != 1 as requested.
1 parent 9762f10 commit 039eaed
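
The root cause described in the commit message can be illustrated with a small sketch. The data mirrors the test added in this commit; the variable names `series_has_key` and `frame_has_key` are only illustrative. The point is that `key in obj` tests index labels for a Series but column labels for a DataFrame, so the same check means different things for the two types.

```python
import pandas as pd

# A Series whose index is named "blah" and also contains the label "blah".
series = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    name="values",
    index=pd.Index(["foo", "foo", "bar", "baz", "blah"], name="blah"),
)
frame = series.to_frame()  # single column called "values"

# For a Series, `in` looks at the index labels, so this is True.
series_has_key = "blah" in series

# For a DataFrame, `in` looks at the column labels, so this is False.
frame_has_key = "blah" in frame

print(series_has_key, frame_has_key)
```

Because the Series check succeeds on an index label, the unpatched branch treated the index name as if it were a column to select, which is the situation the commit message argues should not apply to a Series.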

3 files changed, +33 -1 lines changed

doc/source/whatsnew/v1.6.0.rst

+1
@@ -243,6 +243,7 @@ Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
 - Bug in :class:`.ExponentialMovingWindow` with ``online`` not raising a ``NotImplementedError`` for unsupported operations (:issue:`48834`)
 - Bug in :meth:`DataFrameGroupBy.sample` raises ``ValueError`` when the object is empty (:issue:`48459`)
+- Bug in :meth:`Series.groupby` raises ``ValueError`` when an entry of the index is equal to the name of the index (:issue:`48567`)
 -
 
 Reshaping

pandas/core/groupby/grouper.py

+1-1
@@ -875,7 +875,7 @@ def is_in_obj(gpr) -> bool:
                 exclusions.add(gpr.name)
 
         elif is_in_axis(gpr):  # df.groupby('name')
-            if gpr in obj:
+            if obj.ndim != 1 and gpr in obj:
                 if validate:
                     obj._check_label_or_level_ambiguity(gpr, axis=axis)
                 in_axis, name, gpr = True, gpr, obj[gpr]
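
The effect of the patched condition can be sketched in isolation. `is_column_key` below is a hypothetical helper, not a pandas function; it only copies the boolean expression from the changed line so that its behavior on one-dimensional and two-dimensional objects can be compared.

```python
import pandas as pd


def is_column_key(obj, key) -> bool:
    """Hypothetical helper mirroring the patched check `obj.ndim != 1 and key in obj`."""
    return obj.ndim != 1 and key in obj


series = pd.Series(
    [1.0, 2.0],
    name="values",
    index=pd.Index(["foo", "blah"], name="blah"),
)
frame = series.to_frame()            # columns: ["values"], index named "blah"
frame_with_col = frame.reset_index()  # columns: ["blah", "values"]

# A Series is one-dimensional, so the branch is skipped regardless of its labels.
print(is_column_key(series, "blah"))

# The frame has no "blah" column, so the key is not treated as a column.
print(is_column_key(frame, "blah"))

# After reset_index, "blah" is a real column and the branch applies.
print(is_column_key(frame_with_col, "blah"))
```

Checking `ndim` rather than using `isinstance(obj, Series)` was the form requested in review, according to the commit message.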

pandas/tests/groupby/test_groupby.py

+31
@@ -2907,3 +2907,34 @@ def test_groupby_cumsum_mask(any_numeric_ea_dtype, skipna, val):
         dtype=any_numeric_ea_dtype,
     )
     tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "val_in, index, val_out",
+    [
+        (
+            [1.0, 2.0, 3.0, 4.0, 5.0],
+            ["foo", "foo", "bar", "baz", "blah"],
+            [3.0, 4.0, 5.0, 3.0],
+        ),
+        (
+            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
+            ["foo", "foo", "bar", "baz", "blah", "blah"],
+            [3.0, 4.0, 11.0, 3.0],
+        ),
+    ],
+)
+def test_groupby_index_name_in_index_content(val_in, index, val_out):
+    # GH 48567
+    series = Series(data=val_in, name="values", index=Index(index, name="blah"))
+    result = series.groupby("blah").sum()
+    expected = Series(
+        data=val_out,
+        name="values",
+        index=Index(["bar", "baz", "blah", "foo"], name="blah"),
+    )
+    tm.assert_series_equal(result, expected)
+
+    result = series.to_frame().groupby("blah").sum()
+    expected = expected.to_frame()
+    tm.assert_frame_equal(result, expected)
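
Outside the test suite, the behavior covered by the new test can be checked with the short standalone sketch below. It assumes a pandas release that already contains this fix; on earlier releases the `groupby("blah")` call is the one reported to raise `ValueError`. The data is the second parametrized case, and the comparison against `groupby(level="blah")` is an addition for illustration, not part of the commit.

```python
import pandas as pd

series = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    name="values",
    index=pd.Index(["foo", "foo", "bar", "baz", "blah", "blah"], name="blah"),
)

# Group by the index name given as a plain string (the path fixed in this commit).
by_name = series.groupby("blah").sum()

# Group by the same index level explicitly, which does not go through the patched branch.
by_level = series.groupby(level="blah").sum()

expected = pd.Series(
    [3.0, 4.0, 11.0, 3.0],
    name="values",
    index=pd.Index(["bar", "baz", "blah", "foo"], name="blah"),
)

# Both spellings should give the same grouped sums.
pd.testing.assert_series_equal(by_name, expected)
pd.testing.assert_series_equal(by_level, expected)
```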
