BUG: Bug fix for GH48567 #48772

Merged
merged 3 commits into from
Sep 30, 2022
Changes from 1 commit
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.6.0.rst
@@ -196,6 +196,7 @@ Plotting
Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :meth:`DataFrameGroupBy.sample` raises ``ValueError`` when the object is empty (:issue:`48459`)
- Bug in :meth:`Series.groupby` raises ``ValueError`` when an entry of the index is equal to the name of the index (:issue:`48567`)
-

Reshaping
2 changes: 1 addition & 1 deletion pandas/core/groupby/grouper.py
@@ -875,7 +875,7 @@ def is_in_obj(gpr) -> bool:
             exclusions.add(gpr.name)

         elif is_in_axis(gpr):  # df.groupby('name')
-            if gpr in obj:
+            if not isinstance(obj, Series) and gpr in obj:
Member

Maybe this should be gpr in group_axis?

I am not 100% familiar with this code path but there may be a case where this can apply to the Series index?

Contributor Author

For a Series index, the condition one line earlier, is_in_axis(gpr), evaluates to true. The check gpr in obj, however, is meant only for DataFrames, where it tests whether the group key gpr is one of the columns. For a Series (with gpr in the Series index), the corresponding else branch is the correct one. The new test cases show that the result is correct when the group key is the Series index. I also checked that it works for a Series with a MultiIndex.
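
To illustrate the point about gpr in obj: membership on a Series tests the index labels, while membership on a DataFrame tests the column labels. The following is a minimal sketch that reuses the data from the new test; the variable names (ser, frame, key_in_series, key_in_frame) are chosen here for illustration and do not come from the pandas source.

```python
import pandas as pd

# Same shape as series1 in the new test: the index is named "blah"
# and one of its entries is also the string "blah".
ser = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    name="values",
    index=pd.Index(["foo", "foo", "bar", "baz", "blah"], name="blah"),
)
frame = ser.to_frame()

# On a Series, `in` checks the index labels, so the entry "blah" matches.
key_in_series = "blah" in ser

# On a DataFrame, `in` checks the column labels; the only column is "values".
key_in_frame = "blah" in frame

print(key_in_series, key_in_frame)
```

With the unguarded check, the Series case would therefore enter the branch intended for DataFrame columns and go on to evaluate obj[gpr], which for a Series is an index-label lookup rather than a column lookup.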

Member

Agreed - this looks good to me. For a Series, we only need to consider the case axis=0. So when you have an "in-axis grouper" for a Series, I think there are conceptually only two options: the index, or the values themselves. I believe grouping by the values via the in-axis path, e.g.

ser = pd.Series([1, 2, 1, 2], index=[1, 1, 2, 2], name="b")
ser.index.name = "a"
ser.groupby("b").sum()

is not supported at all (though ser.groupby(ser).sum() is supported, but that's a different code path) and is outside the scope here. So that only leaves checking whether the provided grouper matches the index name, which is the elif block here.
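
The three cases mentioned in this comment can be tried side by side. This is a small sketch built on the snippet above; the names by_index_name, by_values and name_lookup_failed are illustrative, and catching KeyError assumes that an unresolvable string key is reported that way, as the comment implies.

```python
import pandas as pd

ser = pd.Series([1, 2, 1, 2], index=[1, 1, 2, 2], name="b")
ser.index.name = "a"

# Grouping by the index name: the string key resolves to the index level "a".
by_index_name = ser.groupby("a").sum()

# Grouping by the Series itself groups by its values (a different code path).
by_values = ser.groupby(ser).sum()

# Passing the Series name as a string is not resolved to the values.
try:
    ser.groupby("b").sum()
    name_lookup_failed = False
except KeyError:
    name_lookup_failed = True

print(by_index_name.to_dict(), by_values.to_dict(), name_lookup_failed)
```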

                 if validate:
                     obj._check_label_or_level_ambiguity(gpr, axis=axis)
                 in_axis, name, gpr = True, gpr, obj[gpr]
27 changes: 27 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -2907,3 +2907,30 @@ def test_groupby_cumsum_mask(any_numeric_ea_dtype, skipna, val):
        dtype=any_numeric_ea_dtype,
    )
    tm.assert_frame_equal(result, expected)

def test_groupby_index_name_in_index_content():
    # GH 48567
    series1 = Series(data=[1.0, 2.0, 3.0, 4.0, 5.0], name='values',
                     index=Index(['foo', 'foo', 'bar', 'baz', 'blah'],
                                 name='blah'))
    result1 = series1.groupby('blah').sum()
    expected1 = Series(data=[3.0, 4.0, 5.0, 3.0], name='values',
                       index=Index(['bar', 'baz', 'blah', 'foo'], name='blah'))
    tm.assert_series_equal(result1, expected1)

    series2 = Series(data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='values',
Member

Could you use pytest.mark.parametrize to simplify these similar tests?

Also, it looks like there are formatting issues here: https://github.com/pandas-dev/pandas/actions/runs/3122348100/jobs/5064232051
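
One possible shape for the parametrized test is sketched below. It is an illustration of the suggestion, not necessarily the version that was merged; the parameter names (data, index, expected_data) are made up here, and the public pandas.testing module stands in for the tm alias used in the pandas test suite so that the snippet is self-contained.

```python
import pandas as pd
import pandas.testing as tm  # stand-in for the test suite's `tm` alias
import pytest


@pytest.mark.parametrize(
    "data, index, expected_data",
    [
        # "blah" appears once in the index
        (
            [1.0, 2.0, 3.0, 4.0, 5.0],
            ["foo", "foo", "bar", "baz", "blah"],
            [3.0, 4.0, 5.0, 3.0],
        ),
        # "blah" appears twice in the index
        (
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
            ["foo", "foo", "bar", "baz", "blah", "blah"],
            [3.0, 4.0, 11.0, 3.0],
        ),
    ],
)
def test_groupby_index_name_in_index_content(data, index, expected_data):
    # GH 48567
    ser = pd.Series(data, name="values", index=pd.Index(index, name="blah"))
    expected = pd.Series(
        expected_data,
        name="values",
        index=pd.Index(["bar", "baz", "blah", "foo"], name="blah"),
    )

    # Series case: the key should resolve to the index level, not to a label.
    result = ser.groupby("blah").sum()
    tm.assert_series_equal(result, expected)

    # DataFrame case: same grouping after converting to a one-column frame.
    result = ser.to_frame().groupby("blah").sum()
    tm.assert_frame_equal(result, expected.to_frame())
```

Each parameter set covers both the Series and the DataFrame assertion, so the four result/expected pairs of the original test collapse into two cases.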

Contributor Author

I will try to simplify the tests using pytest.mark.parametrize.

I am not able to use pre-commit on my system (there is a known bug involving posix_prefix, posix_local and virtualenv), and I am still searching for a solution. Is there another way to check the formatting?

Member

pre-commit runs black and flake8, so if you're able to run those manually, the result should be comparable.

                     index=Index(['foo', 'foo', 'bar', 'baz', 'blah', 'blah'],
                                 name='blah'))
    result2 = series2.groupby('blah').sum()
    expected2 = Series(data=[3.0, 4.0, 11.0, 3.0], name='values',
                       index=Index(['bar', 'baz', 'blah', 'foo'],
                                   name='blah'))
    tm.assert_series_equal(result2, expected2)

    result3 = series1.to_frame().groupby('blah').sum()
    expected3 = expected1.to_frame()
    tm.assert_frame_equal(result3, expected3)

    result4 = series2.to_frame().groupby('blah').sum()
    expected4 = expected2.to_frame()
    tm.assert_frame_equal(result4, expected4)