BUG: Bug fix for GH48567 #48772
Changes from 1 commit
@@ -2907,3 +2907,30 @@ def test_groupby_cumsum_mask(any_numeric_ea_dtype, skipna, val):
         dtype=any_numeric_ea_dtype,
     )
     tm.assert_frame_equal(result, expected)
+
+def test_groupby_index_name_in_index_content():
+    # GH 48567
+    series1 = Series(data=[1.0, 2.0, 3.0, 4.0, 5.0], name='values',
+                     index=Index(['foo', 'foo', 'bar', 'baz', 'blah'],
+                                 name='blah'))
+    result1 = series1.groupby('blah').sum()
+    expected1 = Series(data=[3.0, 4.0, 5.0, 3.0], name='values',
+                       index=Index(['bar', 'baz', 'blah', 'foo'], name='blah'))
+    tm.assert_series_equal(result1, expected1)
+
+    series2 = Series(data=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='values',

Could you use Also, it looks like there are formatting issues here: https://github.com/pandas-dev/pandas/actions/runs/3122348100/jobs/5064232051

I will try to simplify the tests using pytest.mark.parametrize. I am not able to use pre-commit on my system (there is a known bug with posix_prefix, posix_local and virtualenv), and I am still searching for a solution. Is there another way to check formatting?

pre-commit runs black and flake8, so if you're able to run those manually the result should be comparable.

+                     index=Index(['foo', 'foo', 'bar', 'baz', 'blah', 'blah'],
+                                 name='blah'))
+    result2 = series2.groupby('blah').sum()
+    expected2 = Series(data=[3.0, 4.0, 11.0, 3.0], name='values',
+                       index=Index(['bar', 'baz', 'blah', 'foo'],
+                                   name='blah'))
+    tm.assert_series_equal(result2, expected2)
+
+    result3 = series1.to_frame().groupby('blah').sum()
+    expected3 = expected1.to_frame()
+    tm.assert_frame_equal(result3, expected3)
+
+    result4 = series2.to_frame().groupby('blah').sum()
+    expected4 = expected2.to_frame()
+    tm.assert_frame_equal(result4, expected4)
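
As a rough illustration of the parametrization the reviewer asked about, the four assertions above could be folded into one parametrized test. This is only a sketch, not the version that was merged: the parameter names `values`, `labels` and `expected_values` are made up for illustration, and it uses the public `pd.testing` helpers instead of the internal `tm` module from the diff. It also assumes a pandas version that already contains the fix for GH 48567.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "values, labels, expected_values",
    [
        # Index name "blah" appears once among the index labels.
        (
            [1.0, 2.0, 3.0, 4.0, 5.0],
            ["foo", "foo", "bar", "baz", "blah"],
            [3.0, 4.0, 5.0, 3.0],
        ),
        # Index name "blah" appears twice among the index labels.
        (
            [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
            ["foo", "foo", "bar", "baz", "blah", "blah"],
            [3.0, 4.0, 11.0, 3.0],
        ),
    ],
)
def test_groupby_index_name_in_index_content(values, labels, expected_values):
    # GH 48567
    ser = pd.Series(values, name="values", index=pd.Index(labels, name="blah"))
    expected = pd.Series(
        expected_values,
        name="values",
        index=pd.Index(["bar", "baz", "blah", "foo"], name="blah"),
    )

    # Series case: group by the index name.
    pd.testing.assert_series_equal(ser.groupby("blah").sum(), expected)

    # DataFrame case: same data as a single-column frame.
    pd.testing.assert_frame_equal(
        ser.to_frame().groupby("blah").sum(), expected.to_frame()
    )
```

Each parameter set covers both the Series and the DataFrame path, so the two input shapes stay in one place.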

Maybe this should be `grp in group_axis`? I am not 100% familiar with this code path, but there may be a case where this can apply to the Series index?

For a Series index, the if-statement one line before, "is_in_axis(gpr)", evaluates to true. But the if-statement "gpr in obj" applies only to DataFrames, where it checks whether the group key gpr is one of the columns. For a Series (with gpr in the Series index), the corresponding else-statement is correct. The new test cases show that the result is correct when the group key is the Series index. I checked that it also works with a Series that has a MultiIndex.

Agreed, this looks good to me. For a Series, we only need to consider the case `axis=0`. So when you have an "in-axis grouper" for a Series, I think there are conceptually only two options: the index, or the values themselves. I believe grouping by the values via the in-axis path is not supported at all (though `ser.groupby(ser).sum()` is supported, that's a different code path) and is outside the scope here. So that only leaves checking whether the provided grouper matches the index name, which is the elif block here.
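
The two cases discussed above can be sketched as follows. This is a minimal example, assuming a pandas version that includes the fix for GH 48567; the variable names `by_name`, `by_level` and `by_values` are chosen here for illustration only.

```python
import pandas as pd

# Same shape as series1 in the new test: the index is named "blah",
# and "blah" also appears as one of the index labels.
ser = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    name="values",
    index=pd.Index(["foo", "foo", "bar", "baz", "blah"], name="blah"),
)

# Grouping by the index name (the case this PR addresses).
by_name = ser.groupby("blah").sum()

# Grouping by the index level explicitly should give the same result.
by_level = ser.groupby(level="blah").sum()

# Grouping by the values themselves passes the Series as the grouper,
# which goes through a different code path than a string key.
by_values = ser.groupby(ser).sum()

print(by_name)
print(by_values)
```

Here `by_name` and `by_level` group on the index labels, while `by_values` has one group per distinct value in the Series.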