-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix groupby observed=True when aggregating a column #24412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 9 commits
debb02a
f450dc2
3b50900
e6e9894
5181639
4c68d41
77cf3c6
572d9c3
8793995
b75bc7b
ef48de6
d87e159
b01d809
7a6898f
d584327
3e9178a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -860,3 +860,17 @@ def test_groupby_multiindex_categorical_datetime(): | |
expected = pd.DataFrame( | ||
{'values': [0, 4, 8, 3, 4, 5, 6, np.nan, 2]}, index=idx) | ||
assert_frame_equal(result, expected) | ||
|
||
|
||
def test_groupby_agg_observed_true_single_column(): | ||
# GH-23970 | ||
expected = pd.DataFrame({ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can create a test case where I originally had There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you parameterize over as_index=True/False There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was parameterising
I understand that the issue is with not using CategoricalIndex at MultiIndex[0] but I have no idea how to go about doing it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot! |
||
'a': pd.Series([1, 1, 2], dtype='category'), | ||
'b': [1, 2, 2], | ||
'x': [1, 2, 3] | ||
}) | ||
|
||
result = expected.groupby( | ||
['a', 'b'], as_index=False, observed=True)['x'].sum() | ||
|
||
tm.assert_frame_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is when its a categorical column; also this is only for
as_index=False
, right?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any operation over a selected column was effectively ignoring the
observed
parameter that was being passed to it. Withas_index=True
theobserved
value isn't used so the code behaves as expected.