BUG: Fix min_count issue for groupby.sum #32914
@@ -1636,3 +1636,20 @@ def test_apply_to_nullable_integer_returns_float(values, function):
     result = groups.agg([function])
     expected.columns = MultiIndex.from_tuples([("b", function)])
     tm.assert_frame_equal(result, expected)
+
+
+def test_groupby_sum_below_mincount_nullable_integer():
+    # https://github.com/pandas-dev/pandas/issues/32861
+    df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]}, dtype="Int64")
+    grouped = df.groupby("a")
+    idx = pd.Index([0, 1, 2], dtype=object, name="a")
+
+    result = grouped["b"].sum(min_count=2)
+    expected = pd.Series([np.nan] * 3, index=idx, name="b")
+    tm.assert_series_equal(result, expected)
+
+    result = grouped.sum(min_count=2)
+    expected = pd.DataFrame(
+        {"b": [pd.NA] * 3, "c": [pd.NA] * 3}, dtype="Int64", index=idx

Not ideal but it seems we get

huh? that seems odd, can you track this down

can you create an issue about this. I am not sure this is correct.

Which part seems incorrect? The dtype of the index being object is maybe odd, but otherwise seems okay?

it seems your comment about is not correct, e.g. we have NA in all cases. is that not what you meant here?

I think that comment was from before adding the conversion using maybe_cast_result (so the original test had NaN for Series but NA for DataFrame): 2a3f814

ok this is fine then

+    )
+    tm.assert_frame_equal(result, expected)
move this condition to the above elif & add an appropriate comment as 3.
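
For readers unfamiliar with `min_count`, the rule that the new test exercises can be sketched in plain Python. This is only an illustration of the semantics, not pandas' implementation: `grouped_sum` and the `NA` stand-in are hypothetical names introduced here, and `None` plays the role that `pd.NA` plays in the test. Each group in the test holds a single value, so with `min_count=2` every group falls below the threshold and the result is missing.

```python
# Hypothetical stand-in for a missing value; pandas would use pd.NA here.
NA = None


def grouped_sum(keys, values, min_count=0):
    """Sum values per key, returning NA for any group that has fewer
    than min_count non-missing values (illustrative sketch only)."""
    totals = {}
    counts = {}
    for key, value in zip(keys, values):
        # Register the group even if all of its values are missing.
        totals.setdefault(key, 0)
        counts.setdefault(key, 0)
        if value is not NA:
            totals[key] += value
            counts[key] += 1
    return {
        key: (totals[key] if counts[key] >= min_count else NA)
        for key in totals
    }


# Same shape as the test data: three groups with one value each.
keys = [0, 1, 2]
values = [0, 1, 2]

below_threshold = grouped_sum(keys, values, min_count=2)
at_threshold = grouped_sum(keys, values, min_count=1)
```

With `min_count=1` each group keeps its own value, while `min_count=2` masks every group, which mirrors the all-missing `expected` objects in the test.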