BUG: groupby with no non-empty groups, #21624 #21849

Closed
wants to merge 8 commits into from
2 changes: 1 addition & 1 deletion pandas/core/groupby/groupby.py
@@ -2349,7 +2349,7 @@ def size(self):
         if ngroup:
             out = np.bincount(ids[ids != -1], minlength=ngroup)
         else:
-            out = ids
+            out = []
         return Series(out,
                       index=self.result_index,
                       dtype='int64')
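
For readers following the size() change, here is a minimal standalone sketch of the patched branch. It assumes that ids holds one integer code per row, with -1 marking a row whose group key is missing, and that ngroup is the number of observed groups. The name group_sizes is a hypothetical helper used only for illustration, not a pandas function.

```python
import numpy as np


def group_sizes(ids, ngroup):
    """Hypothetical stand-in for the counting step inside size()."""
    ids = np.asarray(ids, dtype=np.intp)
    if ngroup:
        # Count rows per group code, skipping rows with a missing key (-1).
        return np.bincount(ids[ids != -1], minlength=ngroup)
    # No groups at all: return an empty result rather than the raw ids.
    return np.array([], dtype=np.int64)


# Two groups, one row with a missing key.
print(group_sizes([0, 1, 0, -1], 2))

# Every key is missing, so there are no groups to count.
print(group_sizes([-1, -1, -1], 0))
```

In the all-missing case the raw ids would have one entry per row while the result index is empty, which appears to be the mismatch the patch avoids.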
11 changes: 11 additions & 0 deletions pandas/tests/groupby/test_transform.py
@@ -782,3 +782,14 @@ def test_any_all_np_func(func):

     res = df.groupby('key')['val'].transform(func)
     tm.assert_series_equal(res, exp)
+
+
+def test_transform_with_all_nan():
+    # GH 21624
+    df = DataFrame({'groups': [np.nan, np.nan, np.nan],
+                    'values': [1, 2, 3]})
+
+    grouped = df.groupby('groups')
+    summed = grouped['values'].transform('sum')
Member

Minor nit, but we mostly use result to compare against expected. Can you change the variable name here and on the assertion line?

Author

Sure

+    expected = Series([np.nan, np.nan, np.nan], name='values')
Member

Hmm, maybe I'm missing the point, but shouldn't these be 6 and not np.nan?

Author

Currently, the behavior is to output np.nan for any row whose group label is np.nan. Making this output 6 would change the interpretation of np.nan from "not in any group" to "in a group with label np.nan", which would be a significant change in how things work. Ignoring NA group keys is the treatment specified in the groupby docs.

For example, see test_groupby_transform_with_nan_group() in test_transform.py.

+    tm.assert_series_equal(summed, expected)
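
To make the point in the review discussion concrete, here is a small sketch of how a missing group key is treated when only some keys are missing. It relies on the default groupby behavior of dropping NA keys; the frame and column names are illustrative and are not taken from the pandas test suite.

```python
import numpy as np
import pandas as pd

# The middle row has no group key.
df = pd.DataFrame({'groups': ['a', np.nan, 'a'],
                   'values': [1, 2, 3]})

grouped = df.groupby('groups')

# Rows in group 'a' receive the group sum; the row with a missing key
# belongs to no group, so its transformed value is NaN.
result = grouped['values'].transform('sum')
print(result)

# size() reports only the observed group; the NaN key is not counted.
sizes = grouped.size()
print(sizes)
```

Treating the missing key as its own group would instead assign that row a real value, which is the behavior change the author argues against.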