Skip to content

BUG: regression when applying groupby aggregation on categorical columns #31359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Jan 29, 2020
Merged
Show file tree
Hide file tree
Changes from 24 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
7e461a1
remove \n from docstring
charlesdong1991 Dec 3, 2018
1314059
fix conflicts
charlesdong1991 Jan 19, 2019
8bcb313
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Jul 30, 2019
24c3ede
Merge remote-tracking branch 'upstream/master'
charlesdong1991 Jan 14, 2020
dea38f2
fix issue 17038
charlesdong1991 Jan 14, 2020
cd9e7ac
revert change
charlesdong1991 Jan 14, 2020
e5e912b
revert change
charlesdong1991 Jan 14, 2020
d0cddb3
Merge remote-tracking branch 'upstream/master' into fix_issue_31256
charlesdong1991 Jan 26, 2020
4e1abde
try fix 31256
charlesdong1991 Jan 27, 2020
0b917c6
pep8
charlesdong1991 Jan 27, 2020
e9cac5d
fix test
charlesdong1991 Jan 27, 2020
e03357e
fix
charlesdong1991 Jan 28, 2020
a1df393
fix up tests
charlesdong1991 Jan 28, 2020
8743c47
preserve order
charlesdong1991 Jan 28, 2020
1e10d71
add comment
charlesdong1991 Jan 28, 2020
905b3a5
better test
charlesdong1991 Jan 28, 2020
c5d670b
fixup
charlesdong1991 Jan 28, 2020
916d9b2
remove blank line
charlesdong1991 Jan 28, 2020
2208fc2
fix test
charlesdong1991 Jan 28, 2020
86a254c
style
charlesdong1991 Jan 28, 2020
c36d97b
linting
charlesdong1991 Jan 28, 2020
3f8ea8f
wip
TomAugspurger Jan 29, 2020
a6ad1a2
alternative
TomAugspurger Jan 29, 2020
c7daa46
Merge remote-tracking branch 'upstream/master' into fix_issue_31256
TomAugspurger Jan 29, 2020
c4ebfa9
Fixups
TomAugspurger Jan 29, 2020
bbad886
revert extranesou
TomAugspurger Jan 29, 2020
a6a498e
non-numeric
TomAugspurger Jan 29, 2020
2a3f5a2
xfailing test
TomAugspurger Jan 29, 2020
ceef95e
release note
TomAugspurger Jan 29, 2020
ed91cc1
fixup
TomAugspurger Jan 29, 2020
9c7af0f
fixup
TomAugspurger Jan 29, 2020
ca35648
fixup
TomAugspurger Jan 29, 2020
1b826bb
fixup
TomAugspurger Jan 29, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -813,9 +813,10 @@ def _try_cast(self, result, obj, numeric_only: bool = False):
# datetime64tz is handled correctly in agg_series,
# so is excluded here.

# return the same type (Series) as our caller
cls = dtype.construct_array_type()
result = try_cast_to_ea(cls, result, dtype=dtype)
if len(result) and isinstance(result[0], dtype.type):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 it looks like this change caused the regression in #32194

cls = dtype.construct_array_type()
result = try_cast_to_ea(cls, result, dtype=dtype)

elif numeric_only and is_numeric_dtype(dtype) or not numeric_only:
result = maybe_downcast_to_dtype(result, dtype)

Expand Down
28 changes: 25 additions & 3 deletions pandas/tests/groupby/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -737,9 +737,10 @@ def test_preserve_on_ordered_ops(func, values):
df = pd.DataFrame({"payload": [-1, -2, -1, -2], "col": c})
g = df.groupby("payload")
result = getattr(g, func)()
expected = pd.DataFrame(
{"payload": [-2, -1], "col": pd.Series(values, dtype=c.dtype)}
).set_index("payload")
expected_col = pd.Categorical(values, dtype=c.dtype)
expected = pd.DataFrame({"payload": [-2, -1], "col": expected_col}).set_index(
"payload"
)
tm.assert_frame_equal(result, expected)


Expand Down Expand Up @@ -1342,3 +1343,24 @@ def test_series_groupby_categorical_aggregation_getitem():
result = groups["foo"].agg("mean")
expected = groups.agg("mean")["foo"]
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize(
"func, expected_values",
[(pd.Series.nunique, [1, 1, 2]), (pd.Series.count, [1, 2, 2])],
)
def test_groupby_agg_categorical_columns(func, expected_values):
# 31256
df = pd.DataFrame(
{
"id": [0, 1, 2, 3, 4],
"groups": [0, 1, 1, 2, 2],
"value": pd.Categorical([0, 0, 0, 0, 1]),
}
).set_index("id")
result = df.groupby("groups").agg(func)

expected = pd.DataFrame(
{"value": expected_values}, index=pd.Index([0, 1, 2], name="groups"),
)
tm.assert_frame_equal(result, expected)