BUG: groupby agg fails silently with mixed dtypes #43213

Merged 44 commits on Sep 29, 2021
Commits
91aa285
BUG: groupby agg fails silently with mixed dtypes
debnathshoham Aug 25, 2021
9b9b7af
updated whatsnew
debnathshoham Aug 25, 2021
d968e57
added tests for var and std
debnathshoham Aug 25, 2021
46a29e0
reorganized tests
debnathshoham Aug 30, 2021
195db31
Merge branch 'master' into gh43209
debnathshoham Aug 30, 2021
16e28db
specified int64 in result
debnathshoham Aug 30, 2021
854ecda
added copy=False to astype
debnathshoham Aug 31, 2021
7dd27f3
updated whatsnew
debnathshoham Aug 31, 2021
309ee59
typo corrected
debnathshoham Aug 31, 2021
8d1bfb1
added issue ref
debnathshoham Sep 1, 2021
41471aa
added issue ref to test
debnathshoham Sep 2, 2021
3647f53
resolved conflict
debnathshoham Sep 5, 2021
d6992e5
reverted old; raising DataError as 1.2.5
debnathshoham Sep 5, 2021
ab1fd87
used _selected_obj instead of mgr
debnathshoham Sep 5, 2021
23d2989
Merge branch 'master' into gh43209
debnathshoham Sep 5, 2021
9039124
=0
debnathshoham Sep 5, 2021
d75c57b
merge master
debnathshoham Sep 7, 2021
89a6bc7
Merge branch 'master' into gh43209
debnathshoham Sep 8, 2021
64ca85a
draft
debnathshoham Sep 9, 2021
8a0f5fe
Merge branch 'master' into gh43209
debnathshoham Sep 9, 2021
4ae0d6b
cast mi groupbysum
debnathshoham Sep 9, 2021
ac51170
Merge branch 'master' into gh43209
debnathshoham Sep 9, 2021
572f23c
dropped na
debnathshoham Sep 10, 2021
183f245
Merge branch 'master' into gh43209
debnathshoham Sep 10, 2021
62b5aac
Merge branch 'master' into gh43209
debnathshoham Sep 10, 2021
625a751
try casting in _wrap_agged_manager
debnathshoham Sep 11, 2021
9b0acd7
added test axis=1
debnathshoham Sep 11, 2021
4b4618a
Merge branch 'master' into gh43209
debnathshoham Sep 11, 2021
265b3bb
overrid int64; failing in 32bit
debnathshoham Sep 11, 2021
a55211a
Merge branch 'master' into gh43209
debnathshoham Sep 13, 2021
753c7df
updated whatsnew
debnathshoham Sep 13, 2021
de86f72
astype for std
debnathshoham Sep 13, 2021
2a29451
smaller patch for 1.3.x
debnathshoham Sep 14, 2021
a37f1f9
resolved merge master conflict
debnathshoham Sep 14, 2021
00838be
changes wrt 1.2.5
debnathshoham Sep 16, 2021
a04d021
Merge branch 'master' into gh43209
debnathshoham Sep 16, 2021
c9d6658
Merge branch 'master' into gh43209
debnathshoham Sep 24, 2021
5db7155
Merge branch 'master' into gh43209
debnathshoham Sep 25, 2021
5ccf385
Merge branch 'master' into gh43209
debnathshoham Sep 27, 2021
9afd465
Merge branch 'master' into gh43209
debnathshoham Sep 29, 2021
600b71e
removed comments highlighting diff with 1.2.5 from test
debnathshoham Sep 29, 2021
fe13278
removed comments from test2
debnathshoham Sep 29, 2021
1aaf326
moved tests to test_aggregate
debnathshoham Sep 29, 2021
4ba8511
Merge branch 'master' into gh43209
debnathshoham Sep 29, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.4.rst
@@ -14,6 +14,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed regression in :meth:`.GroupBy.agg` where it was failing silently with mixed data types along ``axis=1`` and :class:`MultiIndex` (:issue:`43209`)
 - Fixed regression in :meth:`merge` with integer and ``NaN`` keys failing with ``outer`` merge (:issue:`43550`)
 - Fixed regression in :meth:`DataFrame.corr` raising ``ValueError`` with ``method="spearman"`` on 32-bit platforms (:issue:`43588`)
 - Fixed performance regression in :meth:`MultiIndex.equals` (:issue:`43549`)
5 changes: 4 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -1213,7 +1213,10 @@ def _resolve_numeric_only(self, numeric_only: bool | lib.NoDefault) -> bool:
 numeric_only = True
 # GH#42395 GH#43108 GH#43154
 # Regression from 1.2.5 to 1.3 caused object columns to be dropped
-obj = self._obj_with_exclusions
+if self.axis:
+    obj = self._obj_with_exclusions.T
+else:
+    obj = self._obj_with_exclusions

Review comment (Member): I'm a little late here, but this could potentially use _get_data_to_aggregate.

 check = obj._get_numeric_data()
 if len(obj.columns) and not len(check.columns) and not obj.empty:
     numeric_only = False
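
To see why the transpose in this hunk matters, here is a minimal sketch; it is not the pandas implementation. It assumes that `select_dtypes(include="number")` is a reasonable public stand-in for the private `_get_numeric_data()` call in the diff, and the helper name `has_numeric_columns` and the two-column frame are made up for illustration. A frame that mixes `int64` and `Int64` columns has numeric columns as stored, but each column of its transpose has to hold both dtypes at once, so the numeric check can give a different answer after `.T`.

```python
import pandas as pd

# Mixed dtypes: a NumPy-backed int64 column next to a nullable Int64 column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).astype({"b": "Int64"})


def has_numeric_columns(frame: pd.DataFrame) -> bool:
    """Hypothetical public-API stand-in for the private check in the diff."""
    return len(frame.select_dtypes(include="number").columns) > 0


# As stored, the frame has numeric columns.
numeric_as_is = has_numeric_columns(df)

# After transposing, every new column spans both original dtypes.
numeric_transposed = has_numeric_columns(df.T)

print(df.dtypes.tolist())
print(df.T.dtypes.tolist())
print(numeric_as_is, numeric_transposed)
```

With `axis=1` the aggregation effectively runs over the transposed data, so checking the transposed frame is what lets the condition `len(obj.columns) and not len(check.columns)` fire and set `numeric_only = False`.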
50 changes: 50 additions & 0 deletions pandas/tests/groupby/test_groupby.py
@@ -2509,3 +2509,53 @@ def test_rolling_wrong_param_min_period():
     result_error_msg = r"__init__\(\) got an unexpected keyword argument 'min_period'"
     with pytest.raises(TypeError, match=result_error_msg):
         test_df.groupby("name")["val"].rolling(window=2, min_period=1).sum()
+
+
+@pytest.mark.parametrize(

Review comment (Contributor): Move these to pandas/tests/groupby/aggregate/test_aggregate.py (or maybe test_cython); see where similar tests are.

+    "func, expected, dtype, result_dtype_dict",
+    [
+        ("sum", [5, 7, 9], "int64", {}),
+        ("std", [4.5 ** 0.5] * 3, int, {"i": float, "j": float, "k": float}),
+        ("var", [4.5] * 3, int, {"i": float, "j": float, "k": float}),
+        ("sum", [5, 7, 9], "Int64", {"j": "int64"}),
+        ("std", [4.5 ** 0.5] * 3, "Int64", {"i": float, "j": float, "k": float}),
+        ("var", [4.5] * 3, "Int64", {"i": "float64", "j": "float64", "k": "float64"}),
+    ],
+)
+def test_multiindex_groupby_mixed_cols_axis1(func, expected, dtype, result_dtype_dict):
+    # GH#43209
+    df = DataFrame(
+        [[1, 2, 3, 4, 5, 6]] * 3,
+        columns=MultiIndex.from_product([["a", "b"], ["i", "j", "k"]]),
+    ).astype({("a", "j"): dtype, ("b", "j"): dtype})
+    result = df.groupby(level=1, axis=1).agg(func)
+    expected = DataFrame([expected] * 3, columns=["i", "j", "k"]).astype(
+        result_dtype_dict
+    )
+    tm.assert_frame_equal(result, expected)


+@pytest.mark.parametrize(
+    "func, expected_data, result_dtype_dict",
+    [
+        ("sum", [[2, 4], [10, 12], [18, 20]], {10: "int64", 20: "int64"}),
+        # std should ideally return Int64 / Float64 #43330
+        ("std", [[2 ** 0.5] * 2] * 3, "float64"),
+        ("var", [[2] * 2] * 3, {10: "float64", 20: "float64"}),
+    ],
+)
+def test_groupby_mixed_cols_axis1(func, expected_data, result_dtype_dict):
+    # GH#43209
+    df = DataFrame(
+        np.arange(12).reshape(3, 4),
+        index=Index([0, 1, 0], name="y"),
+        columns=Index([10, 20, 10, 20], name="x"),
+        dtype="int64",
+    ).astype({10: "Int64"})
+    result = df.groupby("x", axis=1).agg(func)
+    expected = DataFrame(
+        data=expected_data,
+        index=Index([0, 1, 0], name="y"),
+        columns=Index([10, 20], name="x"),
+    ).astype(result_dtype_dict)
+    tm.assert_frame_equal(result, expected)
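
The expected values in the two parametrizations above can be checked by hand. The sketch below uses only the standard library, not pandas; `group_by_label` is a hypothetical helper, and the rows and labels are copied from the test inputs. It relies on pandas' `std` and `var` defaulting to the sample statistic (`ddof=1`), which is also what `statistics.stdev` and `statistics.variance` compute.

```python
from statistics import stdev, variance


def group_by_label(row, labels):
    """Collect one row's values under their column labels (hypothetical helper)."""
    groups = {}
    for label, value in zip(labels, row):
        groups.setdefault(label, []).append(value)
    return groups


# test_multiindex_groupby_mixed_cols_axis1: every row is [1, ..., 6], and the
# level-1 labels of MultiIndex.from_product([["a", "b"], ["i", "j", "k"]])
# repeat as i, j, k, i, j, k.
mi_groups = group_by_label([1, 2, 3, 4, 5, 6], ["i", "j", "k"] * 2)
mi_sums = [sum(values) for values in mi_groups.values()]
mi_vars = [variance(values) for values in mi_groups.values()]
mi_stds = [stdev(values) for values in mi_groups.values()]

# test_groupby_mixed_cols_axis1: np.arange(12).reshape(3, 4) written out as
# plain lists, with column labels 10, 20, 10, 20.
rows = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
x_groups = [group_by_label(row, [10, 20, 10, 20]) for row in rows]
x_sums = [[sum(g[10]), sum(g[20])] for g in x_groups]
x_vars = [[variance(g[10]), variance(g[20])] for g in x_groups]

print(mi_sums, mi_vars)
print(x_sums, x_vars)
```

Each MultiIndex group pairs values that differ by 3 (for example 1 and 4), giving a sample variance of 4.5; each group in the second test pairs values that differ by 2, giving a sample variance of 2.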