Skip to content

BUG: groupby agg fails silently with mixed dtypes #43213

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Sep 29, 2021
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
91aa285
BUG: groupby agg fails silently with mixed dtypes
debnathshoham Aug 25, 2021
9b9b7af
updated whatsnew
debnathshoham Aug 25, 2021
d968e57
added tests for var and std
debnathshoham Aug 25, 2021
46a29e0
reorganized tests
debnathshoham Aug 30, 2021
195db31
Merge branch 'master' into gh43209
debnathshoham Aug 30, 2021
16e28db
specified int64 in result
debnathshoham Aug 30, 2021
854ecda
added copy=False to astype
debnathshoham Aug 31, 2021
7dd27f3
updated whatsnew
debnathshoham Aug 31, 2021
309ee59
typo corrected
debnathshoham Aug 31, 2021
8d1bfb1
added issue ref
debnathshoham Sep 1, 2021
41471aa
added issue ref to test
debnathshoham Sep 2, 2021
3647f53
resolved conflict
debnathshoham Sep 5, 2021
d6992e5
reverted old; raising DataError as 1.2.5
debnathshoham Sep 5, 2021
ab1fd87
used _selected_obj instead of mgr
debnathshoham Sep 5, 2021
23d2989
Merge branch 'master' into gh43209
debnathshoham Sep 5, 2021
9039124
=0
debnathshoham Sep 5, 2021
d75c57b
merge master
debnathshoham Sep 7, 2021
89a6bc7
Merge branch 'master' into gh43209
debnathshoham Sep 8, 2021
64ca85a
draft
debnathshoham Sep 9, 2021
8a0f5fe
Merge branch 'master' into gh43209
debnathshoham Sep 9, 2021
4ae0d6b
cast mi groupbysum
debnathshoham Sep 9, 2021
ac51170
Merge branch 'master' into gh43209
debnathshoham Sep 9, 2021
572f23c
dropped na
debnathshoham Sep 10, 2021
183f245
Merge branch 'master' into gh43209
debnathshoham Sep 10, 2021
62b5aac
Merge branch 'master' into gh43209
debnathshoham Sep 10, 2021
625a751
try casting in _wrap_agged_manager
debnathshoham Sep 11, 2021
9b0acd7
added test axis=1
debnathshoham Sep 11, 2021
4b4618a
Merge branch 'master' into gh43209
debnathshoham Sep 11, 2021
265b3bb
overrid int64; failing in 32bit
debnathshoham Sep 11, 2021
a55211a
Merge branch 'master' into gh43209
debnathshoham Sep 13, 2021
753c7df
updated whatsnew
debnathshoham Sep 13, 2021
de86f72
astype for std
debnathshoham Sep 13, 2021
2a29451
smaller patch for 1.3.x
debnathshoham Sep 14, 2021
a37f1f9
resolved merge master conflict
debnathshoham Sep 14, 2021
00838be
changes wrt 1.2.5
debnathshoham Sep 16, 2021
a04d021
Merge branch 'master' into gh43209
debnathshoham Sep 16, 2021
c9d6658
Merge branch 'master' into gh43209
debnathshoham Sep 24, 2021
5db7155
Merge branch 'master' into gh43209
debnathshoham Sep 25, 2021
5ccf385
Merge branch 'master' into gh43209
debnathshoham Sep 27, 2021
9afd465
Merge branch 'master' into gh43209
debnathshoham Sep 29, 2021
600b71e
removed comments highlighting diff with 1.2.5 from test
debnathshoham Sep 29, 2021
fe13278
removed comments from test2
debnathshoham Sep 29, 2021
1aaf326
moved tests to test_aggregate
debnathshoham Sep 29, 2021
4ba8511
Merge branch 'master' into gh43209
debnathshoham Sep 29, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.3.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ Fixed regressions
- Fixed regression in :class:`DataFrame` constructor failing to broadcast for defined :class:`Index` and len one list of :class:`Timestamp` (:issue:`42810`)
- Performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
- Fixed regression in :meth:`.GroupBy.agg` incorrectly raising in some cases (:issue:`42390`)
- Fixed regression in :meth:`.GroupBy.agg` where it was failing silently with mixed data types (:issue:`43209`)
- Fixed regression in :meth:`RangeIndex.where` and :meth:`RangeIndex.putmask` raising ``AssertionError`` when result did not represent a :class:`RangeIndex` (:issue:`43240`)

.. ---------------------------------------------------------------------------
Expand Down
7 changes: 5 additions & 2 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@
doc,
)

from pandas.core.dtypes.cast import find_common_type
from pandas.core.dtypes.common import (
ensure_int64,
is_bool,
Expand Down Expand Up @@ -1056,7 +1057,8 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs)
def _iterate_slices(self) -> Iterable[Series]:
obj = self._selected_obj
if self.axis == 1:
obj = obj.T
transposed_dtype = find_common_type(obj.dtypes.tolist())
obj = obj.T.astype(transposed_dtype, copy=False)

if isinstance(obj, Series) and obj.name not in self.exclusions:
# Occurs when doing DataFrameGroupBy(...)["X"]
Expand Down Expand Up @@ -1597,7 +1599,8 @@ def _gotitem(self, key, ndim: int, subset=None):
def _get_data_to_aggregate(self) -> Manager2D:
obj = self._obj_with_exclusions
if self.axis == 1:
return obj.T._mgr
transposed_dtype = find_common_type(obj.dtypes.tolist())
return obj.T.astype(transposed_dtype, copy=False)._mgr
else:
return obj._mgr

Expand Down
23 changes: 23 additions & 0 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2419,3 +2419,26 @@ def test_rolling_wrong_param_min_period():
result_error_msg = r"__init__\(\) got an unexpected keyword argument 'min_period'"
with pytest.raises(TypeError, match=result_error_msg):
test_df.groupby("name")["val"].rolling(window=2, min_period=1).sum()


@pytest.mark.parametrize(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move these to pandas/tests/groupby/aggregate/test_aggreagte.py (or maybe test_cython) see where similar are.

"func, expected, dtype, result_dtype",
[
("sum", [5, 7, 9], int, "int64"),
("std", [4.5 ** 0.5] * 3, int, float),
("var", [4.5] * 3, int, float),
("sum", [5, 7, 9], "Int64", "Int64"),
# result_dtype should ideally be Float64
("std", [4.5 ** 0.5] * 3, "Int64", float),
("var", [4.5] * 3, "Int64", "Float64"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually casts to Float64? what were the results in 1.2.5? we should match them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. 1.2.5 was int64, float64, int64
I will try to do that (didn't get a chance yet)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, yeah we want to have an exact match to 1.2.5; this can be changed going forwawrd for 1.4, but not on a regression patch.

],
)
def test_multiindex_groupby(func, expected, dtype, result_dtype):
# GH#43209
df = DataFrame(
[[1, 2, 3, 4, 5, 6]] * 3,
columns=MultiIndex.from_product([["a", "b"], ["i", "j", "k"]]),
).astype({("a", "j"): dtype, ("b", "j"): dtype})
result = df.groupby(level=1, axis=1).agg(func)
expected = DataFrame([expected] * 3, columns=["i", "j", "k"], dtype=result_dtype)
tm.assert_frame_equal(result, expected)