BUG: groupby agg fails silently with mixed dtypes #43213
Merged
44 commits
91aa285 BUG: groupby agg fails silently with mixed dtypes (debnathshoham)
9b9b7af updated whatsnew (debnathshoham)
d968e57 added tests for var and std (debnathshoham)
46a29e0 reorganized tests (debnathshoham)
195db31 Merge branch 'master' into gh43209 (debnathshoham)
16e28db specified int64 in result (debnathshoham)
854ecda added copy=False to astype (debnathshoham)
7dd27f3 updated whatsnew (debnathshoham)
309ee59 typo corrected (debnathshoham)
8d1bfb1 added issue ref (debnathshoham)
41471aa added issue ref to test (debnathshoham)
3647f53 resolved conflict (debnathshoham)
d6992e5 reverted old; raising DataError as 1.2.5 (debnathshoham)
ab1fd87 used _selected_obj instead of mgr (debnathshoham)
23d2989 Merge branch 'master' into gh43209 (debnathshoham)
9039124 =0 (debnathshoham)
d75c57b merge master (debnathshoham)
89a6bc7 Merge branch 'master' into gh43209 (debnathshoham)
64ca85a draft (debnathshoham)
8a0f5fe Merge branch 'master' into gh43209 (debnathshoham)
4ae0d6b cast mi groupbysum (debnathshoham)
ac51170 Merge branch 'master' into gh43209 (debnathshoham)
572f23c dropped na (debnathshoham)
183f245 Merge branch 'master' into gh43209 (debnathshoham)
62b5aac Merge branch 'master' into gh43209 (debnathshoham)
625a751 try casting in _wrap_agged_manager (debnathshoham)
9b0acd7 added test axis=1 (debnathshoham)
4b4618a Merge branch 'master' into gh43209 (debnathshoham)
265b3bb overrid int64; failing in 32bit (debnathshoham)
a55211a Merge branch 'master' into gh43209 (debnathshoham)
753c7df updated whatsnew (debnathshoham)
de86f72 astype for std (debnathshoham)
2a29451 smaller patch for 1.3.x (debnathshoham)
a37f1f9 resolved merge master conflict (debnathshoham)
00838be changes wrt 1.2.5 (debnathshoham)
a04d021 Merge branch 'master' into gh43209 (debnathshoham)
c9d6658 Merge branch 'master' into gh43209 (debnathshoham)
5db7155 Merge branch 'master' into gh43209 (debnathshoham)
5ccf385 Merge branch 'master' into gh43209 (debnathshoham)
9afd465 Merge branch 'master' into gh43209 (debnathshoham)
600b71e removed comments highlighting diff with 1.2.5 from test (debnathshoham)
fe13278 removed comments from test2 (debnathshoham)
1aaf326 moved tests to test_aggregate (debnathshoham)
4ba8511 Merge branch 'master' into gh43209 (debnathshoham)
@@ -2509,3 +2509,53 @@ def test_rolling_wrong_param_min_period():
    result_error_msg = r"__init__\(\) got an unexpected keyword argument 'min_period'"
    with pytest.raises(TypeError, match=result_error_msg):
        test_df.groupby("name")["val"].rolling(window=2, min_period=1).sum()


@pytest.mark.parametrize(
Review comment: move these to pandas/tests/groupby/aggregate/test_aggregate.py (or maybe test_cython); see where similar tests are.
    "func, expected, dtype, result_dtype_dict",
    [
        ("sum", [5, 7, 9], "int64", {}),
        ("std", [4.5 ** 0.5] * 3, int, {"i": float, "j": float, "k": float}),
        ("var", [4.5] * 3, int, {"i": float, "j": float, "k": float}),
        ("sum", [5, 7, 9], "Int64", {"j": "int64"}),
        ("std", [4.5 ** 0.5] * 3, "Int64", {"i": float, "j": float, "k": float}),
        ("var", [4.5] * 3, "Int64", {"i": "float64", "j": "float64", "k": "float64"}),
    ],
)
def test_multiindex_groupby_mixed_cols_axis1(func, expected, dtype, result_dtype_dict):
    # GH#43209
    df = DataFrame(
        [[1, 2, 3, 4, 5, 6]] * 3,
        columns=MultiIndex.from_product([["a", "b"], ["i", "j", "k"]]),
    ).astype({("a", "j"): dtype, ("b", "j"): dtype})
    result = df.groupby(level=1, axis=1).agg(func)
    expected = DataFrame([expected] * 3, columns=["i", "j", "k"]).astype(
        result_dtype_dict
    )
    tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
    "func, expected_data, result_dtype_dict",
    [
        ("sum", [[2, 4], [10, 12], [18, 20]], {10: "int64", 20: "int64"}),
        # std should ideally return Int64 / Float64 #43330
        ("std", [[2 ** 0.5] * 2] * 3, "float64"),
        ("var", [[2] * 2] * 3, {10: "float64", 20: "float64"}),
    ],
)
def test_groupby_mixed_cols_axis1(func, expected_data, result_dtype_dict):
    # GH#43209
    df = DataFrame(
        np.arange(12).reshape(3, 4),
        index=Index([0, 1, 0], name="y"),
        columns=Index([10, 20, 10, 20], name="x"),
        dtype="int64",
    ).astype({10: "Int64"})
    result = df.groupby("x", axis=1).agg(func)
    expected = DataFrame(
        data=expected_data,
        index=Index([0, 1, 0], name="y"),
        columns=Index([10, 20], name="x"),
    ).astype(result_dtype_dict)
    tm.assert_frame_equal(result, expected)
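The expected values in the first parametrization can be checked by hand. The following is a minimal sketch using only the Python standard library, not pandas: it groups one row of the test frame by the second column level and computes the sum, sample variance, and sample standard deviation of each group. It assumes `ddof=1`, which is the pandas default for `var` and `std`; the names `row`, `groups`, `sums`, `variances`, and `stds` are illustrative and do not appear in the PR.

```python
import statistics

# One row of the MultiIndex test frame: values 1..6 under the columns
# ("a", "i"), ("a", "j"), ("a", "k"), ("b", "i"), ("b", "j"), ("b", "k").
row = {
    ("a", "i"): 1, ("a", "j"): 2, ("a", "k"): 3,
    ("b", "i"): 4, ("b", "j"): 5, ("b", "k"): 6,
}

# Collect values that share the same second-level label ("i", "j" or "k"),
# which is the grouping the test requests with groupby(level=1, axis=1).
groups = {}
for (_, inner), value in row.items():
    groups.setdefault(inner, []).append(value)

sums = {key: sum(values) for key, values in groups.items()}
# statistics.variance and statistics.stdev are the sample versions (ddof=1).
variances = {key: statistics.variance(values) for key, values in groups.items()}
stds = {key: statistics.stdev(values) for key, values in groups.items()}

print(groups)
print(sums)
print(variances)
print(stds)
```

Each group holds one value from "a" and one from "b", so every group has the same spread. That is why the parametrization uses a single repeated value for `var` and `std`, while the sums differ per column.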
Review comment: I'm a little late here, but this could potentially use _get_data_to_aggregate.