added test to indexing on groupby, #32464 #44046
Changes from 13 commits
239d42a
b42f6e5
b2df552
57f86c3
c89f4ad
84e8330
4bf0563
be119e6
8bd532f
9ee9c8e
c1ed828
28fbe02
0df3f33
de65783
e22b830
cd2d3fd
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1167,6 +1167,38 @@ def test_groupby_sum_below_mincount_nullable_integer(): | |
tm.assert_frame_equal(result, expected) | ||
|
||
|
||
def test_if_is_multiindex(): | ||
# GH 32464 | ||
# Test if index after groupby with more than one column is always MultiIndex | ||
a = DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]}) | ||
|
||
agg_1 = a.groupby(["a", "b"]).sum() | ||
Instead, assert the index of these two results independently.

Sorry, I don't get what you mean...

    tm.assert_index_equal(a.groupby(["a", "b", "c"]).sum().index, pd.MultiIndex.from_arrays([[], [], []], names=("a", "b", "c")))
    tm.assert_index_equal(a.groupby(["a", "b"]).sum().index, pd.MultiIndex.from_arrays([[], []], names=("a", "b")))

I tried but it doesn't work, it says

But I tried this and it passed:

    def test_if_is_multiindex():
        # GH 32464
        # Test if index after groupby with more than one column is always MultiIndex
        a = DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]})
        result = a.groupby(["a", "b"]).sum().index
        expected = pd.MultiIndex.from_arrays([[1, 2], [5, 6]], names=("a", "b"))
        tm.assert_index_equal(result, expected)
        result = a.groupby(["a", "b", "c"]).sum().index
        expected = pd.MultiIndex.from_arrays([[1, 2], [5, 6], [8, 9]], names=("a", "b", "c"))
        tm.assert_index_equal(result, expected)

You can build the index using the

Can you just test in the case of an empty frame here - nonempty is covered elsewhere. Indeed it is tricky to construct an empty MultiIndex of the expected type; this works:

@gabrieldi95 can you replace the test with basically this example
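For reference, a minimal sketch of the "assert each index independently" idea from the thread, using only the non-empty frame that appears in the diff. It also checks that the two constructions seen in this PR (`MultiIndex.from_arrays` in the comments, `MultiIndex(levels=..., codes=...)` in the diff) describe the same index. The variable names are illustrative, and the public `pd.testing.assert_index_equal` is used here in place of the test suite's `tm` alias. This is not the empty-frame test the reviewers asked for, since that snippet is not shown above.

```python
import pandas as pd

# Same frame as in the test under review.
df = pd.DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]})

# Two ways to build the expected index for the two-key case.
expected_from_arrays = pd.MultiIndex.from_arrays(
    [[1, 2], [5, 6]], names=("a", "b")
)
expected_from_codes = pd.MultiIndex(
    levels=[[1, 2], [5, 6]], codes=[[0, 1], [0, 1]], names=["a", "b"]
)
pd.testing.assert_index_equal(expected_from_arrays, expected_from_codes)

# Assert the index of each groupby result on its own.
result_two = df.groupby(["a", "b"]).sum().index
pd.testing.assert_index_equal(result_two, expected_from_arrays)

result_three = df.groupby(["a", "b", "c"]).sum().index
expected_three = pd.MultiIndex.from_arrays(
    [[1, 2], [5, 6], [8, 9]], names=("a", "b", "c")
)
pd.testing.assert_index_equal(result_three, expected_three)
```

Keeping one `result`/`expected` pair per assertion makes a failure point directly at the grouping that produced the wrong index.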
||
agg_2 = a.groupby(["a", "b", "c"]).sum() | ||
|
||
expected = MultiIndex( | ||
levels=[[1, 2], [5, 6]], codes=[[0, 1], [0, 1]], names=["a", "b"] | ||
) | ||
result = agg_1.index | ||
|
||
tm.assert_index_equal(expected, result) | ||
|
||
expected = MultiIndex( | ||
levels=[[1, 2], [5, 6], [8, 9]], | ||
codes=[[0, 1], [0, 1], [0, 1]], | ||
names=["a", "b", "c"], | ||
) | ||
result = agg_2.index | ||
|
||
# Tests if group by with all columns has a MultiIndex | ||
tm.assert_index_equal(expected, result) | ||
|
||
index_1 = agg_1.iloc[:, :0].index | ||
index_2 = agg_2.droplevel("c").index | ||
|
||
# Tests if both aggregations have a MultiIndex | ||
tm.assert_index_equal(index_1, index_2) | ||
|
||
|
||
def test_mean_on_timedelta(): | ||
# GH 17382 | ||
df = DataFrame({"time": pd.to_timedelta(range(10)), "cat": ["A", "B"] * 5}) | ||
|
Can you rename to something like test_empty_multiindex?