-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
added test to indexing on groupby, #32464 #44046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a = pd.DataFrame({"a": [], "b": [], "c": []}) | ||
|
||
index_1 = a.groupby(["a", "b"]).sum().index | ||
index_2 = a.groupby(["a", "b", "c"]).sum().index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead could you construct the full DataFrame result of a.groupby(["a", "b", "c"]).sum()
and a.groupby(["a", "b"]).sum()
and use tm.assert_frame_equal
?
Also could you move this test to a more appropriate groupby testing file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mroeschke, I changed it so it uses tm.assert_frame_equal
, but I'm not sure what the appropriate testing file would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably pandas/tests/groupby/test_function.py
I tried using a = DataFrame({"a": [], "b": [], "c": []})
result = a.groupby(["a", "b", "c"]).sum()
expected_index = MultiIndex.from_arrays([[], [], []], names=("a", "b", "c"))
expected = DataFrame(
np.ndarray((0, 0)), dtype="float64", index=expected_index, columns=Index([])
)
# Tests if groupby with all columns has a multiindex
tm.assert_frame_equal(result, expected) But it didn't work, so I'm keeping just the other assertion, that tests if both groupby's are equal. |
a = DataFrame({"a": [], "b": [], "c": []}) | ||
|
||
agg_1 = a.groupby(["a", "b"]).sum().iloc[:, :0] | ||
agg_2 = a.groupby(["a", "b", "c"]).sum().droplevel("c") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay could you change the assertion back to testing the MultiIndex
result if the new assertion can't test the whole frame?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Like this?
# GH 32464
# Test if index after groupby with more then one column is always MultiIndex
a = DataFrame({"a": [], "b": [], "c": []})
agg_1 = a.groupby(["a", "b"]).sum()
agg_2 = a.groupby(["a", "b", "c"]).sum()
# Tests if group by with all columns has a MultiIndex
assert isinstance(agg_2.index, pd.core.indexes.multi.MultiIndex)
result = agg_1.iloc[:, :0]
expected = agg_2.droplevel("c")
# Tests if both agreggations have multiindex
tm.assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use tm.assert_index_equal
agg_2 = a.groupby(["a", "b", "c"]).sum() | ||
|
||
# Tests if group by with all columns has a MultiIndex | ||
assert isinstance(agg_2.index, pd.core.indexes.multi.MultiIndex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert_index_equal
already checks this
# Test if index after groupby with more then one column is always MultiIndex | ||
a = DataFrame({"a": [], "b": [], "c": []}) | ||
|
||
agg_1 = a.groupby(["a", "b"]).sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead, assert the index of these two results independently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get what you mean...
You mean something like this?
tm.assert_index_equal(a.groupby(["a", "b", "c"]).sum().index, pd.MultiIndex.from_arrays([[], [], []], names=("a", "b", "c")))
tm.assert_index_equal(a.groupby(["a", "b"]).sum().index, pd.MultiIndex.from_arrays([[], []], names=("a", "b")))
I tried but it doesn't work, it says AssertionError: MultiIndex level [0] are different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I tried this and it passed:
def test_if_is_multiindex():
# GH 32464
# Test if index after groupby with more then one column is always MultiIndex
a = DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]})
result = a.groupby(["a", "b"]).sum().index
expected = pd.MultiIndex.from_arrays([[1,2],[5,6]], names=("a", "b"))
tm.assert_index_equal(result, expected)
result = a.groupby(["a", "b", "c"]).sum().index
expected = pd.MultiIndex.from_arrays([[1,2],[5,6],[8,9]], names=("a", "b", "c"))
tm.assert_index_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can build the index using the pd.MultiIndex
constructor directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just test in the case of an empty frame here - nonempty is covered elsewhere. Indeed it is tricky to construct an empty MultiIndex of the expected type; this works:
df = DataFrame({"a": [1], "b": [2], "c": [3]}).set_index(['a', 'b', 'c'])
result = df.groupby(["a", "b", "c"]).sum().index[:0]
expected = df.index[:0]
tm.assert_index_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gabrieldi95 can you replace the test with basically this example
can you merge master and update to comments |
Yes, I was working on other stuff, will work on it now! |
Hm I don't really know why the check fail, can someone help me? |
can you merge master again and ping on green |
@jreback can you help me with the error? |
is worthwhile but needs to be passing (and pls merge master) |
@gabrieldi95 - The CI failure appears unrelated to me; can you try merging master again |
@gabrieldi95 can you merge master and address comments |
Closing to take this off the queue. @gabrieldi95 please let me know if you'd like to continue this! |
@rhshadrach sorry for taking so long! Yes, I would like to continue |
@@ -1167,6 +1167,38 @@ def test_groupby_sum_below_mincount_nullable_integer(): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
def test_if_is_multiindex(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename to something like test_empty_multiindex
can you merge master |
can you merge master and update to comments |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in main and address the review and we can reopen. |
This makes sure an index of a groupby with more then one column is always a MultiIndex.