added test to indexing on groupby, #32464 #44046

Closed
wants to merge 16 commits into from
Changes from 13 commits
32 changes: 32 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -1167,6 +1167,38 @@ def test_groupby_sum_below_mincount_nullable_integer():
    tm.assert_frame_equal(result, expected)


def test_if_is_multiindex():
Member

Can you rename this to something like test_empty_multiindex?

    # GH 32464
    # Test if index after groupby with more than one column is always MultiIndex
    a = DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]})

    agg_1 = a.groupby(["a", "b"]).sum()
Member

Instead, assert the index of these two results independently.

Contributor Author

Sorry, I don't get what you mean...
You mean something like this?

tm.assert_index_equal(a.groupby(["a", "b", "c"]).sum().index, pd.MultiIndex.from_arrays([[], [], []], names=("a", "b", "c")))
tm.assert_index_equal(a.groupby(["a", "b"]).sum().index, pd.MultiIndex.from_arrays([[], []], names=("a", "b")))

I tried, but it doesn't work; it fails with AssertionError: MultiIndex level [0] are different

Contributor Author

But I tried this and it passed:

def test_if_is_multiindex():
    # GH 32464
    # Test if index after groupby with more than one column is always MultiIndex
    a = DataFrame({"a": [1, 2], "b": [5, 6], "c": [8, 9]})

    result = a.groupby(["a", "b"]).sum().index
    expected = pd.MultiIndex.from_arrays([[1, 2], [5, 6]], names=("a", "b"))

    tm.assert_index_equal(result, expected)

    result = a.groupby(["a", "b", "c"]).sum().index
    expected = pd.MultiIndex.from_arrays([[1, 2], [5, 6], [8, 9]], names=("a", "b", "c"))

    tm.assert_index_equal(result, expected)

Member

You can build the index using the pd.MultiIndex constructor directly.
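
As a rough illustration of that suggestion (not code from this PR), the sketch below builds the same two-level index both ways and checks that they match. It assumes `tm` is `pandas._testing`, as in the pandas test suite; the variable names `from_arrays` and `direct` are made up for the example.

```python
import pandas as pd
import pandas._testing as tm  # private testing helpers, imported as in the pandas test suite

# Index built from parallel arrays, as in the earlier attempt.
from_arrays = pd.MultiIndex.from_arrays([[1, 2], [5, 6]], names=("a", "b"))

# Same index built with the constructor: each level lists its unique values,
# and each codes list holds positions into the matching level.
direct = pd.MultiIndex(
    levels=[[1, 2], [5, 6]],
    codes=[[0, 1], [0, 1]],
    names=["a", "b"],
)

# Raises AssertionError if the two indexes differ.
tm.assert_index_equal(from_arrays, direct)
```

The constructor form spells out levels and codes explicitly, which makes the expected structure of the index visible in the test.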

Member

Can you just test the empty-frame case here? The nonempty case is covered elsewhere. Indeed, it is tricky to construct an empty MultiIndex of the expected type; this works:

df = DataFrame({"a": [1], "b": [2], "c": [3]}).set_index(['a', 'b', 'c'])
result = df.groupby(["a", "b", "c"]).sum().index[:0]
expected = df.index[:0]
tm.assert_index_equal(result, expected)
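
To see why slicing is the easier route, here is a small sketch (an illustration, not part of the PR) comparing the sliced index with one built from empty lists. The names `sliced` and `from_empty_lists` are hypothetical, and the exact level dtype of the `from_arrays` result may depend on the pandas version, so the code prints it rather than assuming it.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]}).set_index(["a", "b", "c"])

# Slicing a MultiIndex keeps its levels (and their integer dtypes);
# only the codes become empty.
sliced = df.index[:0]

# Building from empty lists gives pandas no integer values to infer a dtype from.
from_empty_lists = pd.MultiIndex.from_arrays([[], [], []], names=("a", "b", "c"))

# Both indexes are empty, but their level dtypes can be compared directly.
print(len(sliced), len(from_empty_lists))
print([level.dtype for level in sliced.levels])
print([level.dtype for level in from_empty_lists.levels])
```

Both indexes have length zero, so any mismatch reported by `tm.assert_index_equal` would come from the levels rather than from the (empty) values.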

Contributor

@gabrieldi95 can you replace the test with basically this example?

    agg_2 = a.groupby(["a", "b", "c"]).sum()

    expected = MultiIndex(
        levels=[[1, 2], [5, 6]], codes=[[0, 1], [0, 1]], names=["a", "b"]
    )
    result = agg_1.index

    tm.assert_index_equal(expected, result)

    expected = MultiIndex(
        levels=[[1, 2], [5, 6], [8, 9]],
        codes=[[0, 1], [0, 1], [0, 1]],
        names=["a", "b", "c"],
    )
    result = agg_2.index

    # Tests if group by with all columns has a MultiIndex
    tm.assert_index_equal(expected, result)

    index_1 = agg_1.iloc[:, :0].index
    index_2 = agg_2.droplevel("c").index

    # Tests if both aggregations have multiindex
    tm.assert_index_equal(index_1, index_2)


def test_mean_on_timedelta():
    # GH 17382
    df = DataFrame({"time": pd.to_timedelta(range(10)), "cat": ["A", "B"] * 5})