BUG: loc with empty multiindex raises exception #38711

Merged
merged 4 commits into from
Dec 28, 2020
Changes from 2 commits
19 changes: 19 additions & 0 deletions pandas/tests/indexing/multiindex/test_loc.py
@@ -695,3 +695,22 @@ def test_loc_getitem_index_differently_ordered_slice_none():
        columns=["a", "b"],
    )
    tm.assert_frame_equal(result, expected)


def test_loc_empty_multiindex():
Contributor

can you co-locate this with similar tests

jreback@lhs2:~/pandas-dev$ grep empty pandas/tests/indexing/multiindex/*
pandas/tests/indexing/multiindex/test_getitem.py:def test_frame_getitem_multicolumn_empty_level():
pandas/tests/indexing/multiindex/test_getitem.py:def test_frame_mi_empty_slice():
pandas/tests/indexing/multiindex/test_loc.py:        empty = Series(data=[], dtype=np.float64)
pandas/tests/indexing/multiindex/test_loc.py:        result = x.loc[empty]
pandas/tests/indexing/multiindex/test_loc.py:        # empty array:
pandas/tests/indexing/multiindex/test_loc.py:        empty = np.array([])
pandas/tests/indexing/multiindex/test_loc.py:        result = x.loc[empty]
pandas/tests/indexing/multiindex/test_loc.py:        ([], []),  # empty ok
pandas/tests/indexing/multiindex/test_loc.py:def test_loc_getitem_duplicates_multiindex_empty_indexer(columns_indexer):
pandas/tests/indexing/multiindex/test_loc.py:    # empty indexer

IOW there are 2 tests in test_getitem.py that test basically the same thing, so locate it there.

Contributor Author

Moved the test to pandas/tests/indexing/multiindex/test_getitem.py

    # GH#36936
    arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
    index = MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
    df = DataFrame([1, 2, 3, 4], index=index, columns=["value"])

    # loc on empty multiindex == loc with False mask
    empty_multiindex = df.loc[df.loc[:, "value"] == 0, :].index
    result = df.loc[empty_multiindex, :]
    expected = df.loc[[False] * len(df.index), :]
Contributor

can you construct the expected frame explicitly?

Contributor Author

Constructing an empty DataFrame named expected with a MultiIndex sets expected.index.inferred_type to object by default.
The value of result.index.inferred_type is string, as inferred from the index variable created earlier.
This mismatch causes tm.assert_frame_equal to fail.

I used the following code to create an empty dataframe with multiindex:

def test_loc_empty_multiindex():
    # GH#36936
    arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
    index = MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
    df = DataFrame([1, 2, 3, 4], index=index, columns=["value"])

    # loc on empty multiindex == loc with False mask
    empty_multiindex = df.loc[df.loc[:, "value"] == 0, :].index
    result = df.loc[empty_multiindex, :]
    index2 = MultiIndex(levels=[[], []], codes=[[], []], names=["idx1", "idx2"])
    expected = DataFrame([], index=index2, columns=["value"], dtype="string")
    expected = expected.astype({"value": "int"})
    tm.assert_frame_equal(result, expected) # this test fails

    # replacing value with loc on empty multiindex
    df.loc[df.loc[df.loc[:, "value"] == 0].index, "value"] = 5
    result = df
    expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
    tm.assert_frame_equal(result, expected)
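
One way to look at the kind of mismatch described above is to compare the levels stored on the two empty indexes. This is only a sketch: the names mask_index and explicit_index are illustrative and not part of the PR, and whether this difference is exactly what the failing assertion reports may depend on the pandas version.

```python
import pandas as pd

arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
index = pd.MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
df = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])

# Empty index obtained by filtering: the codes are empty, but the
# original level values are still stored on the index.
mask_index = df.loc[[False] * len(df.index), :].index

# Empty index built from scratch: both codes and levels are empty.
explicit_index = pd.MultiIndex(
    levels=[[], []], codes=[[], []], names=["idx1", "idx2"]
)

# Print the length, level values, and inferred type of each level.
for name, mi in [("mask", mask_index), ("explicit", explicit_index)]:
    level_info = [(list(lvl), lvl.inferred_type) for lvl in mi.levels]
    print(name, len(mi), level_info)
```

Both indexes have zero rows, but the filtered one keeps its level values while the hand-built one does not, so a strict comparison of index metadata can differ.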

If the index variable is used directly to create the empty DataFrame, it introduces NaN values.
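
The exact call the author tried is not shown in the thread. The sketch below assumes it resembled passing the full index to the constructor without any data, which yields one NaN row per index entry rather than a zero-row frame.

```python
import pandas as pd

arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
index = pd.MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))

# No data is supplied, so pandas creates a row for every index entry
# and fills the "value" column with NaN.
frame = pd.DataFrame(index=index, columns=["value"])

print(frame.shape)
print(frame["value"].isna().all())
```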

    tm.assert_equal(result, expected)

    # replacing value with loc on empty multiindex
    df.loc[df.loc[df.loc[:, "value"] == 0].index, "value"] = 5
    result = df
    expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
    tm.assert_equal(result, expected)
Member

@MarcoGorelli Dec 27, 2020

does tm.assert_frame_equal not work here? (and above)

Contributor Author

The assertion method was changed to tm.assert_frame_equal in the most recent commit.
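
For readers who want to check the behaviour outside the pandas test suite, the two assertions discussed in this thread can be sketched with the public pandas.testing helper. This assumes an installed pandas version that already includes the fix; the variable name unchanged is illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

arrays = [["a", "a", "b", "a"], ["a", "a", "b", "b"]]
index = pd.MultiIndex.from_arrays(arrays, names=("idx1", "idx2"))
df = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])

# Selecting with an empty MultiIndex should match an all-False mask.
empty_multiindex = df.loc[df["value"] == 0, :].index
result = df.loc[empty_multiindex, :]
expected = df.loc[[False] * len(df.index), :]
assert_frame_equal(result, expected)

# Assigning through an empty MultiIndex should leave the frame unchanged.
df.loc[empty_multiindex, "value"] = 5
unchanged = pd.DataFrame([1, 2, 3, 4], index=index, columns=["value"])
assert_frame_equal(df, unchanged)
```

Deriving expected from a boolean mask sidesteps the index-metadata mismatch raised earlier in the review, because both frames are produced from the same original index.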