BUG: Added test cases to check loc on multiindex with NaNs #29751 #38772

Merged
merged 6 commits into from
Jan 7, 2021
22 changes: 22 additions & 0 deletions pandas/tests/indexing/multiindex/test_getitem.py
@@ -279,3 +279,25 @@ def test_loc_empty_multiindex():
    result = df
    expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
    tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("idx1,idx2", [(0, 1), (1, 4), (1, 3)])
def test_loc_nan_multiindex(nulls_fixture, idx1, idx2):
    # GH#29751
    # loc on a multiindex containing nan values
    arr = [
        [11, nulls_fixture, 13],
        [21, nulls_fixture, 23],
        [31, nulls_fixture, 33],
        [41, nulls_fixture, 43],
        [51, nulls_fixture, 53],
    ]
    cols = ["a", "b", "c"]
    df = DataFrame(arr, columns=cols, dtype="int64").set_index(["a", "b"])
    start = df.index[idx1]
    end = df.index[idx2]
    result = df.loc[start:end, :]
    expected = DataFrame(arr[idx1 : (idx2 + 1)], columns=cols, dtype="int64").set_index(
Contributor

Can you assert each of the 4 sliced results in the OP? Also, please construct the expected result by hard-coding as much as possible: the NaN clearly needs to come from the fixture, but hard-code the actual values.

Contributor Author

The original post contains the first two sliced results (a.loc) as the reference for the next two sliced results (b.loc). I have replaced the previous tests with the two relevant sliced tests from the original post.

I also had to remove the parameterized test cases because the mypy type checker fails when using a tuple to slice a dataframe. The code I used is as follows:

@pytest.mark.parametrize(
    "indexer,expected_arr",
    [
        (
            lambda df, null: df.loc[:(21, null)],
            lambda null: [[11, null, 13], [21, null, 23]]
        ),
        (
            lambda df, null: df.loc[(21, null):],
            lambda null: [[21, null, 23], [31, null, 33], [41, null, 43]]
        ),
        (
            lambda df, null: df.loc[(21, null):(31, null)],
            lambda null: [[21, null, 23], [31, null, 33]]
        ),
    ],
)
def test_frame_getitem_nan_multiindex(nulls_fixture, indexer, expected_arr):
    # GH#29751
    # loc on a multiindex containing nan values
    arr = [
        [11, nulls_fixture, 13],
        [21, nulls_fixture, 23],
        [31, nulls_fixture, 33],
        [41, nulls_fixture, 43]
    ]
    cols = ["a", "b", "c"]
    df = DataFrame(arr, columns=cols, dtype="int64").set_index(["a", "b"])

    result = indexer(df, nulls_fixture)
    arr1 = expected_arr(nulls_fixture)
    expected = DataFrame(arr1, columns=cols, dtype="int64").set_index(
        ["a", "b"]
    )
    tm.assert_frame_equal(result, expected)

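The mypy type checker failed on lines 5, 9, and 13 of the snippet above, that is, on each of the three lambdas that slice df.loc with a tuple.
As a workaround, I assigned the tuple to a variable and used that variable to slice the DataFrame, which passes the mypy type check.
I would love to hear about any other workaround for multiindex slicing that would let me keep the tests parameterized.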
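
One possible direction for keeping the parametrization, sketched here without pandas so it runs anywhere: the colon syntax inside a subscript is only shorthand for a slice object, so the slice bounds can be passed around as plain data and turned into slice(start, stop) inside the test. SliceRecorder is a hypothetical stand-in for an indexer such as df.loc, and null stands in for the value supplied by nulls_fixture; neither name comes from the PR.

```python
class SliceRecorder:
    """Hypothetical stand-in for an indexer like df.loc: returns the key it receives."""

    def __getitem__(self, key):
        return key


null = float("nan")  # assumption: stands in for the value from nulls_fixture
rec = SliceRecorder()

# The subscript syntax and an explicit slice object hand the indexer the same key.
literal = rec[(21, null):(31, null)]
explicit = rec[slice((21, null), (31, null))]

# Bounds expressed as plain data, in a shape that pytest.mark.parametrize could take.
bounds = [
    (None, (21, null)),        # equivalent to [:(21, null)]
    ((21, null), None),        # equivalent to [(21, null):]
    ((21, null), (31, null)),  # equivalent to [(21, null):(31, null)]
]
keys = [rec[slice(start, stop)] for start, stop in bounds]
```

With a DataFrame the same idea would read df.loc[slice(start, stop)], which Python evaluates exactly like df.loc[start:stop]. Whether mypy accepts that form depends on the mypy and stub versions in use, so treat that part as an assumption to verify rather than a confirmed fix.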

Also, I hardcoded the expected array and the index tuple as requested.

        ["a", "b"]
    )
    tm.assert_frame_equal(result, expected)