BUG: Added test cases to check loc on multiindex with NaNs #29751 #38772
expected = DataFrame(arr[:2], columns=cols, dtype="int").set_index(["a", "b"])
tm.assert_frame_equal(result, expected)

result = df.loc[idx:, :]
Could you try to parametrize with slices?
I have parametrized the slices in the recent commit
@@ -279,3 +280,32 @@ def test_loc_empty_multiindex():
    result = df
    expected = DataFrame([1, 2, 3, 4], index=index, columns=["value"])
    tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("nan", [np.nan, pd.NA, None])
There is a fixture for all NaNs
nulls_fixture
I fixed the test case to use nulls_fixture in the most recent commit
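For readers unfamiliar with fixture-based parametrization, here is a minimal sketch of how a fixture can feed each null-like value into a test. It does not reproduce pandas' own `nulls_fixture`; the names `NULL_VALUES`, `null_value`, and `test_null_is_missing` are hypothetical, and the value list is simply the three values from the original `parametrize` call.

```python
import numpy as np
import pandas as pd
import pytest

# Hypothetical stand-in for the values a null fixture would cover;
# taken from the parametrize call discussed in this review comment.
NULL_VALUES = [np.nan, pd.NA, None]


@pytest.fixture(params=NULL_VALUES, ids=["np.nan", "pd.NA", "None"])
def null_value(request):
    """Provide one null-like value per test run."""
    return request.param


def test_null_is_missing(null_value):
    # Any test that requests `null_value` runs once per entry in NULL_VALUES.
    assert pd.isna(null_value)
```

The benefit of a shared fixture over a per-test `parametrize` list is that every test requesting it covers the same set of values without repeating the list.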
[51, nan, 53],
]
cols = ["a", "b", "c"]
df = DataFrame(arr, columns=cols, dtype="int").set_index(["a", "b"])
use dtype='int64' to avoid the 32-bit failures
Changed dtype to int64 in the recent commit
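A small sketch of why the explicit width matters, assuming the 32-bit failures come from NumPy's default integer being platform dependent (the thread itself does not spell out the cause). The variable names are illustrative only.

```python
import numpy as np

# "int" resolves to NumPy's default integer, whose width depends on the
# platform and NumPy version; "int64" is always 8 bytes.
platform_int = np.dtype("int")
fixed_int = np.dtype("int64")

print(f"dtype('int')   -> {platform_int.name} ({platform_int.itemsize} bytes)")
print(f"dtype('int64') -> {fixed_int.name} ({fixed_int.itemsize} bytes)")
```

If the two lines print different widths on some build, a frame created with `dtype="int"` would not compare equal in dtype to one holding `int64` data, which is the kind of mismatch an explicit `"int64"` avoids.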
start = df.index[idx1]
end = df.index[idx2]
result = df.loc[start:end, :]
expected = DataFrame(arr[idx1 : (idx2 + 1)], columns=cols, dtype="int64").set_index(
can you assert each of the 4 sliced results in the OP. also pls construct the expected by hard coding as much as possible (clearly the nan needs to come from the fixtures), but hard code the actual values.
The original post contains the first two sliced results (a.loc) as the reference for the next two sliced results (b.loc). I have replaced the previous tests with the two relevant sliced tests from the original post.
I also had to remove the parameterized test cases because the mypy type checker fails when using a tuple to slice a dataframe. The code I used is as follows:
@pytest.mark.parametrize(
    "indexer,expected_arr",
    [
        (
            lambda df, null: df.loc[:(21, null)],
            lambda null: [[11, null, 13], [21, null, 23]]
        ),
        (
            lambda df, null: df.loc[(21, null):],
            lambda null: [[21, null, 23], [31, null, 33], [41, null, 43]]
        ),
        (
            lambda df, null: df.loc[(21, null):(31, null)],
            lambda null: [[21, null, 23], [31, null, 33]]
        ),
    ],
)
def test_frame_getitem_nan_multiindex(nulls_fixture, indexer, expected_arr):
    # GH#29751
    # loc on a multiindex containing nan values
    arr = [
        [11, nulls_fixture, 13],
        [21, nulls_fixture, 23],
        [31, nulls_fixture, 33],
        [41, nulls_fixture, 43]
    ]
    cols = ["a", "b", "c"]
    df = DataFrame(arr, columns=cols, dtype="int64").set_index(["a", "b"])
    result = indexer(df, nulls_fixture)
    arr1 = expected_arr(nulls_fixture)
    expected = DataFrame(arr1, columns=cols, dtype="int64").set_index(
        ["a", "b"]
    )
    tm.assert_frame_equal(result, expected)
The mypy type checker failed on lines 5, 9, & 13.
I would love to hear any workarounds for multiindex slicing to parameterize the tests.
Instead, I assigned the tuple to a variable and used it to slice the dataframe, which passed the mypy type check.
Also, I hardcoded the expected array and the index tuple as requested.
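To illustrate why the variable workaround does not change what the test exercises, here is a minimal sketch showing that both spellings hand the same kind of `slice` object to `__getitem__`. `SliceRecorder` is a hypothetical stand-in for `df.loc`; the sketch does not run pandas or mypy, so it says nothing about the type-checker behaviour itself.

```python
class SliceRecorder:
    """Hypothetical stand-in for an indexer: returns the key it was given."""

    def __getitem__(self, key):
        return key


rec = SliceRecorder()
null = float("nan")

# Inline tuple inside the slice, as in the parametrized lambdas.
inline = rec[:(21, null)]

# Tuple bound to a name first, as in the workaround described above.
idx = (21, null)
via_variable = rec[:idx]

# Both keys are slice objects with no start and a (21, null) tuple as stop.
print(type(inline).__name__, inline.start, inline.stop)
print(type(via_variable).__name__, via_variable.start, via_variable.stop)
```

Since the indexer receives an equivalent slice either way, the choice between the two forms only affects static analysis, not the indexing path under test.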
thanks @kasim95
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Added test cases to check loc on multiindex containing NaN values using np.nan, pd.NA, and None