TST: Multiindex slicing with NaNs, unexpected results for #25154 #39356
```diff
@@ -229,6 +229,26 @@ def test_frame_getitem_nan_multiindex(nulls_fixture):
     tm.assert_frame_equal(result, expected)
 
 
+def test_frame_getitem_nan_cols_multiindex(nulls_fixture):
+    # Slicing MultiIndex including levels with nan values, for more information
+    # see GH#25154
+    data = [[1, 2, 3], [4, 5, 6]]
+    index = ["First", nulls_fixture]
+    columns = MultiIndex.from_tuples([("a", "foo"), ("b", "foo"), ("b", nulls_fixture)])
+    df = DataFrame(data=data, columns=columns, index=index, dtype="int64")
+
+    # Slicing out 'b', ['foo', nan]
+    cols = (["b"], ["foo", nulls_fixture])
+    result = df.loc[:, cols]
+    expected_columns = MultiIndex.from_tuples([("b", "foo"), ("b", nulls_fixture)])
+    expected_index = ["First", nulls_fixture]
+    expected = DataFrame(
+        [[2, 3], [5, 6]], columns=expected_columns, index=expected_index, dtype="int64"
+    )
+
+    tm.assert_frame_equal(result, expected)
+
+
 # ----------------------------------------------------------------------------
 # test indexing of DataFrame with multi-level Index with duplicates
 # ----------------------------------------------------------------------------
```

Review comment on `index = ["First", nulls_fixture]`: Define the index only once if it is the same. You could remove the other definitions and write the data directly into the object.
Reply: Thanks for the comments.
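Outside the pandas test suite, the same check can be run as a plain script by passing the null value explicitly instead of through the `nulls_fixture` pytest fixture. This is a sketch, not part of the PR: the helper name `check_nan_cols_slice` is hypothetical, only two of the fixture's null values are tried, and the public `pandas.testing` module stands in for the `tm` alias used in the test file (in pandas' own tests that alias is `pandas._testing`).

```python
import numpy as np
import pandas as pd
import pandas.testing as tm  # public counterpart of pandas._testing


def check_nan_cols_slice(null_value):
    """Hypothetical helper: the test's checks with an explicit null value."""
    data = [[1, 2, 3], [4, 5, 6]]
    index = ["First", null_value]  # defined once and reused for `expected`
    columns = pd.MultiIndex.from_tuples(
        [("a", "foo"), ("b", "foo"), ("b", null_value)]
    )
    df = pd.DataFrame(data=data, columns=columns, index=index, dtype="int64")

    # Level 0 must be "b"; level 1 may be "foo" or the null value
    result = df.loc[:, (["b"], ["foo", null_value])]

    expected = pd.DataFrame(
        [[2, 3], [5, 6]],
        columns=pd.MultiIndex.from_tuples([("b", "foo"), ("b", null_value)]),
        index=index,
        dtype="int64",
    )
    # Raises AssertionError if the slice differs from the expected frame
    tm.assert_frame_equal(result, expected)
    return result


# nulls_fixture covers more missing-value markers; two are tried here
for null in (np.nan, None):
    check_nan_cols_slice(null)
```

Reusing one `index` list for both frames also follows the reviewer's suggestion to define the index only once.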
Can you add all of the cases in the original post (working and non-working)? There are 5 cases, I think. Please parameterize.
Thank you for the suggestion. The additional cases were added and parameterized.
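The request to parameterize can be sketched as a table of `(indexer, expected values)` pairs driven by a single loop, which is the idea `pytest.mark.parametrize` implements in the pandas test suite. This sketch lists only the two indexers that appear in this PR thread; the remaining cases from the original issue are not reproduced. `CASES` and `failing_indexers` are hypothetical names, and comparing raw cell values instead of whole frames is a simplification that sidesteps the column-level comparison discussed in this thread.

```python
import numpy as np
import pandas as pd

# Frame from the test, with np.nan standing in for nulls_fixture
columns = pd.MultiIndex.from_tuples([("a", "foo"), ("b", "foo"), ("b", np.nan)])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns, dtype="int64")

# Each (indexer, expected cell values) row would be one parametrized case
CASES = [
    ((["b"], ["foo", np.nan]), [[2, 3], [5, 6]]),
    ((["b"], [np.nan]), [[3], [6]]),
]


def failing_indexers(frame, cases):
    """Return the indexers whose .loc result does not match the expected values."""
    failures = []
    for indexer, expected_values in cases:
        result = frame.loc[:, indexer]
        if result.to_numpy().tolist() != expected_values:
            failures.append(indexer)
    return failures


print(failing_indexers(df, CASES))
```

With pytest, each row of `CASES` would instead become one entry in the `parametrize` argument list, so a failure is reported per case.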
Keyword argument `check_column_type` was needed when slicing with `(["b"], [np.nan])` in test case 5. The reason is explained below.
Test case 5: slicing out `(["b"], [np.nan])`
When asserting the types of the columns of the actual and expected results for the second level of the MultiIndex, in `_check_types` of `assert_index_equal` in `asserters.py`, the left argument, which is based on the sliced MultiIndex (argument `left` in `_check_types`), is:
`Index(['bar', 'foo'], dtype='object')`
while the right argument, which is based on the expected result, is:
`Index([], dtype='object')`
As a result, the left index has "string" as its inferred type while the right one is empty, which makes the test fail even though the resulting DataFrames are identical.
I think this is because, when a DataFrame with MultiIndex columns is sliced, the resulting column levels are the ones present in the original DataFrame and are not updated.
To avoid that comparison, I passed `check_column_type=False` as a keyword argument.
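The explanation above (sliced columns keep the original levels) can be checked directly. This sketch uses the column layout from the discussion, `("a", "foo")`, `("b", "bar")`, `("b", np.nan)`; the variable names are illustrative. `MultiIndex.remove_unused_levels()` is an existing pandas method that rebuilds the levels from the codes actually in use. It is shown here only to make the retained levels visible and is not what this PR used (the PR passes `check_column_type=False`).

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([("a", "foo"), ("b", "bar"), ("b", np.nan)])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns, dtype="int64")

# Select only the "b" columns; the MultiIndex keeps every level it started with
sliced = df.loc[:, ["b"]]
kept_levels = [lvl.tolist() for lvl in sliced.columns.levels]

# Rebuild the levels from the codes in use. The NaN label is stored as
# code -1 rather than as a level entry, so it never appears in a level.
trimmed = sliced.columns.remove_unused_levels()
trimmed_levels = [lvl.tolist() for lvl in trimmed.levels]

print(kept_levels)
print(trimmed_levels)
```

Because the missing label lives only in the codes, an expected frame built from `[("b", np.nan)]` alone ends up with an empty second level, which is consistent with the empty `Index` reported for the right-hand side above.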