-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: Multiindex slicing with NaNs, unexpected results for #25154 #39356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx :)
# Slicing MultiIndex including levels with nan values, for more information | ||
# see GH#25154 | ||
data = [[1, 2, 3], [4, 5, 6]] | ||
index = ["First", nulls_fixture] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define the index only once if it is the same. You could remove the other definitions and write the data directly into the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments.
Suggested changes were implemented.
I see that one python test, namely |
This test is unreliable, you can ignore this one |
@@ -229,6 +229,30 @@ def test_frame_getitem_nan_multiindex(nulls_fixture): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
def test_frame_getitem_nan_cols_multiindex(nulls_fixture): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add all of the cases in the original post (working an non-working), there are 5 cases i think. pls parameterize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. The additional cases were added and parameterized.
Keyword argument check_column_type
was needed when slicing with (["b"], [np.nan]),
in test case 5, the reason is explained below:
Test Case 5:
Slicing out (["b"], [np.nan])
:
When asserting the types of the columns of the actual and expected result for the second level of the multiindex, on _check_types
of assert_index_equal
on asserters.py
the left argument that is based on the sliced multiindex (argument left
in _check_types
) is:
- left:
Index(['bar', 'foo'], dtype='object')
while the right argument that is based on the expected result is:
- right:
Index([], dtype='object')
Based on that the left has "string" as inferred type, while the right one is empty which causes the test to fail, even though the resulting dataframes are identical.
I think this is because when slicing a dataframe with a multi-index the resulting levels of columns are the initial ones present in the original dataframe and are not updated.
In order to avoid that comparison I passed check_column_type=False
as keyword argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry 6 cases
if isinstance(result, DataFrame): | ||
tm.assert_frame_equal(result, expected, check_column_type=False) | ||
elif isinstance(result, Series): | ||
tm.assert_series_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use tm.assert_equal(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but unfortunately I couldn't pass check_column_type=False
because it's not accepted in assert_series_equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use
MultiIndex(codes=[[1], [-1]], levels=[["a", "b"], ["bar", "foo"]])
to create your columns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion. Requested changes were implemented.
@@ -87,7 +86,8 @@ def test_series_getitem_returns_scalar( | |||
(lambda s: s[(2000, 3, 4)], KeyError, r"^\(2000, 3, 4\)$"), | |||
(lambda s: s.loc[(2000, 3, 4)], KeyError, r"^\(2000, 3, 4\)$"), | |||
(lambda s: s.loc[(2000, 3, 4, 5)], IndexingError, "Too many indexers"), | |||
(lambda s: s.__getitem__(len(s)), KeyError, ""), # match should include len(s) | |||
(lambda s: s.__getitem__(len(s)), KeyError, ""), | |||
# match should include len(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you are doing this and removed the line above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed by accident. Sorry for missing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I revert those changes in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing this out. Requested changes were implemented.
thanks @theodorju |
Added test case for Multiindex slicing with NaNs.
Executed linting test based on the instructions provided here and 23 errors were reported, however none regarding the python test file modified.