-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST (string dtype): fix IO dtype_backend tests for storage of str dtype of columns' Index #59509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST (string dtype): fix IO dtype_backend tests for storage of str dtype of columns' Index #59509
Conversation
expected = DataFrame( | ||
{ | ||
"a": pd.array(["a", "b"], dtype=pd.StringDtype(string_storage)), | ||
"b": pd.array(["x", pd.NA], dtype=pd.StringDtype(string_storage)), | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this is changing is that the expected result is also created in the with pd.option_context("mode.string_storage", string_storage):
context. What this actually changes is that when future.infer_string is enabled, that affects the storage used by the Index object with str dtype used for the columns
of the resulting dataframe.
tm.assert_frame_equal(result, expected) | ||
# the storage of the str columns' Index is also affected by the | ||
# string_storage setting -> ignore that for checking the result | ||
tm.assert_frame_equal(result, expected, check_column_type=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the tests to essentially just ignore the issue.
We could also verify it explicitly (although it's not the main thing that this test is meant to test) by doing something like:
if using_infer_string:
expected.columns = expected.columns.astype(pd.StringDtype(string_storage, na_value=np.nan))
Which, now I comment that here, is maybe also perfectly fine (not too verbose, compared to the comment I had to add :))
Thanks @jorisvandenbossche |
…pe of columns' Index (pandas-dev#59509) * TST (string dtype): fix failure in csv test_dtype_backend_string test * ignore storage of Index str dtype
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (pandas-dev#59509)
…pe of columns' Index (#59509)
Quick follow-up on #59481 (I missed that with the last merge of main there was actually one relevant failure, maybe some interaction with one of my other merged PRs)
This was caused because of the merge of #59488, that changed to honor the
mode.string_storage
option for the future default string dtype. That meant that DataFrames with string column names that were created in awith pd.option_context("mode.string_storage", string_storage):
context would now also follow the specified storage for the column names.