-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: [ArrowStringArray] more parameterised testing - part 2 #40749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: [ArrowStringArray] more parameterised testing - part 2 #40749
Conversation
@@ -1146,6 +1146,8 @@ def nullable_string_dtype(request): | |||
* 'string' | |||
* 'arrow_string' | |||
""" | |||
from pandas.core.arrays.string_arrow import ArrowStringDtype # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this to force the registration of the arrow_string
dtype. running some tests individually sometimes fails. will be removed in #39908
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to handle the import error here and xfail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate or I can just remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would raise an import error right on some builds? if so simply xfail if this errors and the tests would also xfail
that way u can use this fixture w/o worrying too much
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are skipped with the fixture. It's passing tests that may fail if the arrow_string
dtype is not registered, it's not part of the public api yet. and won't be since we are going to use the parameterized dtype in #39908 instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see also #39908 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would raise an import error right on some builds?
no. importing the ArrowStringDtype is safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i c, ok then
@@ -584,6 +583,20 @@ def test_astype_ignores_errors_for_extension_dtypes(self, df, errors): | |||
with pytest.raises((ValueError, TypeError), match=msg): | |||
df.astype(float, errors=errors) | |||
|
|||
@pytest.mark.parametrize("errors", ["raise", "ignore"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may want to consider using a shared function as the impl to avoid repeating things. though maybe not worth it (e.g. just copy in the cases where this is needed if there are not that many)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a fixture for errors? if not can u create one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, could you parametrize the test above, but have the dtype be a separate param, so it can be skipped for arrow_string?
Something like
@pytest.mark.parametrize(
"data, dtype",
[
(["x", "y", "z"], "string"),
pytest.param((["x", "y", "z"], "string_arrow"), mark=..skip...),
(["x", "y", "z"], "category"),
(3 * [Timestamp("2020-01-01", tz="UTC")], None),
(3 * [Interval(0, 1)], None),
],
)
and then inside the test create the dataframe (df = DataFrame(Series(data, dtype=dtype))
).
That could also avoid having to duplicate the full test body.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(although for the other tests where pd.array is used in the parametrization, this might get a bit more annoying to do it in this way ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a fixture for errors? if not can u create one
we have no fixture yet in conftest.py (common arguments section) for errors
we have 4 tests parameterized on errors...
test_astype_ignores_errors_for_extension_dtypes
in pandas/tests/frame/methods/test_astype.py
and pandas/tests/series/methods/test_astype.py
parameterised on ["raise", "ignore"]
test_custom_dateparsing_error
in pandas/tests/io/test_sql.py
parameterised on ["ignore", "raise", "coerce"]
and
test_coerce_outside_ns_bounds
in pandas/tests/tslibs/test_array_to_datetime.py
parameterised on ["coerce", "raise"]
will be pushing a commit shortly with @jorisvandenbossche suggested changes to avoid duplication so doing anything with errors parameterisation is now outside the scope of this PR.
I don't think worth opening a separate PR for that yet, especially considering the differences in the parameterisation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(although for the other tests where pd.array is used in the parametrization, this might get a bit more annoying to do it in this way ..)
the other two tests have simpler code so duplication is probably better than complex setup
test_astype_ignores_errors_for_extension_dtypes is already duplicated in frame and series methods modules so makes more sense to avoid adding more duplication there. have updated with suggestion.
The failure looks like a flaky ResourceWarning related one. |
yes. but I restarted anyway. |
only a few tests updated in this PR to ensure this approach is accepted.
if we try to add directly to the parametrization of the existing tests we get
ImportError
on environments without pyarrow installed. #40679 (comment)