Skip to content

TST: [ArrowStringArray] more parameterised testing - part 2 #40749

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

simonjayhawkins
Copy link
Member

only a few tests updated in this PR to ensure this approach is accepted.

if we try to add directly to the parametrization of the existing tests we get ImportError on environments without pyarrow installed. #40679 (comment)

@simonjayhawkins simonjayhawkins added Testing pandas testing functions or related to the test suite Strings String extension data type and string data labels Apr 2, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Apr 2, 2021
@@ -1146,6 +1146,8 @@ def nullable_string_dtype(request):
* 'string'
* 'arrow_string'
"""
from pandas.core.arrays.string_arrow import ArrowStringDtype # noqa: F401
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this to force the registration of the arrow_string dtype. running some tests individually sometimes fails. will be removed in #39908

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to handle the import error here and xfail

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate or I can just remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would raise an import error right on some builds? if so simply xfail if this errors and the tests would also xfail

that way u can use this fixture w/o worrying too much

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are skipped with the fixture. It's passing tests that may fail if the arrow_string dtype is not registered, it's not part of the public api yet. and won't be since we are going to use the parameterized dtype in #39908 instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see also #39908 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would raise an import error right on some builds?

no. importing the ArrowStringDtype is safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i c, ok then

@@ -584,6 +583,20 @@ def test_astype_ignores_errors_for_extension_dtypes(self, df, errors):
with pytest.raises((ValueError, TypeError), match=msg):
df.astype(float, errors=errors)

@pytest.mark.parametrize("errors", ["raise", "ignore"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to consider using a shared function as the impl to avoid repeating things. though maybe not worth it (e.g. just copy in the cases where this is needed if there are not that many)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a fixture for errors? if not can u create one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could you parametrize the test above, but have the dtype be a separate param, so it can be skipped for arrow_string?

Something like

    @pytest.mark.parametrize(
        "data, dtype",
        [
            (["x", "y", "z"], "string"),
            pytest.param((["x", "y", "z"], "string_arrow"), mark=..skip...),
            (["x", "y", "z"], "category"),
            (3 * [Timestamp("2020-01-01", tz="UTC")], None),
            (3 * [Interval(0, 1)], None),
        ],
    )

and then inside the test create the dataframe (df = DataFrame(Series(data, dtype=dtype))).

That could also avoid having to duplicate the full test body.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(although for the other tests where pd.array is used in the parametrization, this might get a bit more annoying to do it in this way ..)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a fixture for errors? if not can u create one

we have no fixture yet in conftest.py (common arguments section) for errors

we have 4 tests parameterized on errors...

test_astype_ignores_errors_for_extension_dtypes in pandas/tests/frame/methods/test_astype.py and pandas/tests/series/methods/test_astype.py parameterised on ["raise", "ignore"]

test_custom_dateparsing_error in pandas/tests/io/test_sql.py parameterised on ["ignore", "raise", "coerce"]

and

test_coerce_outside_ns_bounds in pandas/tests/tslibs/test_array_to_datetime.py parameterised on ["coerce", "raise"]

will be pushing a commit shortly with @jorisvandenbossche suggested changes to avoid duplication so doing anything with errors parameterisation is now outside the scope of this PR.

I don't think worth opening a separate PR for that yet, especially considering the differences in the parameterisation

Copy link
Member Author

@simonjayhawkins simonjayhawkins Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(although for the other tests where pd.array is used in the parametrization, this might get a bit more annoying to do it in this way ..)

the other two tests have simpler code so duplication is probably better than complex setup

test_astype_ignores_errors_for_extension_dtypes is already duplicated in frame and series methods modules so makes more sense to avoid adding more duplication there. have updated with suggestion.

@jorisvandenbossche
Copy link
Member

The failure looks like a flaky ResourceWarning related one.

@simonjayhawkins
Copy link
Member Author

yes. but I restarted anyway.

@jorisvandenbossche jorisvandenbossche merged commit 1e94d9f into pandas-dev:master Apr 15, 2021
@simonjayhawkins simonjayhawkins deleted the nullable-string-testing-2 branch April 15, 2021 10:35
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request Apr 21, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants