Skip to content

TST / string dtype: add env variable to enable future_string and add test build #58459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

xref #54792

Taking a similar approach as we did for CoW: allow enabling the option with an env variable (also useful for testing locally), and add an extra build on CI with that option enabled.

Not all tests are passing yet, so will see how to mark this build as "allow failures", but I think it would already be useful to set up the infra so we can fix the tests piecewise in follow-ups.

@jorisvandenbossche jorisvandenbossche added Testing pandas testing functions or related to the test suite Strings String extension data type and string data labels Apr 27, 2024
@jorisvandenbossche jorisvandenbossche requested review from phofl and mroeschke and removed request for mroeschke April 27, 2024 19:12
@@ -858,7 +858,7 @@ def register_converter_cb(key: str) -> None:
with cf.config_prefix("future"):
cf.register_option(
"infer_string",
False,
True if os.environ.get("PANDAS_FUTURE_INFER_STRING", "0") == "1" else False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this something that is explicitly FOR_TESTING_ONLY or something to clarify users shouldn't use it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally prefer to keep the name a bit more readable. We did the same for CoW, I don't think we had any problems with it?
I also want to use this for testing in downstream packages, and would encourage others to do this as well (of course, that is still in a context of testing, but I feel that FOR_TESTING_ONLY would indicate it's only for testing for pandas devs)

(in general, there is not really a back compat issue with the env variable, in pandas 3.0 we will just start ignoring it, which is even easier than the explicit pd.options option, which will start raising an error at some point)

@jbrockmendel
Copy link
Member

jbrockmendel commented May 9, 2024

started a branch similar to this to get the tests passing; i count 15921615 failures locally. Of these, 529 are in tests/io (with -m "not db and not slow" so may be missing some). 810 are in pandas/tests/groupby/test_raises.py::test_groupby_raises_string.

@jorisvandenbossche
Copy link
Member Author

810 are in pandas/tests/groupby/test_raises.py::test_groupby_raises_string.

Impressive that a single test can give 810 failures ;) But then that set of failures should also be relatively easy to solve I suppose

This comment was marked as outdated.

Comment on lines +19 to +23
# temporarily let pytest always succeed (many tests are not yet passing in the
# build enabling the future string dtype)
if [[ "$PANDAS_FUTURE_INFER_STRING" == "1" ]]; then
PYTEST_CMD="$PYTEST_CMD || true"
fi
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also add a flag to pytest and then change the exit code from conftest.py (like in https://github.com/yashtodi94/pytest-custom_exit_code/blob/master/pytest_custom_exit_code.py), but since this is only temporary, this seems the easiest short term hack.

@jorisvandenbossche
Copy link
Member Author

I am going to merge this as this makes testing further string dtype related PRs a lot easier. But happy to further bikeshed on the name and easy to change that later.

@jorisvandenbossche jorisvandenbossche merged commit ebc60f2 into pandas-dev:main Jul 26, 2024
46 checks passed
@jorisvandenbossche jorisvandenbossche deleted the string-option-enable-env branch July 26, 2024 09:36
WillAyd pushed a commit that referenced this pull request Aug 13, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 14, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 27, 2024
WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Sep 20, 2024
@jorisvandenbossche jorisvandenbossche added this to the 2.3 milestone Sep 20, 2024
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024
jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backported Strings String extension data type and string data Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants