-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Allow ArrowDtype(pa.string()) to be compatable with str accessor #51207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like dask-related doctests are failing. do we need to pin a version somewhere? |
pandas/core/groupby/ops.py
Outdated
@@ -405,7 +406,7 @@ def _reconstruct_ea_result( | |||
""" | |||
dtype: BaseMaskedDtype | StringDtype | |||
|
|||
if isinstance(values.dtype, StringDtype): | |||
if is_string_dtype(values.dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesnt make a difference here bc we know we have EA, but we should have a func like is_string_dtype that excludes object
pandas/core/groupby/ops.py
Outdated
@@ -96,6 +96,7 @@ | |||
) | |||
|
|||
if TYPE_CHECKING: | |||
from pandas.core.arrays.string_ import StringDtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the annotation that uses this may no longer be accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added to | ArrowDtype
to the annotation where this is used
I think the docbuild will fail until dask releases a new version. @jrbourbeau typically how often does dask release a new version? |
We release every two weeks. In fact, the latest release was pushed out a couple of hours ago |
Oh perfect looks like this PR picked up the dask update and the tests are passing now! |
Any other comments here? (Failures are unrelated) |
pandas/tests/extension/test_arrow.py
Outdated
) | ||
def test_str_unsupported_methods(method, args): | ||
ser = pd.Series(["abc", None], dtype=ArrowDtype(pa.string())) | ||
with pytest.raises(NotImplementedError, match=f"str.{method} not supported."): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the message be string[pyarrow]-specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I noted it specifically with pd.ArrowDtype(pa.string())
def _str_fullmatch( | ||
self, pat, case: bool = True, flags: int = 0, na: Scalar | None = None | ||
): | ||
if not pat.endswith("$") or pat.endswith("//$"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see this is already in the string_arrow code, but why "//$"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, as you mentioned I just copied it over from the string_arrow code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Best guess is it should be backslashes, meant to exclude escaped dollar signs. OK to consider out of scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Nice! thx |
Reboot of #50325
Notable change was that
pandas.core.strings.BaseStringArrayMethods
in the__init__.py
was causing a circular import when used inArrowExtensionArray
, so needed to remove the import into__init__.py.
This breaks one of our dask tests but it has been fixed downstream dask/dask#9907