Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
ENH: [ArrowStringArray] Enable the string methods for the arrow-backed StringArray #40708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: [ArrowStringArray] Enable the string methods for the arrow-backed StringArray #40708
Changes from 14 commits
9b6c054
c0fedcd
e603321
37035b4
c823953
76292ec
50db876
374924b
23b40e3
4b86f67
7bd82f9
aaf54ca
b1cf83d
36d0034
f479819
d1c8a3e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the goal is to inherit from BaseStringArrayMethods. (I recall this being mentioned somewhere). For now use ObjectStringArrayMixin similar to fletcher xhochy/fletcher#196
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because you want to have the object-dtype based methods as fallback for the ones that pyarrow doesn't yet support, I suppose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep. This PR just gets the string methods working for the existing tests we have StringArray. So just converting to object and not using native pyarrow functions yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small comment about that (eg below where the class is created) about why
ObjectStringArrayMixin
is mixed in?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moreless cut and paste from StringArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be shared? (eg move it a common helper function or mixin?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed. I think the de-duplication is better as an immediate follow-up to keep changes to existing code paths (i.e. StringArray) in a separate PR and keep this one scoped to just additions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either is fine for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or have a common base class. #35169 (comment)
so better as a follow-on to allow for more discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special case here for partition/rpartition/split where the array method returns an object array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could either special case, refactor to dispatch to array or leave as follow-up and xfail for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the xfail is fine for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. keeps the scope of this PR limited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another dup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe can add "isnumeric" instead, which doesn't seem to be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. "isnumeric" and other "is" functions are tested to some degree by _any_string_method fixture used in test_string_array