-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ArrowStringArray] API: StringArray -> ObjectStringArray #40962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrowStringArray] API: StringArray -> ObjectStringArray #40962
Conversation
more doctstings also need updating. will update shortly. |
Yes, I would do that, let's keep the existing |
I would tend to agree. Most of the changes here are the name change. I'll close this for now. we can maybe consider renaming after the base class is in play. i'll do that in another PR just in case we want to come back to this. |
looking at this again. I think we should do this, not averse to leaving |
Why is it that confusing to have a BaseStringArray for shared code, and then two concrete subclasses StringArray and ArrowStringArray? |
simple, since we are planning dtypes named |
from the comments here and elsewhere, I think we want a BaseStringArray (which cannot be called StringArray for back compat) we want to keep StringArray for now but also want PythonStringArray with StringArray being an alias for PythonStringArray. we could maybe also deprecate StringArray too? will reopen this and make those changes. |
As mentioned above, I don't see the upside of changing the "StringArray" name. Let's just make a BaseStringArray class if needed, and leave the other names as is? |
That's quite straightforward, so will open another PR with just that (maybe could even add to #39908 to address #39908 (comment) directly) and keep this PR for the name change discussion. |
pandas/core/arrays/string_.py
Outdated
|
||
class StringArray(ExtensionArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will rename this StringArrayBase in next commit
pandas/core/strings/object_array.py
Outdated
@@ -186,7 +186,7 @@ def rep(x, r): | |||
|
|||
repeats = np.asarray(repeats, dtype=object) | |||
result = libops.vec_binop(np.asarray(self), repeats, rep) | |||
if isinstance(self, (StringArray, ArrowStringArray)): | |||
if isinstance(self, (PythonStringArray, ArrowStringArray)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could now use the base class. will change in next commit
pandas/core/construction.py
Outdated
@@ -119,7 +119,7 @@ def array( | |||
:class:`datetime.timedelta` :class:`pandas.arrays.TimedeltaArray` | |||
:class:`int` :class:`pandas.arrays.IntegerArray` | |||
:class:`float` :class:`pandas.arrays.FloatingArray` | |||
:class:`str` :class:`pandas.arrays.StringArray` | |||
:class:`str` :class:`pandas.arrays.PythonStringArray` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is overlap here with the dtype change. could add the docstring to the baseclass and no need to reference the two array types.
maybe let's call this ObjectStringArray? (as more accurate / easier to digest name) |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
This PR needs are merge with master to be able to see clearly the changes. will hopefully circle back to this this week. This basically was creating a StringArray class which dispatches to either ArrowStringArray or ObjectStringArray which are both EAs in their own right but hidden from the user and also addressed the astype issue #41856 |
this is quite old, happen to reopen if actively worked on. |
xref #35169 (comment)
this PR is draft as this change has not yet been discussed. (maybe could keep the StringArray the same and create a StringArrayBase instead)
also probably want to do this after #39908 (which is not yet ready) and then move the common base class into a separate module along with the dtype. (this could be string_base.py or create subdir with maybe base.py, string_arrow.py and string_python.py)
we should leave moving things around for a follow-up to keep the diff simpler to review.
This PR could be a precusor to a follow-up to #40708 to remove duplication #40708 (comment) (or could be done with common helper function or mixin #40708 (comment))
No deduplication here. just creating base class and changes to make the tests pass
This is API breaking, but for experimental API, so we would probably need some documentation or add to #40747 if we do this after #39908