BUG: Fix regression when using Series with arrow string array #52076
pandas/_libs/lib.pyx (outdated)
arr = arr.to_numpy()
if hasattr(arr, "type"):
    # pyarrow array
    arr = np.array(arr)
1) Can we handle this at a higher level? 2) Why doesn't the pyarrow object's to_numpy work here?
pyarrow's to_numpy defaults to zero_copy_only=True, so it raises here: a string array cannot be converted to NumPy without copying.
We need a check like this somewhere as long as pyarrow is not a hard dependency; I can move it to a higher level as well.
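A minimal sketch of the failure described above, assuming pyarrow may or may not be installed. The helper name arrow_to_object_ndarray is hypothetical (not pandas API); it only shows that allowing a copy with zero_copy_only=False avoids the error that the default zero_copy_only=True raises for a string array.

```python
try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional here, as it is for pandas
    pa = None


def arrow_to_object_ndarray(arr):
    """Hypothetical helper: convert a pyarrow Array to NumPy, allowing a copy.

    pyarrow.Array.to_numpy() defaults to zero_copy_only=True, which raises
    for string arrays because their data cannot be viewed as a NumPy array
    without copying. zero_copy_only=False lets pyarrow copy instead.
    """
    return arr.to_numpy(zero_copy_only=False)


if pa is not None:
    strings = pa.array(["a", "b", None])
    try:
        strings.to_numpy()  # default zero_copy_only=True
    except pa.ArrowInvalid as err:
        print("default to_numpy raised:", err)
    print(arrow_to_object_ndarray(strings))  # object-dtype array with a None
```

Calling np.array on the pyarrow array goes through the same copying conversion, which is why the original diff uses np.array rather than to_numpy.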
Moved it to a higher level; not sure if it is better, though.
LGTM
Thanks @phofl
Owee, I'm MrMeeseeks, look at me. There seems to be a conflict; please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulations, you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon! Remember to remove the … If these instructions are inaccurate, feel free to suggest an improvement.
…-dev#52076)
* BUG: Fix regression when using Series with arrow string array
* Move
@@ -352,6 +352,9 @@ def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = Fal
    result[na_values] = libmissing.NA

else:
    if hasattr(scalars, "type"):
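The check in this hunk is duck-typed: anything exposing a type attribute is treated as a pyarrow array. A minimal sketch, using a hypothetical helper coerce_arrow_like (not pandas API) and a made-up class NotArrow, shows that such a check also matches unrelated objects, which is the concern behind asking for an isinstance check.

```python
import numpy as np


def coerce_arrow_like(scalars):
    # Hypothetical helper mirroring the hunk's duck-typed check: any object
    # with a .type attribute is treated as a pyarrow array and materialized
    # with np.array, which permits a copy.
    if hasattr(scalars, "type"):
        return np.array(scalars)
    return scalars


class NotArrow:
    # Unrelated class that happens to define a "type" attribute.
    type = "label"


print(coerce_arrow_like(["a", "b"]))  # plain lists pass through unchanged
print(coerce_arrow_like(NotArrow()).shape)  # also matched: becomes a 0-d object array
```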
Why can't we do an actual isinstance check?
With a helper method is_pyarrow_array like the following, this should be quite easy:
def is_pyarrow_array(obj):
    if _pyarrow_installed:
        return isinstance(obj, pa.Array)
    return False
with the variables _pyarrow_installed and pa filled in once, on first import.
PR that does this -> #52830
Closes pd.Series throws ArrowInvalidError on 2.0rc #51844