-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ArrowStringArray] BUG: fix test_astype_string for Float32Dtype #40998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 14 commits
3bb9750
acfb5f5
98b3a5f
56d3717
c095cd4
88b05e8
f337018
d861895
1f35f06
dd59832
8dddaef
770b018
e391075
2d6c835
c28036c
e211b75
45b93c6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -227,10 +227,22 @@ def _chk_pyarrow_available(cls) -> None: | |
|
||
@classmethod | ||
def _from_sequence(cls, scalars, dtype: Dtype | None = None, copy: bool = False): | ||
from pandas.core.arrays.masked import BaseMaskedArray | ||
|
||
cls._chk_pyarrow_available() | ||
# convert non-na-likes to str, and nan-likes to ArrowStringDtype.na_value | ||
scalars = lib.ensure_string_array(scalars, copy=False) | ||
return cls(pa.array(scalars, type=pa.string(), from_pandas=True)) | ||
|
||
if isinstance(scalars, BaseMaskedArray): | ||
# avoid costly conversion to object dtype in ensure_string_array and | ||
# numerical issues with Float32Dtype | ||
na_values = scalars._mask | ||
result = scalars._data | ||
result = lib.ensure_string_array(result, copy=False, convert_na_value=False) | ||
result[na_values] = ArrowStringDtype.na_value | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively, might also be able to do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done. |
||
else: | ||
# convert non-na-likes to str, and nan-likes to StringDtype.na_value | ||
result = lib.ensure_string_array(scalars, copy=False) | ||
|
||
return cls(pa.array(result, type=pa.string(), from_pandas=True)) | ||
|
||
@classmethod | ||
def _from_sequence_of_strings( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm this doesn't copy? (its fine but then you are mutating)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not a change in this PR. will need to look back why we do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to be consistent with StringArray. I guess that the assumption was that a copy would probably have been created in most cases (unless an object array was passed) and explicitly avoid the additional copy made in ensure_string_array since that does not track if a copy/new array has been already been created before making a copy. The default for copy is False.