-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ArrowStringArray] BUG: fix test_astype_string for Float32Dtype #40998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrowStringArray] BUG: fix test_astype_string for Float32Dtype #40998
Conversation
pandas/core/arrays/string_arrow.py
Outdated
# numerical issues with Float32Dtype | ||
na_values = scalars._mask | ||
result = scalars._data | ||
result = lib.ensure_string_array(result, copy=False, convert_na_value=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm this doesn't copy? (its fine but then you are mutating)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's not a change in this PR. will need to look back why we do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to be consistent with StringArray. I guess that the assumption was that a copy would probably have been created in most cases (unless an object array was passed) and explicitly avoid the additional copy made in ensure_string_array since that does not track if a copy/new array has been already been created before making a copy. The default for copy is False.
pandas/core/arrays/string_arrow.py
Outdated
na_values = scalars._mask | ||
result = scalars._data | ||
result = lib.ensure_string_array(result, copy=False, convert_na_value=False) | ||
result[na_values] = ArrowStringDtype.na_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, might also be able to do pa.array(result, mask=mask, type=pa.string())
, then pyarrow will use the mask for missing values, and we don't need the setitem operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
thanks @simonjayhawkins |
this also fixes Interval, but that test doesn't fail in #39908 for a single StringDtype parameterized by storage.