Skip to content

[ArrowStringArray] BUG: fix test_astype_string for Float32Dtype #40998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Apr 30, 2021

Conversation

simonjayhawkins
Copy link
Member

this also fixes Interval, but that test doesn't fail in #39908 for a single StringDtype parameterized by storage.

@simonjayhawkins simonjayhawkins added the Strings String extension data type and string data label Apr 17, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Apr 17, 2021
# numerical issues with Float32Dtype
na_values = scalars._mask
result = scalars._data
result = lib.ensure_string_array(result, copy=False, convert_na_value=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm this doesn't copy? (its fine but then you are mutating)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's not a change in this PR. will need to look back why we do this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to be consistent with StringArray. I guess that the assumption was that a copy would probably have been created in most cases (unless an object array was passed) and explicitly avoid the additional copy made in ensure_string_array since that does not track if a copy/new array has been already been created before making a copy. The default for copy is False.

na_values = scalars._mask
result = scalars._data
result = lib.ensure_string_array(result, copy=False, convert_na_value=False)
result[na_values] = ArrowStringDtype.na_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, might also be able to do pa.array(result, mask=mask, type=pa.string()), then pyarrow will use the mask for missing values, and we don't need the setitem operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@jorisvandenbossche jorisvandenbossche changed the title [ArrowStringArray] fix test_astype_string for Float32Dtype [ArrowStringArray] BUG: fix test_astype_string for Float32Dtype Apr 20, 2021
@jreback jreback merged commit f72f02f into pandas-dev:master Apr 30, 2021
@jreback
Copy link
Contributor

jreback commented Apr 30, 2021

thanks @simonjayhawkins

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants