REGR: Fix conversion of mixed dtype DataFrame to numpy str #35473

Merged (5 commits) on Aug 7, 2020

dsaxton (Member) commented Jul 30, 2020

```diff
@@ -847,7 +847,7 @@ def _interleave(self, dtype=None, na_value=lib.no_default) -> np.ndarray:
         # Give EAs some input on what happens here. Sparse needs this.
         if isinstance(dtype, SparseDtype):
             dtype = dtype.subtype
-        elif is_extension_array_dtype(dtype):
+        elif is_extension_array_dtype(dtype) or is_dtype_equal(dtype, str):
```
Contributor

Are you sure this branch is actually hit?

Member Author

Yes. This avoids initializing result with np.empty(shape, dtype=str), which creates an array of dtype "<U1" (fixed-width Unicode strings of length 1), so every value written into it is truncated to its first character.

Before the change that introduced this regression, dtype was always set to the dtype inferred by _interleaved_dtype (object in this case). This change makes sure that still happens when str is requested.
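A minimal NumPy sketch of the failure mode described above. It assumes a hypothetical interleave helper that fills a preallocated array one "block" at a time (illustrative data, not the pandas implementation). Preallocating with dtype=str yields a "<U1" array, so the string values are cut to one character; preallocating as object and converting to str afterwards keeps the full values.

```python
import numpy as np

# Hypothetical data standing in for two blocks of a mixed-dtype DataFrame:
# one string column and one float column.
str_block = ["apple", "banana"]
float_block = [1.5, 22.25]


def interleave(dtype):
    """Hypothetical helper: fill a preallocated 2x2 array column by column,
    loosely mimicking how a block manager writes each block into its result."""
    result = np.empty((2, 2), dtype=dtype)
    result[:, 0] = str_block
    result[:, 1] = float_block
    return result


# Allocating with dtype=str gives "<U1", so values are truncated on assignment.
buggy = interleave(str)

# Allocating as object keeps the original Python values; converting to str
# at the end lets NumPy size the string dtype from the actual contents.
fixed = interleave(object).astype(str)
```

Deferring the str conversion until all values are in place is what avoids committing to a string width before the data is known.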

Contributor

Make this a separate elif; OR-ing the str check into the extension-array condition, as written here, is very confusing.
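One way the requested restructuring could look, as a self-contained sketch. resolve_interleave_dtype and its is_extension_dtype flag are hypothetical stand-ins: the real code calls is_extension_array_dtype and is_dtype_equal, and the SparseDtype branch is omitted here. Following the author's reply, both branches are assumed to fall back to object.

```python
import numpy as np


def resolve_interleave_dtype(dtype, is_extension_dtype=False):
    """Hypothetical sketch of the dtype normalization in _interleave, with the
    str case as its own elif instead of being OR-ed into the EA condition."""
    if is_extension_dtype:
        # Assumption: extension-array dtypes are materialized as object.
        return np.dtype(object)
    elif np.dtype(dtype) == np.dtype(str):
        # A bare str request would make np.empty allocate "<U1" and truncate
        # values, so fall back to object as the inferred dtype would.
        return np.dtype(object)
    return np.dtype(dtype)
```

Keeping the str check in its own branch makes it obvious that it is a separate workaround, not part of the extension-array handling.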

simonjayhawkins added this to the 1.1.1 milestone on Jul 30, 2020
simonjayhawkins added the labels Compat (pandas objects compatibility with Numpy or Python functions) and Dtype Conversions (Unexpected or buggy dtype conversions) on Jul 30, 2020
jreback (Contributor) left a comment:

Looks OK. Please merge master and add a comment, then ping on green.

jreback (Contributor) commented Aug 7, 2020

LGTM. Ping on green.

dsaxton (Member, Author) commented Aug 7, 2020

> lgtm. ping on green.

@jreback CI is green. Thanks for reviewing!

jreback merged commit 47c17cb into pandas-dev:master on Aug 7, 2020
jreback (Contributor) commented Aug 7, 2020

Thanks!

3 participants