DataFrame constructor is inconsistent when coercing values to strings with dtype=str. #24388

Closed
TomAugspurger opened this issue Dec 21, 2018 · 3 comments
Labels
DataFrame (DataFrame data structure), Dtype Conversions (Unexpected or buggy dtype conversions)

Comments

@TomAugspurger
Contributor

When providing dtype=str to the DataFrame constructor, we're inconsistent about coercing values to strings.

When there's no overlap between the keys of data and columns, things are probably OK.

In [4]: pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)
Out[4]:
     0    1
0  NaN  NaN
1  NaN  NaN

(those values are np.nan).

But when there is an overlap between the keys of data and columns, the values introduced for the columns missing from data are coerced to strings.

In [8]: pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)
Out[8]:
   A    B
0  1  nan
1  2  nan

(everything in that dataframe is a string, like "1" or "nan")

That's because init_dict relies on arrays_to_mgr to coerce the values to the dtype, and arrays_to_mgr only gets a single dtype.
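
As a rough illustration of why a single dtype applied to every column can turn missing values into the string "nan", here is a minimal NumPy-only sketch. It does not call init_dict or arrays_to_mgr and is not how pandas is implemented; the `columns` and `coerced` names are hypothetical stand-ins for the per-column arrays.

```python
import numpy as np

# Hypothetical per-column arrays: 'A' comes from the data,
# 'B' was introduced by `columns` and is filled with np.nan.
columns = {
    "A": np.array([1, 2], dtype=object),
    "B": np.array([np.nan, np.nan], dtype=object),
}

# Applying one dtype (str) to every array stringifies the NaNs as well.
coerced = {name: values.astype(str) for name, values in columns.items()}

for name, values in coerced.items():
    print(name, values.tolist())
```

After the cast, the entries of 'B' are ordinary strings, so a missing-value check such as `np.isnan` or `pd.isna` no longer recognizes them as missing.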

@TomAugspurger added the Dtype Conversions (Unexpected or buggy dtype conversions) label Dec 21, 2018
@TomAugspurger
Contributor Author

This is not fixed in #24387

@TomAugspurger added the DataFrame (DataFrame data structure) label Dec 21, 2018
@FR4NKESTI3N

I am not able to reproduce this in pandas 0.23.4:

>>> pd.DataFrame({'A': [1, 2]}, index=[0, 1], columns=['A', 'B'], dtype=str)['B'][0] is np.nan
True
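
A slightly broader version of this check, as a sketch: it builds both frames from the report and classifies one cell from each as a real missing value or a string. The `describe_cell` helper and the `report` dict are hypothetical names used only here, and the printed result may differ between pandas versions, so no expected output is given.

```python
import numpy as np
import pandas as pd


def describe_cell(value):
    """Classify a single cell as a string, a real missing value, or another type."""
    if isinstance(value, str):
        return "string"
    if pd.isna(value):
        return "missing"
    return type(value).__name__


# Case 1: no overlap between the keys of data and columns.
no_overlap = pd.DataFrame(index=[0, 1], columns=[0, 1], dtype=str)

# Case 2: 'A' comes from data, 'B' is introduced by columns.
overlap = pd.DataFrame({"A": [1, 2]}, index=[0, 1], columns=["A", "B"], dtype=str)

report = {
    "no_overlap[0][0]": describe_cell(no_overlap.loc[0, 0]),
    "overlap['A'][0]": describe_cell(overlap.loc[0, "A"]),
    "overlap['B'][0]": describe_cell(overlap.loc[0, "B"]),
}
for cell, kind in report.items():
    print(f"{cell}: {kind}")
```

If `overlap['B'][0]` is reported as "string", the inconsistency described above is present on that install; "missing" means the value is still a real NaN.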

@TomAugspurger
Contributor Author

Huh, I must have been on the wrong branch again. Thanks @FR4NKESTI3N.
