API: More permissive conversion to StringDtype #33421

topper-123
Contributor

@topper-123 topper-123 commented Apr 9, 2020

This is a proposal to make conversion to StringDtype more permissive, so that it can be used in place of dtype=str.

At the moment, converting to StringDtype only accepts arrays that already contain str values, so you often have to write astype(str).astype("string") to be sure of avoiding errors, which is tedious. For example, these fail in master and work in this PR:

>>> pd.Series([1,2, np.nan], dtype="string")
0       1
1       2
2    <NA>
dtype: string
>>> pd.array([1,2, np.nan], dtype="string")
<StringArray>
['1', '2', <NA>]
Length: 3, dtype: string
>>> pd.Series([1,2, np.nan]).astype("string")
0     1.0
1     2.0
2    <NA>
dtype: string
>>> pd.Series([1,2, np.nan], dtype="Int64").astype("string")
0       1
1       2
2    <NA>
dtype: string

These and similar conversions now work. Previously, all of the above raised errors.
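For readers on a pandas version without this change, the conversions above can be approximated with a small helper. This is only a minimal sketch and not the implementation in this PR: `to_nullable_string` is a hypothetical name, and it assumes the input holds scalar values. It converts each non-missing value with `str()` and maps missing values to `pd.NA`, which avoids relying on `astype(str)` (in pandas versions from around the time of this PR, that could turn NaN into the literal string "nan").

```python
import numpy as np
import pandas as pd


def to_nullable_string(values):
    """Hypothetical helper: convert scalar values to the "string" dtype.

    Missing values (NaN, None, pd.NA) become pd.NA; everything else is
    passed through str(). Not part of pandas and not the code in this PR.
    """
    ser = pd.Series(values)
    # Build a list of plain str objects, keeping missing entries as pd.NA.
    converted = [pd.NA if pd.isna(v) else str(v) for v in ser]
    return pd.Series(converted, index=ser.index, dtype="string")


# A float64 input keeps its float formatting, e.g. 1.0 becomes "1.0".
from_float = to_nullable_string([1, 2, np.nan])

# A nullable Int64 input keeps integer formatting, e.g. 1 becomes "1".
from_int = to_nullable_string(pd.array([1, 2, None], dtype="Int64"))

print(from_float)
print(from_int)
```

As in the astype("string") example above, the string form depends on the dtype of the input: a float64 Series yields "1.0" and "2.0", while an Int64 one yields "1" and "2".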

Obviously tests and doc updates are still missing, but I would appreciate feedback on whether this solution looks ok.

@topper-123 topper-123 closed this Apr 9, 2020
@jorisvandenbossche
Member

@topper-123 is there a reason you closed this?

@topper-123
Contributor Author

topper-123 commented Apr 10, 2020

I had only run a subset of the tests, and not all tests passed when I pushed upstream.

In general, I've found that type conversions in pandas are quite complex, so I'm still working on something that is more than a temporary solution.

@topper-123 topper-123 deleted the more_permissive_string_conversion branch May 25, 2020 11:50
Development

Successfully merging this pull request may close these issues.

convert numeric column to dedicated pd.StringDtype()
2 participants