
API: astype("string") behavior #41856



Open
simonjayhawkins opened this issue Jun 7, 2021 · 0 comments
Labels: API Design · Astype · Dtype Conversions (Unexpected or buggy dtype conversions) · Strings (String extension data type and string data)

@simonjayhawkins (Member)

In #39908 (comment), @jorisvandenbossche wrote:


Bringing the "deferred" storage mode lookup for StringDtype discussion (originally here: #39908 (comment)) into the main thread, and attempting a recap.

Currently, calling pd.StringDtype() without specifying the storage already looks up the global option (mode.string_storage). In the default case, you get:

>>> import pandas as pd
>>> pd.StringDtype().storage
'python'

which also means that pandas_dtype() already "fully initializes" the string dtype:

>>> pd.api.types.pandas_dtype("string")
string[python]

As a consequence, astype("string") will actually convert the values if the storage of your string dtype doesn't match the global setting:

>>> s = pd.Series(['a', 'b'], dtype=pd.StringDtype(storage="pyarrow"))
>>> s.dtype
string[pyarrow]
>>> s.astype("string").dtype
string[python]
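To make the eager lookup concrete, here is a toy model in plain Python (no pandas required). `StringDtypeModel`, `pandas_dtype_model`, `astype_model` and the `OPTIONS` dict are hypothetical stand-ins, not pandas API; the sketch only assumes that an unspecified storage is filled in from the global option at construction time, as described above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the global "mode.string_storage" option.
OPTIONS = {"mode.string_storage": "python"}


@dataclass(frozen=True)
class StringDtypeModel:
    """Toy string dtype that resolves its storage eagerly (not pandas code)."""

    storage: Optional[str] = None

    def __post_init__(self) -> None:
        # Eager lookup: a missing storage is filled in from the global option
        # when the dtype is constructed, so it is "fully initialized".
        if self.storage is None:
            object.__setattr__(self, "storage", OPTIONS["mode.string_storage"])

    def __str__(self) -> str:
        return f"string[{self.storage}]"


def pandas_dtype_model(spec: str) -> StringDtypeModel:
    # Hypothetical analogue of pd.api.types.pandas_dtype, "string" only.
    if spec == "string":
        return StringDtypeModel()
    raise TypeError(f"unsupported dtype spec: {spec!r}")


def astype_model(current: StringDtypeModel, spec: str) -> StringDtypeModel:
    # The current dtype is ignored: the target is resolved before astype
    # runs, so a pyarrow-backed input ends up with the global storage.
    return pandas_dtype_model(spec)


print(StringDtypeModel().storage)                            # python
print(pandas_dtype_model("string"))                          # string[python]
print(astype_model(StringDtypeModel("pyarrow"), "string"))   # string[python]
```

Because the target is fully resolved before astype sees it, the model cannot tell "string" apart from "string[python]", which is where the surprising conversion comes from.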

However, I think it could make sense for .astype("string") to mean "ensure I have a string dtype", and thus not convert to another storage backend if the data already has a string dtype to start with.
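A minimal sketch of the proposed "ensure a string dtype" rule, again as a hypothetical toy model rather than pandas code: a requested dtype with `storage=None` stands for the bare "string" alias, and `DEFAULT_STORAGE` is an assumed constant standing in for the global option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringDtypeModel:
    # Hypothetical toy dtype; storage=None means "storage not specified".
    storage: Optional[str] = None


DEFAULT_STORAGE = "python"  # stand-in for the global option


def astype_string(current: object, requested: StringDtypeModel) -> StringDtypeModel:
    """Proposed rule: bare "string" means 'ensure a string dtype'."""
    if requested.storage is not None:
        # Explicit storage such as "string[pyarrow]": honour it.
        return requested
    if isinstance(current, StringDtypeModel) and current.storage is not None:
        # Already a string dtype: keep its storage backend, no conversion.
        return current
    # Not a string dtype yet: fall back to the global default.
    return StringDtypeModel(DEFAULT_STORAGE)
```

With this rule, a pyarrow-backed series stays pyarrow-backed under astype("string"), while an explicit "string[python]" still forces a conversion.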

We do something similar for CategoricalDtype: "category" means a categorical dtype with no categories, but astype("category") does not remove your categories; it preserves any existing categorical dtype as is.
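The categorical precedent can be checked directly with pandas. This assumes a reasonably recent pandas is installed; the calls used (`pandas_dtype`, `CategoricalDtype`, `.cat.categories`) are public API.

```python
import pandas as pd

s = pd.Series(["a", "b"], dtype=pd.CategoricalDtype(["a", "b", "c"]))

# The bare "category" alias resolves to a dtype without categories ...
print(pd.api.types.pandas_dtype("category").categories)  # None

# ... yet astype("category") keeps the existing categories untouched.
print(list(s.astype("category").cat.categories))  # ['a', 'b', 'c']
```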

We could still make astype("string") behave the way I suggest by special-casing it in the astype implementations (as suggested in #39908 (comment)), but I think that is something we should ideally avoid: presumably any astype implementation that accepts string dtype values as input would then need to handle this case.
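To illustrate the concern about special-casing, here is a hypothetical toy model with two array classes (none of these names are pandas API). It assumes each astype receives the raw "string" alias before it is resolved; the point is only that the same check must be repeated in every implementation that accepts string dtypes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StringDtypeModel:
    storage: str  # "python" or "pyarrow" in this toy model


GLOBAL_STORAGE = "python"  # hypothetical stand-in for the global option
DtypeSpec = Union[str, StringDtypeModel]


def resolve(spec: DtypeSpec) -> StringDtypeModel:
    # Eager resolution of the alias, mirroring today's behaviour.
    if spec == "string":
        return StringDtypeModel(GLOBAL_STORAGE)
    if isinstance(spec, StringDtypeModel):
        return spec
    raise TypeError(f"unsupported dtype spec: {spec!r}")


class PythonStringArrayModel:
    dtype = StringDtypeModel("python")

    def astype(self, spec: DtypeSpec) -> StringDtypeModel:
        if spec == "string":  # special case, copy 1
            return self.dtype
        return resolve(spec)


class ArrowStringArrayModel:
    dtype = StringDtypeModel("pyarrow")

    def astype(self, spec: DtypeSpec) -> StringDtypeModel:
        if spec == "string":  # special case, copy 2 (easy to forget)
            return self.dtype
        return resolve(spec)
```

Each class returns the target dtype rather than converted data, to keep the sketch short; forgetting the check in one class silently brings back the eager conversion for that array type.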

@simonjayhawkins simonjayhawkins added API Design Strings String extension data type and string data labels Jun 7, 2021
@mroeschke mroeschke added the Dtype Conversions Unexpected or buggy dtype conversions label Aug 21, 2021