API: astype("string") behavior #41856
Labels
API Design
Astype
Dtype Conversions
Unexpected or buggy dtype conversions
Strings
String extension data type and string data
in #39908 (comment) @jorisvandenbossche wrote
Bringing the "deferred" storage mode lookup for `StringDtype` discussion (originally here: #39908 (comment)) into the main thread, and trying to recap.

Currently, doing `pd.StringDtype()` (without specifying the storage) will already look up the option, so in the default case you get a dtype whose storage is already set. This also means that `pandas_dtype()` already "fully initializes" the string dtype.

As a consequence, doing `astype("string")` will actually convert the values if your string dtype doesn't match the global setting.

I think it could make sense for `.astype("string")` to mean "ensure I have a string dtype", and thus not convert to another storage backend if I already had a string dtype to start with. We do something similar for CategoricalDtype (`"category"` means a categorical dtype with no categories, but `astype("category")` does not remove your categories; it preserves any existing categorical dtype as is).

We could still have `astype("string")` behave in the way I suggest by special-casing this in the `astype` implementations (as suggested in #39908 (comment)), but I think that's something we would ideally avoid (any `astype` implementation accepting string dtype values as input would need to handle this case?).
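
To make the points above concrete, here is a minimal sketch, assuming pandas 1.3 or later (where the `mode.string_storage` option exists). It pins the option to `"python"` so that it runs without pyarrow, checks that both `pd.StringDtype()` and `pandas_dtype("string")` resolve the storage eagerly, and shows the categorical precedent. The `ensure_string_dtype` helper is hypothetical: it is not pandas API, only an illustration of the "ensure I have a string dtype" semantics discussed in this issue.

```python
import pandas as pd
from pandas.api.types import pandas_dtype

# Pin the global option so the example does not depend on the local default
# and does not require pyarrow.
with pd.option_context("mode.string_storage", "python"):
    default_dtype = pd.StringDtype()       # storage is looked up from the option here
    parsed_dtype = pandas_dtype("string")  # the "string" alias is resolved eagerly too

print(default_dtype.storage, parsed_dtype.storage)

# Categorical precedent: "category" on its own carries no categories...
bare_categorical = pandas_dtype("category")
print(bare_categorical.categories)

# ...but astype("category") keeps an existing categorical dtype as is.
cat = pd.Series(
    ["a", "b", "a"],
    dtype=pd.CategoricalDtype(["a", "b", "c"], ordered=True),
)
kept = cat.astype("category")
print(list(kept.dtype.categories), kept.dtype.ordered)


def ensure_string_dtype(ser: pd.Series) -> pd.Series:
    """Hypothetical helper (not pandas API): only convert non-string input.

    A Series that already has a StringDtype is returned unchanged, whatever
    its storage; anything else goes through the regular astype("string").
    """
    if isinstance(ser.dtype, pd.StringDtype):
        return ser
    return ser.astype("string")


already_string = pd.Series(["x", None], dtype=pd.StringDtype("python"))
from_object = pd.Series(["x", "y"], dtype=object)

unchanged = ensure_string_dtype(already_string)
converted = ensure_string_dtype(from_object)
print(unchanged.dtype.storage, isinstance(converted.dtype, pd.StringDtype))
```

The helper only covers the `Series` level. As the comment notes, getting the same behavior from `astype` itself would mean special-casing `"string"` in every `astype` implementation that accepts string dtype values as input.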