BUG: astyping Categorical to nullable integer dtype #39616
Comments
Yes, that's a bug. It could be fixed in |
Hi, I have made a small investigation of this bug. I am not familiar with your type system, so I will need some suggestions from you on how I should proceed to solve it.
How should it be solved?
|
Thanks for taking a look!
Yes, that is expected (see https://pandas.pydata.org/docs/dev/user_guide/integer_na.html for some explanation; the case sensitivity is intended to differentiate between numpy's dtype and our nullable dtype).
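To illustrate the case sensitivity mentioned above, here is a minimal sketch. The variable names are illustrative and do not come from the thread.

```python
import pandas as pd

# Lowercase "int64" selects NumPy's dtype, which cannot hold missing values.
numpy_backed = pd.Series([1, 2]).astype("int64")

# Capitalized "Int64" selects pandas' nullable extension dtype.
nullable = pd.Series([1, 2]).astype("Int64")

# Only the nullable dtype can represent a missing integer as pd.NA.
with_missing = pd.array([1, None], dtype="Int64")

print(numpy_backed.dtype, nullable.dtype, with_missing.isna().tolist())
```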
Ah, I didn't notice above that they were integer-like strings. So in that case, the issue can actually be simplified to the following case that also fails, i.e. creating an integer array from strings:
>>> pd.array(np.array(['1', '2'], dtype=object), dtype="Int64")
...
~/scipy/pandas/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
170 "mixed-integer-float",
171 ]:
--> 172 raise TypeError(f"{values.dtype} cannot be converted to an IntegerDtype")
173
174 elif is_bool_dtype(values) and is_integer_dtype(dtype):
TypeError: object cannot be converted to an IntegerDtype
>>> pd.array(np.array(['1', '2']), dtype="Int64")
...
~/scipy/pandas/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
176
177 elif not (is_integer_dtype(values) or is_float_dtype(values)):
--> 178 raise TypeError(f"{values.dtype} cannot be converted to an IntegerDtype")
179
180 if mask is None:
TypeError: <U1 cannot be converted to an IntegerDtype
So I think the first question we need to decide on is whether we want to support converting strings to integers like that in general. I think the answer is yes (numpy supports it, and so we already support it for the non-nullable dtypes as well, eg
This can be updated in
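As a rough sketch of the point about numpy: casting integer-like strings to a non-nullable integer dtype already works there. The second half shows one possible workaround for the nullable dtype using `pd.to_numeric`; that is an assumption about how a user could sidestep the error, not the fix discussed in this thread.

```python
import numpy as np
import pandas as pd

strings = np.array(["1", "2"], dtype=object)

# NumPy parses integer-like strings when casting to a non-nullable integer dtype.
as_numpy_int = strings.astype("int64")

# Possible workaround for the nullable dtype: parse the strings first,
# then build the nullable array from the resulting numeric values.
as_nullable_int = pd.array(pd.to_numeric(strings), dtype="Int64")

print(as_numpy_int.dtype, as_nullable_int.dtype)
```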
|
Thanks for the prompt reply. I have found out that besides
if is_extension_array_dtype(dtype):
    cls = cast(ExtensionDtype, dtype).construct_array_type()
    inferred_dtype = lib.infer_dtype(data, skipna=True)
    if inferred_dtype == "string":
        return cls._from_sequence_of_strings(data, dtype=dtype, copy=copy)
    else:
        return cls._from_sequence(data, dtype=dtype, copy=copy)
The problem with this particular case is that we need to find out the underlying data type of the CategoricalDtype, e.g. to infer that the categorical entries are strings. Also, not all extension classes have the Otherwise, it is possible to change |
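A minimal user-level sketch of the idea of inspecting the categories before converting. It uses the public `pandas.api.types.infer_dtype` rather than the internal `lib.infer_dtype`, and the helper name `categorical_to_int64` is hypothetical; this is not the change proposed for pandas itself.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def categorical_to_int64(cat):
    """Hypothetical helper: convert a Categorical to the nullable Int64 dtype."""
    # Look at the categories, not the Categorical itself, to learn
    # what kind of values the categorical actually holds.
    categories = np.asarray(cat.categories, dtype=object)
    if infer_dtype(categories, skipna=True) == "string":
        # String categories: parse the materialized values into numbers first.
        values = pd.to_numeric(np.asarray(cat, dtype=object))
    else:
        # Numeric categories: materialize the values directly.
        values = np.asarray(cat)
    return pd.array(values, dtype="Int64")


from_strings = categorical_to_int64(pd.Categorical(["1", "2", "1"]))
from_ints = categorical_to_int64(pd.Categorical([3, 4, 3]))
print(from_strings, from_ints)
```

The dispatch on the inferred type mirrors the branch in the snippet above, but it runs on the categories so that the string case is detected even though the outer dtype is categorical.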
This astyping op works for int64 but throws for Int64. It should work the same for both.