I don't think this will be fully fixed by #21185. #21185 will fix it for the case of pd.Series(..).astype(EAtype), but not for the case of pd.Series(EA).astype(numpy_type).
> Happy to fix this on my own but I would need a pointer on what the correct approach is, i.e. where one should delegate to ExtensionArray.astype.
I think a fix in pandas.core.internals.Block._astype is probably appropriate (to ensure get_values is not called).
Calling `pd.Series(EA).astype(object)` will always generate an intermediate NumPy array and never delegate to the `astype` method of the `ExtensionArray`.

Callstack is as follows:

- `pd.Series(EA).astype()`
- `pandas.core.internals.Block._astype` calls `self.get_values()` in pandas/pandas/core/internals.py, line 661 in 4274b84 (introduced by https://github.com/pandas-dev/pandas/pull/20581/files)
- `pandas.core.internals.ExtensionBlock.get_values` then casts the `ExtensionArray` to a `numpy.array`: pandas/pandas/core/internals.py, line 1937 in 4274b84
Expected behaviour would have been that `astype` is called on the `ExtensionArray`, which can then do the casting on its own. Currently I have the problem that my underlying storage (an ExtensionArray backed by Arrow) is not NumPy-compatible, and thus everything turns into `np.array(…, dtype=object)` before it is cast.

Happy to fix this on my own, but I would need a pointer on what the correct approach is, i.e. where one should delegate to `ExtensionArray.astype`.

Output of `pd.show_versions()`

Ran into this with 0.23.0, but the code has not changed in master in this area.
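
To make the requested delegation concrete, here is a minimal pure-Python sketch of the two code paths. It does not use pandas and is not the pandas implementation: `ToyExtensionArray`, `ToyBlock`, `to_objects`, `astype_via_get_values` and `astype_delegating` are hypothetical names invented for illustration, and plain lists stand in for NumPy arrays.

```python
class ToyExtensionArray:
    """Hypothetical stand-in for an ExtensionArray with non-NumPy storage."""

    def __init__(self, values):
        self._storage = tuple(values)
        self.astype_calls = 0  # counts how often the array's own cast is used

    def astype(self, dtype):
        # The array casts directly from its own storage.
        self.astype_calls += 1
        return [dtype(v) for v in self._storage]

    def to_objects(self):
        # Generic fallback: materialise every element as a Python object.
        return list(self._storage)


class ToyBlock:
    """Hypothetical stand-in for the block that holds the array."""

    def __init__(self, values):
        self.values = values

    def get_values(self):
        # Mirrors the reported behaviour: convert to a generic object container.
        return self.values.to_objects()

    def astype_via_get_values(self, dtype):
        # Reported path: materialise first, then cast element by element.
        return [dtype(v) for v in self.get_values()]

    def astype_delegating(self, dtype):
        # Requested path: let the array cast itself when it offers astype.
        cast = getattr(self.values, "astype", None)
        if callable(cast):
            return cast(dtype)
        return [dtype(v) for v in self.get_values()]


arr = ToyExtensionArray([1, 2, 3])
block = ToyBlock(arr)

via_values = block.astype_via_get_values(str)  # array's astype is never called
calls_before = arr.astype_calls
delegated = block.astype_delegating(str)       # array's astype is called once
calls_after = arr.astype_calls
```

Both paths return the same values in this toy model; the difference is that only the delegating path ever reaches the array's own `astype`, which is what would let Arrow-backed storage avoid the intermediate object array.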