Converting ExtensionArrays to indices causes coercion to object #29426
Labels: Duplicate Report (duplicate issue or pull request), ExtensionArray (extending pandas with custom dtypes or arrays), Index (related to the Index class or subclasses)
Code Sample (from pandas source)
Problem description
Right now, when you create an index from an ExtensionArray, the array is coerced to a NumPy array of Python objects. See above. For many use cases this is an expensive operation, and many people switch to ExtensionArrays precisely to avoid NumPy arrays of objects in the first place! Ideally, it would be possible to store ExtensionArrays inside indices (or to provide an interface for ExtensionArrays to implement) so that a conversion to object dtype would not be necessary.
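The coercion can be checked with a short snippet. This is a minimal sketch and not the code sample from the original report: it assumes pandas' built-in nullable `Int64` array as a stand-in for a custom ExtensionArray, and the outcome may differ between pandas versions.

```python
import pandas as pd
from pandas.api.types import is_object_dtype

# Assumption: a built-in nullable integer array stands in for a custom ExtensionArray.
arr = pd.array([1, 2, 3], dtype="Int64")

# Build an index directly from the extension array.
idx = pd.Index(arr)

# True means the values were boxed into a NumPy array of Python objects.
coerced = is_object_dtype(idx.dtype)

print("pandas version:", pd.__version__)
print("array dtype:", arr.dtype)
print("index dtype:", idx.dtype)
print("coerced to object:", coerced)
```

If `coerced` prints `True`, every element has been converted to a Python object just to build the index, which is the cost this issue is about.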
For my use case, I have many 8-byte Python objects that I want to do fast operations on. I use an ExtensionArray to store them as a contiguous np.uint64 array that has the proper repr inside a Series and that is lazily converted to objects on access. I want to be able to use them inside indices without a massive performance degradation!
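The storage pattern described here can be sketched without the full ExtensionArray interface. Everything in this block is hypothetical: `Handle` and `PackedHandles` are illustrative names, not the classes from my project and not pandas APIs. It only shows the difference between boxing one element on access and boxing the whole array up front, which is what an object-dtype index requires.

```python
import numpy as np


class Handle:
    """Hypothetical 8-byte value object (illustrative stand-in)."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    def __repr__(self) -> str:
        return f"Handle(0x{self.value:016x})"


class PackedHandles:
    """Hypothetical container: contiguous uint64 storage, objects created lazily."""

    def __init__(self, values) -> None:
        self._data = np.asarray(values, dtype=np.uint64)

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, i: int) -> Handle:
        # Boxing happens only here, one element at a time.
        return Handle(int(self._data[i]))

    def to_object_array(self) -> np.ndarray:
        # Eager conversion: every element becomes a Python object up front.
        out = np.empty(len(self._data), dtype=object)
        for i in range(len(self._data)):
            out[i] = self[i]
        return out


packed = PackedHandles([1, 2, 3])
print(packed[1])                 # a single element boxed on demand
boxed = packed.to_object_array()
print(boxed.dtype)               # object dtype after eager conversion
```

A real ExtensionArray needs more methods than this (construction from sequences, `take`, concatenation and so on), but the lazy `__getitem__` is the part that an object-dtype index defeats.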
Expected Output
A version of pandas where ExtensionArrays are stored directly inside indices, as opposed to coercing to object.
Output of pd.show_versions()
A big thank you to all of the pandas maintainers. Without your work, I wouldn't be here in the first place! Thanks for considering this issue.