REF: implement ArrowExtensionArray base class #46102
Thoughts on where to locate things to avoid ImportErrors when pyarrow is not present? In principle could go in _mixins |
Does it make sense to just have this array in its own file and do the same
trick |
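For readers following the ImportError question above: one generic way to keep an optional dependency from failing at module import time is to guard the import and raise only when the Arrow-backed class is actually used. This is a minimal sketch under that assumption, not necessarily the approach this PR settled on; `import_optional` and `ArrowBackedBase` are hypothetical names introduced only for illustration.

```python
import importlib


def import_optional(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Evaluated once at import time; does not raise if pyarrow is missing.
pa = import_optional("pyarrow")


class ArrowBackedBase:
    """Hypothetical base class that needs pyarrow only when instantiated."""

    def __init__(self, data):
        if pa is None:
            raise ImportError("pyarrow is required to use ArrowBackedBase")
        self._data = data
```

With this layout the module defining the class can live anywhere in the package, since importing it never touches pyarrow unless pyarrow is available.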
Hi! Once this is released would you recommend third party developers of extension arrays use this class or is it intended to only be used privately? Thanks! |
@sterlinm I think the intention is for this to be private, so that users can have the typical numeric, string, etc. dtype columns backed by pyarrow instead of numpy (e.g. |
@mroeschke Thanks! It looks to me like a really useful framework for anybody who wants to build an extension array that is backed by an Arrow array. I think the use case is pretty much the same as the general use case for extension arrays in the first place. The example I had in mind was a datetime array that supports microsecond precision. I know this will probably be in pandas itself eventually, but basically any time I want to build an extension array I'm going to want it to be backed by an Arrow array. It looks like you've worked out the quirks of mapping key lookups onto pa.ChunkedArray, and I think it would be nice to be able to build on that rather than re-implementing it myself :) It would serve a similar purpose to the Fletcher library, I think. https://fletcher.readthedocs.io/en/latest/ |
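To illustrate what "mapping key lookups onto a chunked array" involves, here is a pure-Python stand-in, assuming only that the data is stored as a sequence of chunks. It is not pandas' implementation and does not use `pa.ChunkedArray`; the class name `ChunkedList` is hypothetical. The idea shown is translating a flat integer index into a (chunk, offset) pair via cumulative chunk lengths.

```python
from bisect import bisect_right
from itertools import accumulate


class ChunkedList:
    """Pure-Python stand-in for a chunked array (not pa.ChunkedArray)."""

    def __init__(self, chunks):
        self._chunks = [list(c) for c in chunks]
        # Cumulative end offset of each chunk, e.g. [2, 3, 6] for lengths 2, 1, 3.
        self._ends = list(accumulate(len(c) for c in self._chunks))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Resolve the slice against the total length, then look up each item.
            return [self[i] for i in range(*key.indices(len(self)))]
        n = len(self)
        if key < 0:
            key += n  # support negative indexing
        if not 0 <= key < n:
            raise IndexError("index out of bounds")
        # First chunk whose end offset is strictly greater than key.
        chunk_idx = bisect_right(self._ends, key)
        start = self._ends[chunk_idx - 1] if chunk_idx else 0
        return self._chunks[chunk_idx][key - start]


arr = ChunkedList([[10, 20], [30], [40, 50, 60]])
print(arr[2], arr[-1], arr[1:4])
```

A real Arrow-backed extension array also has to handle boolean masks, integer-array keys, and missing values, which is where most of the quirks mentioned above come from.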
It looks like the next release of pyarrow is going to support customizing ExtensionScalars to control what's returned by as_py(). It would be great if that played nicely with the ArrowExtensionArray and ArrowDtype classes so it could be used non-privately. I'd be happy to look into helping out with that once the pyarrow release is done. |
I haven't tested an ExtensionScalar/Type while developing the arrow stuff yet, but ideally this should work when calling If you want to test drive the existing https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/arrow/array.py |
Thanks! I'm going to experiment with using these to implement EAs for my own classes. If I run into issues or have suggestions for making that easier, what's the best place for that discussion? Continue commenting here or open a new issue? Or none of the above? 🙂 I'd be happy to help address any issues I might run into. |
It would be easier to open separate GitHub issues for any feedback, ideally one issue per topic. Thanks! |
xref #46008