Skip to content

ENH: Add StringArray.__arrow_array__ for conversion to Arrow #29182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Similar as #28368 (which did this for IntegerArray)

0.15.0), which means that it is supported in writing to the Parquet file
format when using the ``pyarrow`` engine. It is currently not yet supported
when converting back to pandas (so it will become an integer or float
(depending on the presence of missing data) or object dtype column).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maya have an extra parenthesis here.

@TomAugspurger TomAugspurger added this to the 1.0 milestone Oct 23, 2019
@@ -182,6 +182,16 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
def _from_sequence_of_strings(cls, strings, dtype=None, copy=False):
return cls._from_sequence(strings, dtype=dtype, copy=copy)

def __arrow_array__(self, type=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we make a new base EA mixin class for this? e.g. this is pretty generic code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and have String/Integer array use this? (maybe actually just put in the EA base class itself?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, this one is different than the one in IntegerArray, so it cannot directly be shared with that. I thought about putting this in the PandasArray class, but the StringArray is the only subclass of it that is actually used. So therefore I kept it here. But open to other options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe actually just put in the EA base class itself?

That's not possible, because it depends on the internals (eg StringArray uses _ndarray for the data, IntegerArray uses _data and _mask). The only thing that could be put in the base EA class is the signature with a NotImplementedError, but I need to check how that interacts with pyarrow (as the code there does a hasattr check, so if the method is present, it expects it to work)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok cool. yeah this can certainly be handled in a followup; but i think we DO want to generally enable these types of conversions.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Oct 23, 2019
@jorisvandenbossche
Copy link
Member Author

The failing check is codecov, so ignoring that

@jorisvandenbossche jorisvandenbossche merged commit 82df98a into pandas-dev:master Oct 25, 2019
@jorisvandenbossche jorisvandenbossche deleted the string-array-arrow branch October 25, 2019 13:34
@jorisvandenbossche
Copy link
Member Author

The failing check is codecov, so ignoring that

However, we should probably ensure we are testing somewhere pyarrow 0.15 / latest release (from a quick look that doesn't seem to be included on any build). Will open an issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants