Handle construction of string ExtensionArray from lists #27674

Merged
merged 3 commits into from
Aug 2, 2019

xhochy
Contributor

@xhochy xhochy commented Jul 31, 2019

I had to add a string-based Arrow extension array to trigger the bug. I did not run the same test suite on it as we do on the boolean array, since at the moment I don't see that adding value, only runtime.

@TomAugspurger
Contributor

TomAugspurger commented Jul 31, 2019

Thanks.

I'm wondering about a slightly more general fix. What about something like

# in sanitize_array. Somewhere before `data = extract_array(...)`

if not is_array_like(data) and is_extension_array_dtype(dtype):
    data = dtype.construct_array_type()._from_sequence(data, dtype=dtype, copy=copy)

Would that work? Then we get to the EA handling earlier on, and don't require special casing later on.
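To make the proposal concrete, here is a minimal, pandas-free sketch of the early dispatch. `ToyStringArray`, `ToyStringDtype`, and `sanitize_sketch` are hypothetical stand-ins, and the two `is_*` helpers are simplified approximations of the pandas functions of the same name, not their real implementations. The sketch only illustrates the control flow suggested above, not what `sanitize_array` actually does.

```python
class ToyStringArray:
    """Hypothetical stand-in for a string-based ExtensionArray."""

    def __init__(self, values, dtype):
        self._data = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._data)

    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        # Build the array directly from Python scalars, with no NumPy round trip.
        return cls(scalars, dtype)


class ToyStringDtype:
    """Hypothetical stand-in for an ExtensionDtype."""

    def construct_array_type(self):
        return ToyStringArray


def is_array_like(obj):
    # Simplified assumption: anything sized that exposes a ``dtype`` attribute.
    return hasattr(obj, "dtype") and hasattr(obj, "__len__")


def is_extension_array_dtype(dtype):
    # Simplified assumption: a dtype that knows how to construct its array type.
    return hasattr(dtype, "construct_array_type")


def sanitize_sketch(data, dtype=None, copy=False):
    # Early dispatch: plain sequences plus an extension dtype go straight
    # to the extension array constructor.
    if not is_array_like(data) and is_extension_array_dtype(dtype):
        return dtype.construct_array_type()._from_sequence(
            data, dtype=dtype, copy=copy
        )
    # Everything else would fall through to the existing handling.
    return data
```

In this sketch a plain list is handed to `_from_sequence` before any other conversion can happen, while data that is already array-like passes through untouched.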

@xhochy
Contributor Author

xhochy commented Aug 1, 2019

I moved the check slightly up, but not significantly, since the code path above it can also be taken by ExtensionArray data. If I run into other problems while working on fletcher, I will revisit this, but moving the check to a broader scope seems a bit risky to me at the moment, as sanitize_array is not tested directly, only indirectly via Series construction.

@TomAugspurger
Contributor

OK, thanks.

@jbrockmendel can you take a look at this?

self._dtype = ArrowBoolDtype()

def __repr__(self):
    return "ArrowBoolArray({})".format(repr(self._data))
Member

might as well share this method by using "{cls}({data})".format(cls=type(self).__name__, data=repr(self._data))
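A rough sketch of what sharing the method could look like. `ReprMixin` and `ArrowStringArray` are assumed names for illustration (the thread only mentions `ArrowBoolArray` and "a string-based Arrow Extension array"), and the real classes hold Arrow data rather than Python lists.

```python
class ReprMixin:
    """Hypothetical shared base providing one __repr__ for all subclasses."""

    def __repr__(self):
        # type(self).__name__ resolves to the concrete subclass at call time.
        return "{cls}({data})".format(
            cls=type(self).__name__, data=repr(self._data)
        )


class ArrowBoolArray(ReprMixin):
    def __init__(self, data):
        self._data = data


class ArrowStringArray(ReprMixin):  # assumed name for the string-based array
    def __init__(self, data):
        self._data = data
```

Because the class name is looked up at call time, neither subclass needs to hard-code its own name in the format string.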

@jbrockmendel
Member

small comment, otherwise looks good

@xhochy xhochy force-pushed the ea-str-construction branch from da473b4 to 2f53e34 Compare August 1, 2019 17:25
@WillAyd WillAyd added the ExtensionArray Extending pandas with custom dtypes or arrays. label Aug 1, 2019
@xhochy
Contributor Author

xhochy commented Aug 1, 2019

@jbrockmendel Adapted that, but Travis keeps failing for unrelated reasons.

@xhochy xhochy force-pushed the ea-str-construction branch from 2f53e34 to 3f30c15 Compare August 2, 2019 09:36
@xhochy
Contributor Author

xhochy commented Aug 2, 2019

Rebased on master and everything is green now.

@jorisvandenbossche jorisvandenbossche merged commit c8f040d into pandas-dev:master Aug 2, 2019
@jorisvandenbossche
Member

Thanks @xhochy !

Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Development

Successfully merging this pull request may close these issues.

String-based ExtensionArrays force cast to numpy in list based construction