-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: use EA.astype in ExtensionBlock.to_native_types #28841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: use EA.astype in ExtensionBlock.to_native_types #28841
Conversation
It is also annoying that this does not work for eg SparseArray, so currently I am falling back to the base implementation in case of any error ... |
pandas/core/internals/blocks.py
Outdated
def to_native_types(self, slicer=None, na_rep="nan", quoting=None, **kwargs): | ||
"""override to use ExtensionArray astype for the conversion""" | ||
try: | ||
values = self.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bunch of this can be build outside of the try/except right?
to_native_types has always been a weird name to me. in practice it seems to be used for preparing Also, IIRC there is another similar piece of code in this file dealing with |
There is indeed a
Completely agree. We actually also have a public |
I get a mypy error: pandas/core/internals/blocks.py:2274: error: Definition of "to_native_types" in base class "ExtensionBlock" is incompatible with definition in base class "DatetimeBlock" which is very strange, DatetimeBlock is not at all a base class of ExtensionBlock. The only relationship is that they both derive from the base Block class .. cc @WillAyd @simonjayhawkins any idea how I can get this error go away? |
Ah, I understand now. I didn't look at the location of the error. It is reporting that about DatetimeTZBlock, which indeed subclasses both DatetimeBlock and ExtensionBlock |
Solved it (I think) by also overriding it in DatetimeTZBlock, which I think we needed to do anyway to pick DatetimeBlock's version (although the fact that no tests were failing might be a sign that the specialized DatetimeBlock.to_native_types is either not necessary or not properly tested) |
I fully agree it would be nice to make that better, but right now, formatting on the one hand and csv on the other (which uses to_native_types) are two separate things, and fixing that seems out of scope for this PR. |
ok fair enough. do we have an issue for this distinction? |
The main remaining question is whether we want tests for this (or if we are happy with it being indirectly tested on geopandas). Certainly, ideally we test it here as well. But, we currently don't have any EA that has that problem. But will check if I can do a minimal ListArray EA (without implementing it fully) to reproduce this issue. |
OK, added a minimal ListArray and a test based on that just for this issue. And I checked that the test actually fails without the overriden |
pandas/tests/extension/list/array.py
Outdated
@@ -0,0 +1,131 @@ | |||
"""Test extension array for storing nested data in a pandas container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think newline after """
pandas/tests/extension/list/array.py
Outdated
|
||
@classmethod | ||
def construct_array_type(cls): | ||
"""Return the array type associated with this dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline, period
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LIstArray tests look good
if copy: | ||
return self.copy() | ||
return self | ||
elif pd.api.types.is_string_dtype(dtype) and not pd.api.types.is_object_dtype( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could import at the top
Any opinions about including this in 0.25.2 ? Normally it would not meet my requirements to be included in a bug fix release, but in this case it solves a regression in geopandas (due to moving to ExtensionArray, and it's not something I can fix in geopandas). But not fully how risky this change is for other EAs. |
seems ok for 0.25.2 |
Fine to include in 0.25, but you should do it soon :)
… On Oct 14, 2019, at 17:48, Jeff Reback ***@***.***> wrote:
seems ok for 0.25.2
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
OK, then I will merge ;) |
@meeseeksdev backport 0.25.x |
Owee, I'm MrMeeseeks, Look at me. There seem to be a conflict, please backport manually. Here are approximate instructions:
And apply the correct labels and milestones. Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon! If these instruction are inaccurate, feel free to suggest an improvement. |
Manual backport: #28985 |
Closes #28840
This implements a
ExtensionBlock.to_native_types
to override the base class which converts to a numpy array.To test this is a bit tricky. The current JSONArray we have in the extension tests does not have this issue. I would need to implement and add tests for a new ListArray to check this (I can ensure that this fixes the geopandas issue)