BUG/PERF: Avoid listifying in dispatch_to_extension_op #23155

Merged
merged 16 commits into from
Oct 19, 2018
Changes from 1 commit
12 changes: 12 additions & 0 deletions doc/source/extending.rst
@@ -173,6 +173,18 @@ or not that succeeds depends on whether the operation returns a result
that's valid for the ``ExtensionArray``. If an ``ExtensionArray`` cannot
be reconstructed, an ndarray containing the scalars is returned instead.

For ease of implementation and consistency with operations between pandas
and NumPy ndarrays, we recommend *not* handling Series and DataFrame in
your binary ops. Instead, you should detect these cases and return ``NotImplemented``.
When pandas encounters an operation like ``op(Series, ExtensionArray)``, pandas
will

1. unbox the array from the ``Series`` (roughly ``Series.values``)
2. call ``result = op(values, ExtensionArray)``
3. re-box the result in a ``Series``

Similar for DataFrame.
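
The recommendation above can be sketched without pandas at all. This is only an illustration, not pandas' implementation: ``ToySeries`` and ``ToyArray`` are hypothetical stand-ins for ``Series`` and an ``ExtensionArray``. The array returns ``NotImplemented`` when it sees the container, so Python falls back to the container's reflected method, which unboxes, applies the op, and re-boxes.

```python
class ToySeries:
    """Hypothetical stand-in for a Series: a thin box around an array."""

    def __init__(self, values):
        self.values = values

    def __add__(self, other):
        # 1. unbox, 2. apply the op to the unboxed values, 3. re-box
        return ToySeries(self.values + other)

    def __radd__(self, other):
        # Reached when other.__add__(self) returned NotImplemented.
        return ToySeries(other + self.values)


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray holding numbers."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        if isinstance(other, ToySeries):
            # Do not handle the container here; let it unbox and call back.
            return NotImplemented
        if isinstance(other, ToyArray):
            return ToyArray(a + b for a, b in zip(self.data, other.data))
        # Anything else is treated as a scalar in this sketch.
        return ToyArray(a + other for a in self.data)

    __radd__ = __add__  # addition is commutative in this sketch


array_first = ToyArray([1, 2]) + ToySeries(ToyArray([10, 20]))
series_first = ToySeries(ToyArray([10, 20])) + ToyArray([1, 2])
```

In both orderings the result is a ``ToySeries`` wrapping a ``ToyArray``, and the array class never needs to know how the container stores or aligns its data.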
Member

The above seems like good logic to me. But shouldn't _create_comparison_method then be updated to actually do this?

Contributor Author

I meant to delete the DataFrame comment. I think it's not so relevant since DataFrames are 2D. I'm assuming most arrays will want to match NumPy's broadcasting behavior.

Which _create_comparison_method do you mean? The one used in ExtensionScalarOpsMixin?

Member

(my comment was about the full block, not especially DataFrame)

> Which _create_comparison_method do you mean? The one used in ExtensionScalarOpsMixin?

The one I was looking at in the diff is the IntegerArray one, I think. But I assume the same is true for the base class mixin.


.. _extending.extension.testing:

Testing Extension Arrays
2 changes: 1 addition & 1 deletion pandas/core/arrays/integer.py
@@ -280,7 +280,7 @@ def _coerce_to_ndarray(self):
data[self._mask] = self._na_value
return data

- __array_priority__ = 1 # higher than ndarray so ops dispatch to us
+ __array_priority__ = 1000 # higher than ndarray so ops dispatch to us
Contributor

should we just put this in the base class? (for the ops mixin)

Contributor Author

That seems a little too invasive for a base class. I’d rather leave that up to the subclasser.

Contributor

So which arithmetic subclass would not want this set? Is there an example?

Contributor Author

To clarify, I'm not sure if there's a way to unset it, if you don't want to set it in a subclass (you don't want to opt into numpy's array stuff at all).

Contributor

I just find this a detail that would likely be forgotten in any subclass. I don't see much harm or downside in setting it on the base class (you can always unset it if you really, really think you need to).

Contributor Author

Can you unset it?

Contributor Author

I don't really know if setting __array_priority__ = 0 is enough to "unset" it, and I don't know what all setting __array_priority__ in the first place opts you into.

Contributor

Can you document this in the Mixin itself, though (if you are not going to set it by default)? It is so non-obvious that you need to do this.
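
To make the effect discussed in this thread concrete, here is a small sketch that uses only NumPy. The two classes are hypothetical and are not pandas code. It assumes NumPy's legacy deferral rule for binary operators: when the right-hand operand defines the reflected method and has a higher ``__array_priority__`` than the ndarray, ``ndarray.__add__`` returns ``NotImplemented`` and Python calls the operand's ``__radd__`` once with the whole array. Without the attribute, NumPy treats the operand as an object scalar and applies the operation element by element.

```python
import numpy as np


class WithPriority:
    """Hypothetical array-like that outranks a plain ndarray."""

    __array_priority__ = 1000

    def __init__(self):
        self.seen = []  # type name of each operand passed to __radd__

    def __radd__(self, other):
        self.seen.append(type(other).__name__)
        return "handled once by WithPriority"


class WithoutPriority:
    """Same idea, but without __array_priority__."""

    def __init__(self):
        self.seen = []

    def __radd__(self, other):
        self.seen.append(type(other).__name__)
        return 0


arr = np.array([1, 2, 3])

high = WithPriority()
result_high = arr + high  # ndarray defers; __radd__ receives the whole array

low = WithoutPriority()
result_low = arr + low  # NumPy loops; __radd__ is called per element
```

After running this, ``high.seen`` holds a single entry for the ndarray, while ``low.seen`` holds one entry per element and ``result_low`` is an object-dtype ndarray. The sketch does not settle whether setting the attribute to ``0`` fully opts a subclass back out, which is the open question raised above.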


def __array__(self, dtype=None):
"""
1 change: 1 addition & 0 deletions pandas/tests/extension/decimal/array.py
@@ -47,6 +47,7 @@ def _is_numeric(self):


class DecimalArray(ExtensionArray, ExtensionScalarOpsMixin):
__array_priority__ = 1000

def __init__(self, values, dtype=None, copy=False, context=None):
for val in values:
1 change: 1 addition & 0 deletions pandas/tests/extension/json/array.py
@@ -55,6 +55,7 @@ def construct_from_string(cls, string):

class JSONArray(ExtensionArray):
dtype = JSONDtype()
__array_priority__ = 1000

def __init__(self, values, dtype=None, copy=False):
for val in values: