-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add __array_ufunc__ to Series / Array #23293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
2fffac3
c5a4664
dd332a4
71c058e
a0d11d9
4cfeb9b
607f8a6
c4fcae7
65dea1b
134df14
5239b70
3d91885
429f15c
41f4158
0d6a663
8f46391
44e3c7e
e179913
27208c1
0b1e745
9be1dff
775c2ef
4d7f249
0b359d7
bbbf269
64d8908
d1788b0
ef5d508
fe0ee4e
971e347
95e8aef
7bfd584
06e5739
feee015
3702b9b
a0f84ed
d83fe7a
edad466
e4ae8dc
db60f6c
a9bd6ef
1a8b807
0b0466d
1f67866
d3089bd
6e770e8
15a3fb1
b5e7f45
4f4bd93
5dbff49
b623be2
2237233
5b5c547
10bc2cc
5380b77
9f4d110
6c15ee7
ab48bd8
30fced8
7486d26
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
import numbers | ||
import sys | ||
import warnings | ||
import copy | ||
|
@@ -10,6 +11,7 @@ | |
from pandas.compat import set_function_name | ||
|
||
from pandas.core import nanops | ||
from pandas.core import ops | ||
from pandas.core.dtypes.cast import astype_nansafe | ||
from pandas.core.dtypes.generic import ABCSeries, ABCIndexClass | ||
from pandas.core.dtypes.common import ( | ||
|
@@ -293,6 +295,44 @@ def __array__(self, dtype=None): | |
""" | ||
return self._coerce_to_ndarray() | ||
|
||
_HANDLED_TYPES = (np.ndarray, numbers.Number) | ||
|
||
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): | ||
|
||
out = kwargs.get('out', ()) | ||
|
||
for x in inputs + out: | ||
if not isinstance(x, self._HANDLED_TYPES + (IntegerArray,)): | ||
return NotImplemented | ||
|
||
# for binary ops, use our custom dunder methods | ||
result = ops.maybe_dispatch_ufunc_to_dunder_op( | ||
self, ufunc, method, *inputs, **kwargs) | ||
if result is not None: | ||
return result | ||
|
||
if (method == '__call__' | ||
and ufunc.signature is None | ||
and ufunc.nout == 1): | ||
# only supports IntegerArray for now | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should check for |
||
args = [a._data for a in inputs] | ||
masks = [a._mask for a in inputs] | ||
result = ufunc(*args, **kwargs) | ||
mask = np.logical_or.reduce(masks) | ||
if result.dtype.kind in ('i', 'u'): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. blank line |
||
return IntegerArray(result, mask) | ||
else: | ||
result[mask] = np.nan | ||
return result | ||
|
||
jorisvandenbossche marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# fall back to array for other ufuncs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. should this be a generic function that one can just call (e.g. put it in ops.py and pass in self, ufuc, method, inputs, kwargs)? IOW is this called anywhere else |
||
inputs = tuple( | ||
np.array(x) if isinstance(x, type(self)) else x | ||
for x in inputs | ||
) | ||
return np.array(self).__array_ufunc__( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You probably want to call the public ufunc again here, e.g., See NDArrayOperatorsMixin for an example implementation. |
||
ufunc, method, *inputs, **kwargs) | ||
|
||
def __iter__(self): | ||
for i in range(len(self)): | ||
if self._mask[i]: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -623,6 +623,29 @@ def view(self, dtype=None): | |
return self._constructor(self._values.view(dtype), | ||
index=self.index).__finalize__(self) | ||
|
||
def __array_ufunc__(self, ufunc, method, *inputs, **kwargs): | ||
TomAugspurger marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Normally we recommend doing a type check and returning For example, consider another library like xarray that implements its own version of a This might even be an issue for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking somehow: the The problem is that for now the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I doubt you need to list types dependent on the operators -- it's probably enough to list the types that can be used in valid operations with It's true that you would possibly need to recurse into extension arrays to figure this out all the valid types, and it may not be easy to enumerate all scalar types, e.g., if IPAddressExtensionArray can handle arithmetic with IPAddress, how could pandas.Series know about that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I think it are mainly the scalar values which might be hard to list (apart from those, we would probably accept Series, Index, array, and a bunch of array-likes (list, tuple, ..) for the Series case). But what you mention about custom ExtensionArrays is then even another problem, that we don't know what they would accept. By "recurse into extension arrays", would you mean something like checking the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, that's about the best I can imagine. But I agree, it doesn't seem easy! There's no general way to detect what types another object's It is probably reasonable to trade-off correctness for generality here, but we should be cognizant that we are making this choice. Alternatively, we might maintain an explicit "blacklist" of types which pandas can't handle, though this gets tricky with non-required imports (e.g., There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @shoyer in light of the above, how would you define the handled types for an object dtyped Series, in which you in principle can put any python object you want? For numpy arrays, things like addition work element-wise for object dtype, so the following works:
Doing a more strict check for allowed types, then the above will no longer work. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How is numpy actually dealing with the above? It makes an exception for object dtype? So two ideas that might be useful here:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are good questions. In principle NumPy follows the rules described at http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#behavior-in-combination-with-python-s-binary-operations but I think there is something missing for unknown scalars / object arrays... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See numpy/numpy#12258 for a description of how NumPy does things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To make sure I understand this thread: @shoyer kicked it off by saying
We can handle the scalar types of EAs by requiring that they implement a object-dtype complicates things. Did we decide what to do there? Skip the check?
TomAugspurger marked this conversation as resolved.
Show resolved
Hide resolved
|
||
inputs = tuple( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a few comments here on what you are doing |
||
x._values if isinstance(x, type(self)) else x | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nitpick: could move |
||
for x in inputs | ||
) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. e.g. I think you are calling the same here (as i commented above) |
||
# for binary ops, use our custom dunder methods | ||
result = ops.maybe_dispatch_ufunc_to_dunder_op( | ||
self, ufunc, method, *inputs, **kwargs) | ||
if result is not None: | ||
return result | ||
|
||
if hasattr(self._values, '__array_ufunc__'): | ||
result = self._values.__array_ufunc__( | ||
ufunc, method, *inputs, **kwargs) | ||
else: | ||
result = np.array(self._values).__array_ufunc__( | ||
ufunc, method, *inputs, **kwargs) | ||
if result is NotImplemented: | ||
raise TypeError("The '{0}' operation is not supported for " | ||
"dtype {1}.".format(ufunc.__name__, self.dtype)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of all this (checking for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, that sounds like a better idea :) |
||
return self._constructor(result, index=self.index, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that not every ufunc returns one result, so you should either handle or explicitly error for the other cases. See NDArrayOperatorsMixin for how to handle this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Out of curiosity, do you have the same problem with the current implementation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To answer my own question: yes, at least for ufuncs that return a scalar:
But for ufuncs that return multiple values it seems to work with pandas master (wrapping each return element I suppose). |
||
copy=False).__finalize__(self) | ||
|
||
def __array__(self, result=None): | ||
""" | ||
the array interface, return my values | ||
|
@@ -640,10 +663,9 @@ def __array_prepare__(self, result, context=None): | |
""" | ||
Gets called prior to a ufunc | ||
""" | ||
|
||
# nice error message for non-ufunc types | ||
if (context is not None and | ||
not isinstance(self._values, (np.ndarray, ABCSparseArray))): | ||
not isinstance(self._values, (np.ndarray, ExtensionArray))): | ||
obj = context[1][0] | ||
raise TypeError("{obj} with dtype {dtype} cannot perform " | ||
"the numpy op {op}".format( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why you use a different sentinel value
None
rather than Python's standardNotImplemented
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a specific reason, just that I didn't return anything in the
maybe_dispatch_ufunc_to_dunder_op
, which means the result is None (and our own dunder ops will never return None).But can make it more explicit.