Commit 02b552d

jorisvandenbossche authored and jreback committed
Add __array_ufunc__ to Series / Array (#23293)
1 parent 7ceefb3 commit 02b552d

17 files changed

+553
-99
lines changed

doc/source/development/extending.rst

+19
@@ -208,6 +208,25 @@ will
 2. call ``result = op(values, ExtensionArray)``
 3. re-box the result in a ``Series``
 
+.. _extending.extension.ufunc:
+
+NumPy Universal Functions
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:class:`Series` implements ``__array_ufunc__``. As part of the implementation,
+pandas unboxes the ``ExtensionArray`` from the :class:`Series`, applies the ufunc,
+and re-boxes it if necessary.
+
+If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
+extension array to avoid coercion to an ndarray. See
+`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
+for an example.
+
+As part of your implementation, we require that you defer to pandas when a pandas
+container (:class:`Series`, :class:`DataFrame`, :class:`Index`) is detected in ``inputs``.
+If any of those is present, you should return ``NotImplemented``. Pandas will take care of
+unboxing the array from the container and re-calling the ufunc with the unwrapped input.
+
 .. _extending.extension.testing:
 
 Testing extension arrays
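
The deferral rule documented in this hunk can be sketched as follows. This is a minimal illustration, not the pandas implementation: `WrappedArray` is a hypothetical stand-in that does not subclass `ExtensionArray`, and it ignores details such as the `out` keyword.

```python
import numbers

import numpy as np
import pandas as pd
from numpy.lib.mixins import NDArrayOperatorsMixin


class WrappedArray(NDArrayOperatorsMixin):
    """Hypothetical stand-in for an extension array (not a real ExtensionArray)."""

    _HANDLED_TYPES = (np.ndarray, numbers.Number)

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer to pandas containers: pandas unboxes them and calls the ufunc again.
        if any(isinstance(x, (pd.Series, pd.DataFrame, pd.Index)) for x in inputs):
            return NotImplemented
        # Defer for any other type this class does not know how to handle.
        if not all(isinstance(x, self._HANDLED_TYPES + (WrappedArray,)) for x in inputs):
            return NotImplemented
        # Unwrap our own arrays, apply the ufunc, and re-wrap the result(s).
        raw = tuple(x._data if isinstance(x, WrappedArray) else x for x in inputs)
        result = getattr(ufunc, method)(*raw, **kwargs)
        if isinstance(result, tuple):
            return tuple(WrappedArray(r) for r in result)
        return WrappedArray(result)


arr = WrappedArray([1, 2, 3])
doubled = np.multiply(arr, 2)  # handled by WrappedArray itself
deferred = arr.__array_ufunc__(np.add, "__call__", arr, pd.Series([1, 2, 3]))
print(doubled._data, deferred is NotImplemented)
```

Calling `__array_ufunc__` directly in the last step only demonstrates the return value; in normal use NumPy makes that call and then moves on to the next input's implementation.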

doc/source/getting_started/dsintro.rst

+42-8
@@ -731,28 +731,62 @@ DataFrame interoperability with NumPy functions
 .. _dsintro.numpy_interop:
 
 Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
-can be used with no issues on DataFrame, assuming the data within are numeric:
+can be used with no issues on Series and DataFrame, assuming the data within
+are numeric:
 
 .. ipython:: python
 
    np.exp(df)
    np.asarray(df)
 
-The dot method on DataFrame implements matrix multiplication:
+DataFrame is not intended to be a drop-in replacement for ndarray as its
+indexing semantics and data model are quite different in places from an n-dimensional
+array.
+
+:class:`Series` implements ``__array_ufunc__``, which allows it to work with NumPy's
+`universal functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`_.
+
+The ufunc is applied to the underlying array in a Series.
 
 .. ipython:: python
 
-   df.T.dot(df)
+   ser = pd.Series([1, 2, 3, 4])
+   np.exp(ser)
 
-Similarly, the dot method on Series implements dot product:
+Like other parts of the library, pandas will automatically align labeled inputs
+as part of a ufunc with multiple inputs. For example, using :meth:`numpy.remainder`
+on two :class:`Series` with differently ordered labels will align before the operation.
 
 .. ipython:: python
 
-   s1 = pd.Series(np.arange(5, 10))
-   s1.dot(s1)
+   ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
+   ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c'])
+   ser1
+   ser2
+   np.remainder(ser1, ser2)
 
-DataFrame is not intended to be a drop-in replacement for ndarray as its
-indexing semantics are quite different in places from a matrix.
+As usual, the union of the two indices is taken, and non-overlapping values are filled
+with missing values.
+
+.. ipython:: python
+
+   ser3 = pd.Series([2, 4, 6], index=['b', 'c', 'd'])
+   ser3
+   np.remainder(ser1, ser3)
+
+When a binary ufunc is applied to a :class:`Series` and :class:`Index`, the Series
+implementation takes precedence and a Series is returned.
+
+.. ipython:: python
+
+   ser = pd.Series([1, 2, 3])
+   idx = pd.Index([4, 5, 6])
+
+   np.maximum(ser, idx)
+
+NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
+for example :class:`SparseArray` (see :ref:`sparse.calculation`). If possible,
+the ufunc is applied without converting the underlying data to an ndarray.
 
 Console display
 ~~~~~~~~~~~~~~~
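
The alignment behaviour described in this hunk can be checked outside the docs build with a short script. It reuses the series from the added examples; the variable names `aligned`, `partial` and `mixed` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
ser2 = pd.Series([1, 3, 5], index=["b", "a", "c"])
ser3 = pd.Series([2, 4, 6], index=["b", "c", "d"])

# Same labels in a different order: values are matched by label, not position.
aligned = np.remainder(ser1, ser2)

# Partially overlapping labels: the union of the indexes is used and labels
# present on only one side ("a" and "d") come back as missing values.
partial = np.remainder(ser1, ser3)

# Series combined with an Index: the result is a Series.
mixed = np.maximum(pd.Series([1, 2, 3]), pd.Index([4, 5, 6]))

print(aligned)
print(partial)
print(type(mixed).__name__)
```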

doc/source/user_guide/computation.rst

+1
@@ -5,6 +5,7 @@
 Computational tools
 ===================
 
+
 Statistical functions
 ---------------------
 

doc/source/whatsnew/v0.25.0.rst

+2
@@ -886,6 +886,7 @@ Sparse
 - Introduce a better error message in :meth:`Series.sparse.from_coo` so it returns a ``TypeError`` for inputs that are not coo matrices (:issue:`26554`)
 - Bug in :func:`numpy.modf` on a :class:`SparseArray`. Now a tuple of :class:`SparseArray` is returned (:issue:`26946`).
 
+
 Build Changes
 ^^^^^^^^^^^^^
 
@@ -896,6 +897,7 @@ ExtensionArray
 
 - Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
 - :meth:`Series.count` miscounts NA values in ExtensionArrays (:issue:`26835`)
+- Added ``Series.__array_ufunc__`` to better handle NumPy ufuncs applied to Series backed by extension arrays (:issue:`23293`).
 - Keyword argument ``deep`` has been removed from :meth:`ExtensionArray.copy` (:issue:`27083`)
 
 Other

pandas/core/arrays/base.py

+11
@@ -107,6 +107,17 @@ class ExtensionArray:
     attributes called ``.values`` or ``._values`` to ensure full compatibility
     with pandas internals. But other names as ``.data``, ``._data``,
     ``._items``, ... can be freely used.
+
+    If implementing NumPy's ``__array_ufunc__`` interface, pandas expects
+    that
+
+    1. You defer by returning ``NotImplemented`` when any Series are present
+       in ``inputs``. Pandas will extract the arrays and call the ufunc again.
+    2. You define a ``_HANDLED_TYPES`` tuple as an attribute on the class.
+       Pandas inspects this to determine whether the ufunc is valid for the
+       types present.
+
+    See :ref:`extending.extension.ufunc` for more.
     """
     # '_typ' is for pandas.core.dtypes.generic.ABCExtensionArray.
     # Don't override this.

pandas/core/arrays/categorical.py

+15
@@ -26,6 +26,7 @@
 from pandas.core.dtypes.inference import is_hashable
 from pandas.core.dtypes.missing import isna, notna
 
+from pandas.core import ops
 from pandas.core.accessor import PandasDelegate, delegate_names
 import pandas.core.algorithms as algorithms
 from pandas.core.algorithms import factorize, take, take_1d, unique1d
@@ -1292,6 +1293,20 @@ def __array__(self, dtype=None):
             ret = np.asarray(ret)
         return ret
 
+    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+        # for binary ops, use our custom dunder methods
+        result = ops.maybe_dispatch_ufunc_to_dunder_op(
+            self, ufunc, method, *inputs, **kwargs)
+        if result is not NotImplemented:
+            return result
+
+        # for all other cases, raise for now (similarly as what happens in
+        # Series.__array_prepare__)
+        raise TypeError("Object with dtype {dtype} cannot perform "
+                        "the numpy op {op}".format(
+                            dtype=self.dtype,
+                            op=ufunc.__name__))
+
     def __setstate__(self, state):
         """Necessary for making this object picklable"""
         if not isinstance(state, dict):
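
The two branches added to `Categorical.__array_ufunc__` can be observed from user code. The snippet below is a small check rather than part of the commit; it assumes an installed pandas that still behaves as this hunk describes, and it only inspects the error type and the ufunc name in the message.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Comparison ufuncs are routed to the matching dunder method (here ``__eq__``).
equal = np.equal(cat, cat)

# Other ufuncs are rejected instead of being applied to a coerced ndarray.
message = None
try:
    np.exp(cat)
except TypeError as err:
    message = str(err)

print(equal)
print(message)
```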

pandas/core/arrays/integer.py

+48-1
@@ -1,3 +1,4 @@
+import numbers
 import sys
 from typing import Type
 import warnings
@@ -17,7 +18,7 @@
 from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
 from pandas.core.dtypes.missing import isna, notna
 
-from pandas.core import nanops
+from pandas.core import nanops, ops
 from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin
 from pandas.core.tools.numeric import to_numeric
 
@@ -344,6 +345,52 @@ def __array__(self, dtype=None):
         """
         return self._coerce_to_ndarray()
 
+    _HANDLED_TYPES = (np.ndarray, numbers.Number)
+
+    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+        # For IntegerArray inputs, we apply the ufunc to ._data
+        # and mask the result.
+        if method == 'reduce':
+            # Not clear how to handle missing values in reductions. Raise.
+            raise NotImplementedError("The 'reduce' method is not supported.")
+        out = kwargs.get('out', ())
+
+        for x in inputs + out:
+            if not isinstance(x, self._HANDLED_TYPES + (IntegerArray,)):
+                return NotImplemented
+
+        # for binary ops, use our custom dunder methods
+        result = ops.maybe_dispatch_ufunc_to_dunder_op(
+            self, ufunc, method, *inputs, **kwargs)
+        if result is not NotImplemented:
+            return result
+
+        mask = np.zeros(len(self), dtype=bool)
+        inputs2 = []
+        for x in inputs:
+            if isinstance(x, IntegerArray):
+                mask |= x._mask
+                inputs2.append(x._data)
+            else:
+                inputs2.append(x)
+
+        def reconstruct(x):
+            # we don't worry about scalar `x` here, since we
+            # raise for reduce up above.
+
+            if is_integer_dtype(x.dtype):
+                m = mask.copy()
+                return IntegerArray(x, m)
+            else:
+                x[mask] = np.nan
+            return x
+
+        result = getattr(ufunc, method)(*inputs2, **kwargs)
+        if isinstance(result, tuple):
+            return tuple(reconstruct(x) for x in result)
+        else:
+            return reconstruct(result)
+
     def __iter__(self):
         for i in range(len(self)):
             if self._mask[i]:
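
The mask-then-reconstruct step in `IntegerArray.__array_ufunc__` can be tried with NumPy alone. In the sketch below `apply_masked_ufunc` is a hypothetical helper, not a pandas function: it returns plain `(values, mask)` pairs instead of `IntegerArray` objects, and it assumes that any non-integer output is a float array.

```python
import numpy as np


def apply_masked_ufunc(ufunc, data, mask, *others):
    """Apply ``ufunc`` to ``data`` and re-apply ``mask`` to every output.

    Hypothetical helper: returns ``(values, mask)`` pairs rather than
    IntegerArray objects so that it runs with NumPy alone.
    """

    def reconstruct(x):
        if np.issubdtype(x.dtype, np.integer):
            # Integer output: keep the values and carry a copy of the mask.
            return x, mask.copy()
        # Non-integer (assumed float) output: mark masked slots as NaN.
        x[mask] = np.nan
        return x, None

    result = ufunc(data, *others)
    if isinstance(result, tuple):
        # Multi-output ufuncs such as np.divmod: rebuild each output.
        return tuple(reconstruct(x) for x in result)
    return reconstruct(result)


data = np.array([1, 4, 9])
mask = np.array([False, True, False])

roots, roots_mask = apply_masked_ufunc(np.sqrt, data, mask)
(quot, quot_mask), (rem, rem_mask) = apply_masked_ufunc(np.divmod, data, mask, 2)
print(roots, quot, rem)
```

The `np.divmod` call exercises the tuple branch, which has to return the rebuilt outputs for a multi-output ufunc to yield anything.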

pandas/core/arrays/sparse.py

+6-36
@@ -38,6 +38,7 @@
 from pandas.core.base import PandasObject
 import pandas.core.common as com
 from pandas.core.missing import interpolate_2d
+import pandas.core.ops as ops
 
 import pandas.io.formats.printing as printing
 
@@ -1665,42 +1666,11 @@ def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
             if not isinstance(x, self._HANDLED_TYPES + (SparseArray,)):
                 return NotImplemented
 
-        special = {'add', 'sub', 'mul', 'pow', 'mod', 'floordiv', 'truediv',
-                   'divmod', 'eq', 'ne', 'lt', 'gt', 'le', 'ge', 'remainder'}
-        aliases = {
-            'subtract': 'sub',
-            'multiply': 'mul',
-            'floor_divide': 'floordiv',
-            'true_divide': 'truediv',
-            'power': 'pow',
-            'remainder': 'mod',
-            'divide': 'div',
-            'equal': 'eq',
-            'not_equal': 'ne',
-            'less': 'lt',
-            'less_equal': 'le',
-            'greater': 'gt',
-            'greater_equal': 'ge',
-        }
-
-        flipped = {
-            'lt': '__gt__',
-            'le': '__ge__',
-            'gt': '__lt__',
-            'ge': '__le__',
-            'eq': '__eq__',
-            'ne': '__ne__',
-        }
-
-        op_name = ufunc.__name__
-        op_name = aliases.get(op_name, op_name)
-
-        if op_name in special and kwargs.get('out') is None:
-            if isinstance(inputs[0], type(self)):
-                return getattr(self, '__{}__'.format(op_name))(inputs[1])
-            else:
-                name = flipped.get(op_name, '__r{}__'.format(op_name))
-                return getattr(self, name)(inputs[0])
+        # for binary ops, use our custom dunder methods
+        result = ops.maybe_dispatch_ufunc_to_dunder_op(
+            self, ufunc, method, *inputs, **kwargs)
+        if result is not NotImplemented:
+            return result
 
         if len(inputs) == 1:
             # No alignment necessary.
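
The body of `ops.maybe_dispatch_ufunc_to_dunder_op` is not part of this excerpt. Judging from the lines removed from `SparseArray.__array_ufunc__`, it presumably centralizes the same lookup. The sketch below is modeled on those removed lines only: `dispatch_to_dunder` and `Tagged` are hypothetical names, and the alias table is abbreviated (the `'divide'` entry is left out).

```python
import numpy as np

SPECIAL = {"add", "sub", "mul", "pow", "mod", "floordiv", "truediv",
           "divmod", "eq", "ne", "lt", "gt", "le", "ge", "remainder"}
ALIASES = {
    "subtract": "sub", "multiply": "mul", "floor_divide": "floordiv",
    "true_divide": "truediv", "power": "pow", "remainder": "mod",
    "equal": "eq", "not_equal": "ne", "less": "lt", "less_equal": "le",
    "greater": "gt", "greater_equal": "ge",
}
# Reflected comparisons swap direction: ``other < self`` means ``self > other``.
FLIPPED = {"lt": "__gt__", "le": "__ge__", "gt": "__lt__", "ge": "__le__",
           "eq": "__eq__", "ne": "__ne__"}


def dispatch_to_dunder(self, ufunc, method, *inputs, **kwargs):
    """Route a binary ufunc to the operand's dunder method, if one applies.

    ``method`` is accepted for signature parity but unused, as in the
    removed SparseArray code.
    """
    op_name = ALIASES.get(ufunc.__name__, ufunc.__name__)
    if op_name not in SPECIAL or kwargs.get("out") is not None:
        return NotImplemented
    if isinstance(inputs[0], type(self)):
        # ``self`` is the left operand: call the forward method.
        return getattr(self, "__{}__".format(op_name))(inputs[1])
    # ``self`` is the right operand: call the reflected or flipped method.
    name = FLIPPED.get(op_name, "__r{}__".format(op_name))
    return getattr(self, name)(inputs[0])


class Tagged:
    """Toy operand whose dunder methods report which one was called."""

    def __add__(self, other):
        return ("__add__", other)

    def __radd__(self, other):
        return ("__radd__", other)

    def __gt__(self, other):
        return ("__gt__", other)


t = Tagged()
forward = dispatch_to_dunder(t, np.add, "__call__", t, 1)
reflected = dispatch_to_dunder(t, np.add, "__call__", 1, t)
flipped = dispatch_to_dunder(t, np.less, "__call__", 1, t)
unhandled = dispatch_to_dunder(t, np.exp, "__call__", t)
```

Returning `NotImplemented` for anything outside the table is what lets each caller in this commit fall through to its own handling.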
