Commit 69f4f96

Backport PR #39239: DEPR: raise deprecation warning in numpy ufuncs on DataFrames if not aligned + fallback to <1.2.0 behaviour (#39288)
Co-authored-by: Joris Van den Bossche <[email protected]>
1 parent b4c0110 commit 69f4f96

4 files changed: +292 -15 lines changed

doc/source/whatsnew/v1.2.0.rst

+10
@@ -286,6 +286,8 @@ Other enhancements
 - Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
 - Calling a NumPy ufunc on a ``DataFrame`` with extension types now preserves the extension types when possible (:issue:`23743`)
 - Calling a binary-input NumPy ufunc on multiple ``DataFrame`` objects now aligns, matching the behavior of binary operations and ufuncs on ``Series`` (:issue:`23743`).
+  This change has been reverted in pandas 1.2.1, and the behaviour to not align DataFrames
+  is deprecated instead, see :ref:`the 1.2.1 release notes <whatsnew_121.ufunc_deprecation>`.
 - Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)
 - :meth:`DataFrame.to_parquet` now supports :class:`MultiIndex` for columns in parquet format (:issue:`34777`)
 - :func:`read_parquet` gained a ``use_nullable_dtypes=True`` option to use nullable dtypes that use ``pd.NA`` as missing value indicator where possible for the resulting DataFrame (default is ``False``, and only applicable for ``engine="pyarrow"``) (:issue:`31242`)
@@ -536,6 +538,14 @@ Deprecations
 - The ``inplace`` parameter of :meth:`Categorical.remove_unused_categories` is deprecated and will be removed in a future version (:issue:`37643`)
 - The ``null_counts`` parameter of :meth:`DataFrame.info` is deprecated and replaced by ``show_counts``. It will be removed in a future version (:issue:`37999`)

+**Calling NumPy ufuncs on non-aligned DataFrames**
+
+Calling NumPy ufuncs on non-aligned DataFrames changed behaviour in pandas
+1.2.0 (to align the inputs before calling the ufunc), but this change is
+reverted in pandas 1.2.1. The behaviour to not align is now deprecated instead,
+see :ref:`the 1.2.1 release notes <whatsnew_121.ufunc_deprecation>` for
+more details.
+
 .. ---------------------------------------------------------------------------



doc/source/whatsnew/v1.2.1.rst

+74 -1
@@ -1,6 +1,6 @@
 .. _whatsnew_121:

-What's new in 1.2.1 (January 18, 2021)
+What's new in 1.2.1 (January 20, 2021)
 --------------------------------------

 These are the changes in pandas 1.2.1. See :ref:`release` for a full changelog
@@ -42,6 +42,79 @@ As a result, bugs reported as fixed in pandas 1.2.0 related to inconsistent tick

 .. ---------------------------------------------------------------------------

+.. _whatsnew_121.ufunc_deprecation:
+
+Calling NumPy ufuncs on non-aligned DataFrames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or
+DataFrame / Series combination) would ignore the indices, only match
+the inputs by shape, and use the index/columns of the first DataFrame for
+the result:
+
+.. code-block:: python
+
+    >>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
+    >>> df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
+    >>> df1
+       a  b
+    0  1  3
+    1  2  4
+    >>> df2
+       a  b
+    1  1  3
+    2  2  4
+
+    >>> np.add(df1, df2)
+       a  b
+    0  2  6
+    1  4  8
+
+This contrasts with how other pandas operations work, which first align
+the inputs:
+
+.. code-block:: python
+
+    >>> df1 + df2
+         a    b
+    0  NaN  NaN
+    1  3.0  7.0
+    2  NaN  NaN
+
+In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, and
+this started to align the inputs first (:issue:`39184`), as happens in other
+pandas operations and as it happens for ufuncs called on Series objects.
+
+For pandas 1.2.1, we restored the previous behaviour to avoid a breaking
+change, but the above example of ``np.add(df1, df2)`` with non-aligned inputs
+will now raise a warning, and a future pandas 2.0 release will start
+aligning the inputs first (:issue:`39184`). Calling a NumPy ufunc on Series
+objects (e.g. ``np.add(s1, s2)``) already aligns and continues to do so.
+
+To avoid the warning and keep the current behaviour of ignoring the indices,
+convert one of the arguments to a NumPy array:
+
+.. code-block:: python
+
+    >>> np.add(df1, np.asarray(df2))
+       a  b
+    0  2  6
+    1  4  8
+
+To obtain the future behaviour and silence the warning, you can align manually
+before passing the arguments to the ufunc:
+
+.. code-block:: python
+
+    >>> df1, df2 = df1.align(df2)
+    >>> np.add(df1, df2)
+         a    b
+    0  NaN  NaN
+    1  3.0  7.0
+    2  NaN  NaN
+
+.. ---------------------------------------------------------------------------
+
 .. _whatsnew_121.bug_fixes:

 Bug fixes
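
The two workarounds described in the 1.2.1 release note above can be tried side by side. This is a minimal sketch and not part of the commit: the frames mirror the note's `df1`/`df2`, and the names `ignore_index`, `left`, `right`, and `aligned` are illustrative assumptions. Neither call passes two non-aligned pandas objects to the ufunc, so neither should depend on the deprecated code path.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

# Option 1: keep the index-ignoring behaviour by passing a plain array.
# The result reuses the index/columns of df1.
ignore_index = np.add(df1, np.asarray(df2))

# Option 2: opt in to alignment explicitly (DataFrame.align joins on the
# union of labels by default), then call the ufunc on the aligned frames.
left, right = df1.align(df2)
aligned = np.add(left, right)

print(ignore_index)
print(aligned)
```

Assigning the aligned frames to new names, rather than rebinding `df1` and `df2` as the release note does, keeps the original inputs available for comparison.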

pandas/core/arraylike.py

+84
@@ -149,6 +149,85 @@ def __rpow__(self, other):
         return self._arith_method(other, roperator.rpow)


+# -----------------------------------------------------------------------------
+# Helpers to implement __array_ufunc__
+
+
+def _is_aligned(frame, other):
+    """
+    Helper to check if a DataFrame is aligned with another DataFrame or Series.
+    """
+    from pandas import DataFrame
+
+    if isinstance(other, DataFrame):
+        return frame._indexed_same(other)
+    else:
+        # Series -> match index
+        return frame.columns.equals(other.index)
+
+
+def _maybe_fallback(ufunc: Callable, method: str, *inputs: Any, **kwargs: Any):
+    """
+    In the future, DataFrame inputs to ufuncs will be aligned before applying
+    the ufunc, but for now we ignore the index but raise a warning if behaviour
+    would change in the future.
+    This helper detects the case where a warning is needed and then falls back
+    to applying the ufunc on arrays to avoid alignment.
+
+    See https://github.com/pandas-dev/pandas/pull/39239
+    """
+    from pandas import DataFrame
+    from pandas.core.generic import NDFrame
+
+    n_alignable = sum(isinstance(x, NDFrame) for x in inputs)
+    n_frames = sum(isinstance(x, DataFrame) for x in inputs)
+
+    if n_alignable >= 2 and n_frames >= 1:
+        # if there are 2 alignable inputs (Series or DataFrame), of which at least 1
+        # is a DataFrame -> we would have had no alignment before -> warn that this
+        # will align in the future
+
+        # the first frame is what determines the output index/columns in pandas < 1.2
+        first_frame = next(x for x in inputs if isinstance(x, DataFrame))
+
+        # check if the objects are aligned or not
+        non_aligned = sum(
+            not _is_aligned(first_frame, x) for x in inputs if isinstance(x, NDFrame)
+        )
+
+        # if at least one is not aligned -> warn and fallback to array behaviour
+        if non_aligned:
+            warnings.warn(
+                "Calling a ufunc on non-aligned DataFrames (or DataFrame/Series "
+                "combination). Currently, the indices are ignored and the result "
+                "takes the index/columns of the first DataFrame. In the future, "
+                "the DataFrames/Series will be aligned before applying the ufunc.\n"
+                "Convert one of the arguments to a NumPy array "
+                "(eg 'ufunc(df1, np.asarray(df2))') to keep the current behaviour, "
+                "or align manually (eg 'df1, df2 = df1.align(df2)') before passing to "
+                "the ufunc to obtain the future behaviour and silence this warning.",
+                FutureWarning,
+                stacklevel=4,
+            )
+
+            # keep the first dataframe of the inputs, other DataFrame/Series is
+            # converted to array for fallback behaviour
+            new_inputs = []
+            for x in inputs:
+                if x is first_frame:
+                    new_inputs.append(x)
+                elif isinstance(x, NDFrame):
+                    new_inputs.append(np.asarray(x))
+                else:
+                    new_inputs.append(x)
+
+            # call the ufunc on those transformed inputs
+            return getattr(ufunc, method)(*new_inputs, **kwargs)
+
+    # signal that we didn't fallback / execute the ufunc yet
+    return NotImplemented
+
+
 def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any):
     """
     Compatibility with numpy ufuncs.
@@ -162,6 +241,11 @@ def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any)

     cls = type(self)

+    # for backwards compatibility check and potentially fallback for non-aligned frames
+    result = _maybe_fallback(ufunc, method, *inputs, **kwargs)
+    if result is not NotImplemented:
+        return result
+
     # for binary ops, use our custom dunder methods
     result = maybe_dispatch_ufunc_to_dunder_op(self, ufunc, method, *inputs, **kwargs)
     if result is not NotImplemented:
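
The condition under which `_maybe_fallback` warns and falls back can be reproduced outside pandas internals. The sketch below is an approximation and not the committed code: `is_aligned` and `would_fall_back` are hypothetical names, and the private `_indexed_same` and `NDFrame` checks are replaced with what are assumed to be equivalent public calls (`Index.equals` on both axes, and `isinstance` against `DataFrame`/`Series`).

```python
import numpy as np
import pandas as pd


def is_aligned(frame, other):
    """Approximate the private _is_aligned helper with public API only."""
    if isinstance(other, pd.DataFrame):
        same_index = frame.index.equals(other.index)
        same_columns = frame.columns.equals(other.columns)
        return same_index and same_columns
    # A Series is matched against the frame's columns.
    return frame.columns.equals(other.index)


def would_fall_back(*inputs):
    """Return True when the fallback (and its FutureWarning) would trigger."""
    alignable = [x for x in inputs if isinstance(x, (pd.DataFrame, pd.Series))]
    frames = [x for x in alignable if isinstance(x, pd.DataFrame)]
    # Needs at least two alignable inputs, one of which is a DataFrame.
    if len(alignable) < 2 or not frames:
        return False
    first_frame = frames[0]
    return any(not is_aligned(first_frame, x) for x in alignable)


df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([1, 2], index=["b", "c"])

print(would_fall_back(df1, df2))         # non-aligned frames
print(would_fall_back(df1, df1))         # aligned frames
print(would_fall_back(df1, df2.values))  # only one alignable input
```

Every other input is compared against the first DataFrame, which matches the commit's choice of that frame as the source of the output index and columns.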

pandas/tests/frame/test_ufunc.py

+124 -14
@@ -1,6 +1,8 @@
 import numpy as np
 import pytest

+import pandas.util._test_decorators as td
+
 import pandas as pd
 import pandas._testing as tm

@@ -70,12 +72,19 @@ def test_binary_input_aligns_columns(dtype_a, dtype_b):
         dtype_b["C"] = dtype_b.pop("B")

     df2 = pd.DataFrame({"A": [1, 2], "C": [3, 4]}).astype(dtype_b)
-    result = np.heaviside(df1, df2)
-    expected = np.heaviside(
-        np.array([[1, 3, np.nan], [2, 4, np.nan]]),
-        np.array([[1, np.nan, 3], [2, np.nan, 4]]),
-    )
-    expected = pd.DataFrame(expected, index=[0, 1], columns=["A", "B", "C"])
+    with tm.assert_produces_warning(FutureWarning):
+        result = np.heaviside(df1, df2)
+    # Expected future behaviour:
+    # expected = np.heaviside(
+    #     np.array([[1, 3, np.nan], [2, 4, np.nan]]),
+    #     np.array([[1, np.nan, 3], [2, np.nan, 4]]),
+    # )
+    # expected = pd.DataFrame(expected, index=[0, 1], columns=["A", "B", "C"])
+    expected = pd.DataFrame([[1.0, 1.0], [1.0, 1.0]], columns=["A", "B"])
+    tm.assert_frame_equal(result, expected)
+
+    # ensure the expected is the same when applying with numpy array
+    result = np.heaviside(df1, df2.values)
     tm.assert_frame_equal(result, expected)


@@ -85,23 +94,35 @@ def test_binary_input_aligns_index(dtype):
         pytest.xfail(reason="Extension / mixed with multiple inputs not implemented.")
     df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"]).astype(dtype)
     df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "c"]).astype(dtype)
-    result = np.heaviside(df1, df2)
-    expected = np.heaviside(
-        np.array([[1, 3], [3, 4], [np.nan, np.nan]]),
-        np.array([[1, 3], [np.nan, np.nan], [3, 4]]),
+    with tm.assert_produces_warning(FutureWarning):
+        result = np.heaviside(df1, df2)
+    # Expected future behaviour:
+    # expected = np.heaviside(
+    #     np.array([[1, 3], [3, 4], [np.nan, np.nan]]),
+    #     np.array([[1, 3], [np.nan, np.nan], [3, 4]]),
+    # )
+    # # TODO(FloatArray): this will be Float64Dtype.
+    # expected = pd.DataFrame(expected, index=["a", "b", "c"], columns=["A", "B"])
+    expected = pd.DataFrame(
+        [[1.0, 1.0], [1.0, 1.0]], columns=["A", "B"], index=["a", "b"]
     )
-    # TODO(FloatArray): this will be Float64Dtype.
-    expected = pd.DataFrame(expected, index=["a", "b", "c"], columns=["A", "B"])
+    tm.assert_frame_equal(result, expected)
+
+    # ensure the expected is the same when applying with numpy array
+    result = np.heaviside(df1, df2.values)
     tm.assert_frame_equal(result, expected)


+@pytest.mark.filterwarnings("ignore:Calling a ufunc on non-aligned:FutureWarning")
 def test_binary_frame_series_raises():
     # We don't currently implement
     df = pd.DataFrame({"A": [1, 2]})
-    with pytest.raises(NotImplementedError, match="logaddexp"):
+    # with pytest.raises(NotImplementedError, match="logaddexp"):
+    with pytest.raises(ValueError, match=""):
         np.logaddexp(df, df["A"])

-    with pytest.raises(NotImplementedError, match="logaddexp"):
+    # with pytest.raises(NotImplementedError, match="logaddexp"):
+    with pytest.raises(ValueError, match=""):
         np.logaddexp(df["A"], df)


@@ -130,3 +151,92 @@ def test_frame_outer_deprecated():
     df = pd.DataFrame({"A": [1, 2]})
     with tm.assert_produces_warning(FutureWarning):
         np.subtract.outer(df, df)
+
+
+def test_alignment_deprecation():
+    # https://github.com/pandas-dev/pandas/issues/39184
+    df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})
+    s1 = pd.Series([1, 2], index=["a", "b"])
+    s2 = pd.Series([1, 2], index=["b", "c"])
+
+    # binary dataframe / dataframe
+    expected = pd.DataFrame({"a": [2, 4, 6], "b": [8, 10, 12]})
+
+    with tm.assert_produces_warning(None):
+        # aligned -> no warning!
+        result = np.add(df1, df1)
+    tm.assert_frame_equal(result, expected)
+
+    with tm.assert_produces_warning(FutureWarning):
+        # non-aligned -> warns
+        result = np.add(df1, df2)
+    tm.assert_frame_equal(result, expected)
+
+    result = np.add(df1, df2.values)
+    tm.assert_frame_equal(result, expected)
+
+    result = np.add(df1.values, df2)
+    expected = pd.DataFrame({"b": [2, 4, 6], "c": [8, 10, 12]})
+    tm.assert_frame_equal(result, expected)
+
+    # binary dataframe / series
+    expected = pd.DataFrame({"a": [2, 3, 4], "b": [6, 7, 8]})
+
+    with tm.assert_produces_warning(None):
+        # aligned -> no warning!
+        result = np.add(df1, s1)
+    tm.assert_frame_equal(result, expected)
+
+    with tm.assert_produces_warning(FutureWarning):
+        result = np.add(df1, s2)
+    tm.assert_frame_equal(result, expected)
+
+    with tm.assert_produces_warning(FutureWarning):
+        result = np.add(s2, df1)
+    tm.assert_frame_equal(result, expected)
+
+    result = np.add(df1, s2.values)
+    tm.assert_frame_equal(result, expected)
+
+
+@td.skip_if_no("numba", "0.46.0")
+def test_alignment_deprecation_many_inputs():
+    # https://github.com/pandas-dev/pandas/issues/39184
+    # test that the deprecation also works with > 2 inputs -> using a numba
+    # written ufunc for this because numpy itself doesn't have such ufuncs
+    from numba import float64, vectorize
+
+    @vectorize([float64(float64, float64, float64)])
+    def my_ufunc(x, y, z):
+        return x + y + z
+
+    df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+    df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})
+    df3 = pd.DataFrame({"a": [1, 2, 3], "c": [4, 5, 6]})
+
+    with tm.assert_produces_warning(FutureWarning):
+        result = my_ufunc(df1, df2, df3)
+    expected = pd.DataFrame([[3.0, 12.0], [6.0, 15.0], [9.0, 18.0]], columns=["a", "b"])
+    tm.assert_frame_equal(result, expected)
+
+    # all aligned -> no warning
+    with tm.assert_produces_warning(None):
+        result = my_ufunc(df1, df1, df1)
+    tm.assert_frame_equal(result, expected)
+
+    # mixed frame / arrays
+    with tm.assert_produces_warning(FutureWarning):
+        result = my_ufunc(df1, df2, df3.values)
+    tm.assert_frame_equal(result, expected)
+
+    # single frame -> no warning
+    with tm.assert_produces_warning(None):
+        result = my_ufunc(df1, df2.values, df3.values)
+    tm.assert_frame_equal(result, expected)
+
+    # takes indices of first frame
+    with tm.assert_produces_warning(FutureWarning):
+        result = my_ufunc(df1.values, df2, df3)
+    expected = expected.set_axis(["b", "c"], axis=1)
+    tm.assert_frame_equal(result, expected)
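
Outside the pandas test suite, the standard-library `warnings` module can show which behaviour an installed pandas version has. The sketch below is not part of the commit: it reuses the `df1`/`df2` frames from `test_alignment_deprecation`, and the `warned` and `behaviour` names are illustrative assumptions. It makes no claim about which version prints what; it only reports what it observes.

```python
import warnings

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})

# Record every warning raised while calling the ufunc on non-aligned frames.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.add(df1, df2)

warned = any(issubclass(w.category, FutureWarning) for w in caught)

# The fallback keeps the first frame's columns; alignment yields the union
# of both frames' columns instead.
behaviour = "legacy" if list(result.columns) == ["a", "b"] else "aligned"

print(pd.__version__, behaviour, "FutureWarning" if warned else "no warning")
```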
