
Commit 05d82ac

jorisvandenbossche authored and nofarmish committed
DEPR: raise deprecation warning in numpy ufuncs on DataFrames if not aligned + fallback to <1.2.0 behaviour (pandas-dev#39239)
1 parent 85e4004 commit 05d82ac

File tree

4 files changed: +292 -15 lines changed

doc/source/whatsnew/v1.2.0.rst

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -286,6 +286,8 @@ Other enhancements
286286
- Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
287287
- Calling a NumPy ufunc on a ``DataFrame`` with extension types now preserves the extension types when possible (:issue:`23743`)
288288
- Calling a binary-input NumPy ufunc on multiple ``DataFrame`` objects now aligns, matching the behavior of binary operations and ufuncs on ``Series`` (:issue:`23743`).
289+
This change has been reverted in pandas 1.2.1, and the behaviour to not align DataFrames
290+
is deprecated instead; see :ref:`the 1.2.1 release notes <whatsnew_121.ufunc_deprecation>`.
289291
- Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)
290292
- :meth:`DataFrame.to_parquet` now supports :class:`MultiIndex` for columns in parquet format (:issue:`34777`)
291293
- :func:`read_parquet` gained a ``use_nullable_dtypes=True`` option to use nullable dtypes that use ``pd.NA`` as missing value indicator where possible for the resulting DataFrame (default is ``False``, and only applicable for ``engine="pyarrow"``) (:issue:`31242`)
@@ -536,6 +538,14 @@ Deprecations
536538
- The ``inplace`` parameter of :meth:`Categorical.remove_unused_categories` is deprecated and will be removed in a future version (:issue:`37643`)
537539
- The ``null_counts`` parameter of :meth:`DataFrame.info` is deprecated and replaced by ``show_counts``. It will be removed in a future version (:issue:`37999`)
538540

541+
**Calling NumPy ufuncs on non-aligned DataFrames**
542+
543+
Calling NumPy ufuncs on non-aligned DataFrames changed behaviour in pandas
544+
1.2.0 (to align the inputs before calling the ufunc), but this change is
545+
reverted in pandas 1.2.1. The behaviour to not align is now deprecated instead,
546+
see :ref:`the 1.2.1 release notes <whatsnew_121.ufunc_deprecation>` for
547+
more details.
548+
539549
.. ---------------------------------------------------------------------------
540550
541551

doc/source/whatsnew/v1.2.1.rst

Lines changed: 74 additions & 1 deletion
@@ -1,6 +1,6 @@
11
.. _whatsnew_121:
22

3-
What's new in 1.2.1 (January 18, 2021)
3+
What's new in 1.2.1 (January 20, 2021)
44
--------------------------------------
55

66
These are the changes in pandas 1.2.1. See :ref:`release` for a full changelog
@@ -42,6 +42,79 @@ As a result, bugs reported as fixed in pandas 1.2.0 related to inconsistent tick
4242

4343
.. ---------------------------------------------------------------------------
4444
45+
.. _whatsnew_121.ufunc_deprecation:
46+
47+
Calling NumPy ufuncs on non-aligned DataFrames
48+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
49+
50+
Before pandas 1.2.0, calling a NumPy ufunc on non-aligned DataFrames (or
51+
DataFrame / Series combination) would ignore the indices, only match
52+
the inputs by shape, and use the index/columns of the first DataFrame for
53+
the result:
54+
55+
.. code-block:: python
56+
57+
>>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
58+
>>> df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
59+
>>> df1
60+
a b
61+
0 1 3
62+
1 2 4
63+
>>> df2
64+
a b
65+
1 1 3
66+
2 2 4
67+
68+
>>> np.add(df1, df2)
69+
a b
70+
0 2 6
71+
1 4 8
72+
73+
This contrasts with how other pandas operations work, which first align
74+
the inputs:
75+
76+
.. code-block:: python
77+
78+
>>> df1 + df2
79+
a b
80+
0 NaN NaN
81+
1 3.0 7.0
82+
2 NaN NaN
83+
84+
In pandas 1.2.0, we refactored how NumPy ufuncs are called on DataFrames, and
85+
this started to align the inputs first (:issue:`39184`), as happens in other
86+
pandas operations and as it happens for ufuncs called on Series objects.
87+
88+
For pandas 1.2.1, we restored the previous behaviour to avoid a breaking
89+
change, but the above example of ``np.add(df1, df2)`` with non-aligned inputs
90+
will now raise a warning, and a future pandas 2.0 release will start
91+
aligning the inputs first (:issue:`39184`). Calling a NumPy ufunc on Series
92+
objects (e.g. ``np.add(s1, s2)``) already aligns and continues to do so.
93+
94+
To avoid the warning and keep the current behaviour of ignoring the indices,
95+
convert one of the arguments to a NumPy array:
96+
97+
.. code-block:: python
98+
99+
>>> np.add(df1, np.asarray(df2))
100+
a b
101+
0 2 6
102+
1 4 8
103+
104+
To obtain the future behaviour and silence the warning, you can align manually
105+
before passing the arguments to the ufunc:
106+
107+
.. code-block:: python
108+
109+
>>> df1, df2 = df1.align(df2)
110+
>>> np.add(df1, df2)
111+
a b
112+
0 NaN NaN
113+
1 3.0 7.0
114+
2 NaN NaN
115+
116+
.. ---------------------------------------------------------------------------
117+
45118
.. _whatsnew_121.bug_fixes:
46119

47120
Bug fixes
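The two workarounds described in the 1.2.1 release note above can be sketched as a standalone script. This is an illustration only, not code from the commit: the helper `is_aligned` is a hypothetical name that mimics the check with public `Index.equals` calls, and the script is written so that it behaves the same whether or not the installed pandas version aligns ufunc inputs by default.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])


def is_aligned(left, right):
    # Hypothetical helper: both axes must carry identical labels.
    return left.index.equals(right.index) and left.columns.equals(right.columns)


# Option 1: keep the positional behaviour by hiding the labels of one input.
# The result takes the index/columns of df1.
positional = np.add(df1, np.asarray(df2))

# Option 2: opt in to label-based behaviour by aligning explicitly first.
left, right = df1.align(df2)
aligned = np.add(left, right)

print(is_aligned(df1, df2), is_aligned(left, right))
print(positional)
print(aligned)
```

Option 1 is the safer choice when the labels are known to be meaningless (for example, two frames built from the same array source); Option 2 makes the label-based intent explicit and introduces NaN wherever a label is missing on one side.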

pandas/core/arraylike.py

Lines changed: 84 additions & 0 deletions
@@ -149,6 +149,85 @@ def __rpow__(self, other):
149149
return self._arith_method(other, roperator.rpow)
150150

151151

152+
# -----------------------------------------------------------------------------
153+
# Helpers to implement __array_ufunc__
154+
155+
156+
def _is_aligned(frame, other):
157+
"""
158+
Helper to check if a DataFrame is aligned with another DataFrame or Series.
159+
"""
160+
from pandas import DataFrame
161+
162+
if isinstance(other, DataFrame):
163+
return frame._indexed_same(other)
164+
else:
165+
# Series -> match index
166+
return frame.columns.equals(other.index)
167+
168+
169+
def _maybe_fallback(ufunc: Callable, method: str, *inputs: Any, **kwargs: Any):
170+
"""
171+
In the future, DataFrame inputs to ufuncs will be aligned before applying
172+
the ufunc, but for now we ignore the index but raise a warning if behaviour
173+
would change in the future.
174+
This helper detects the case where a warning is needed and then falls back
175+
to applying the ufunc on arrays to avoid alignment.
176+
177+
See https://github.com/pandas-dev/pandas/pull/39239
178+
"""
179+
from pandas import DataFrame
180+
from pandas.core.generic import NDFrame
181+
182+
n_alignable = sum(isinstance(x, NDFrame) for x in inputs)
183+
n_frames = sum(isinstance(x, DataFrame) for x in inputs)
184+
185+
if n_alignable >= 2 and n_frames >= 1:
186+
# if there are 2 alignable inputs (Series or DataFrame), of which at least 1
187+
# is a DataFrame -> we would have had no alignment before -> warn that this
188+
# will align in the future
189+
190+
# the first frame is what determines the output index/columns in pandas < 1.2
191+
first_frame = next(x for x in inputs if isinstance(x, DataFrame))
192+
193+
# check if the objects are aligned or not
194+
non_aligned = sum(
195+
not _is_aligned(first_frame, x) for x in inputs if isinstance(x, NDFrame)
196+
)
197+
198+
# if at least one is not aligned -> warn and fallback to array behaviour
199+
if non_aligned:
200+
warnings.warn(
201+
"Calling a ufunc on non-aligned DataFrames (or DataFrame/Series "
202+
"combination). Currently, the indices are ignored and the result "
203+
"takes the index/columns of the first DataFrame. In the future, "
204+
"the DataFrames/Series will be aligned before applying the ufunc.\n"
205+
"Convert one of the arguments to a NumPy array "
206+
"(eg 'ufunc(df1, np.asarray(df2))') to keep the current behaviour, "
207+
"or align manually (eg 'df1, df2 = df1.align(df2)') before passing to "
208+
"the ufunc to obtain the future behaviour and silence this warning.",
209+
FutureWarning,
210+
stacklevel=4,
211+
)
212+
213+
# keep the first dataframe of the inputs, other DataFrame/Series is
214+
# converted to array for fallback behaviour
215+
new_inputs = []
216+
for x in inputs:
217+
if x is first_frame:
218+
new_inputs.append(x)
219+
elif isinstance(x, NDFrame):
220+
new_inputs.append(np.asarray(x))
221+
else:
222+
new_inputs.append(x)
223+
224+
# call the ufunc on those transformed inputs
225+
return getattr(ufunc, method)(*new_inputs, **kwargs)
226+
227+
# signal that we didn't fallback / execute the ufunc yet
228+
return NotImplemented
229+
230+
152231
def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any):
153232
"""
154233
Compatibility with numpy ufuncs.
@@ -162,6 +241,11 @@ def array_ufunc(self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any)
162241

163242
cls = type(self)
164243

244+
# for backwards compatibility check and potentially fallback for non-aligned frames
245+
result = _maybe_fallback(ufunc, method, *inputs, **kwargs)
246+
if result is not NotImplemented:
247+
return result
248+
165249
# for binary ops, use our custom dunder methods
166250
result = maybe_dispatch_ufunc_to_dunder_op(self, ufunc, method, *inputs, **kwargs)
167251
if result is not NotImplemented:
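
The control flow added in this file (a helper that either executes a fallback or returns `NotImplemented` so the caller continues with its normal path) can be illustrated without pandas. The sketch below is a toy model under stated assumptions: `Labeled`, `maybe_fallback`, and `apply` are hypothetical names, and the label handling is far simpler than what `DataFrame` does.

```python
import warnings
from operator import add


class Labeled:
    """Hypothetical stand-in for a DataFrame: a list of values plus labels."""

    def __init__(self, labels, values):
        self.labels = list(labels)
        self.values = list(values)


def maybe_fallback(op, left, right):
    # If the labels differ, warn and combine by position (the old behaviour).
    if left.labels != right.labels:
        warnings.warn(
            "Labels differ; they are ignored for now but will be aligned "
            "in the future.",
            FutureWarning,
            stacklevel=3,  # point at the caller of apply()
        )
        values = [op(x, y) for x, y in zip(left.values, right.values)]
        return Labeled(left.labels, values)
    # Signal that no fallback was executed.
    return NotImplemented


def apply(op, left, right):
    result = maybe_fallback(op, left, right)
    if result is not NotImplemented:
        return result
    # Normal path: labels already match, so combine label by label.
    lookup = dict(zip(right.labels, right.values))
    values = [op(v, lookup[k]) for k, v in zip(left.labels, left.values)]
    return Labeled(left.labels, values)


a = Labeled(["x", "y"], [1, 2])
b = Labeled(["y", "z"], [10, 20])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mismatched = apply(add, a, b)  # warns, combines by position
    n_warnings = len(caught)
matched = apply(add, a, a)  # no warning, normal path
```

Returning `NotImplemented` rather than `None` keeps the "no fallback" signal distinct from any legitimate result, which is the same convention the surrounding `array_ufunc` code uses for its dispatch helpers.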

pandas/tests/frame/test_ufunc.py

Lines changed: 124 additions & 14 deletions
@@ -1,6 +1,8 @@
11
import numpy as np
22
import pytest
33

4+
import pandas.util._test_decorators as td
5+
46
import pandas as pd
57
import pandas._testing as tm
68
from pandas.api.types import is_extension_array_dtype
@@ -79,12 +81,19 @@ def test_binary_input_aligns_columns(request, dtype_a, dtype_b):
7981
dtype_b["C"] = dtype_b.pop("B")
8082

8183
df2 = pd.DataFrame({"A": [1, 2], "C": [3, 4]}).astype(dtype_b)
82-
result = np.heaviside(df1, df2)
83-
expected = np.heaviside(
84-
np.array([[1, 3, np.nan], [2, 4, np.nan]]),
85-
np.array([[1, np.nan, 3], [2, np.nan, 4]]),
86-
)
87-
expected = pd.DataFrame(expected, index=[0, 1], columns=["A", "B", "C"])
84+
with tm.assert_produces_warning(FutureWarning):
85+
result = np.heaviside(df1, df2)
86+
# Expected future behaviour:
87+
# expected = np.heaviside(
88+
# np.array([[1, 3, np.nan], [2, 4, np.nan]]),
89+
# np.array([[1, np.nan, 3], [2, np.nan, 4]]),
90+
# )
91+
# expected = pd.DataFrame(expected, index=[0, 1], columns=["A", "B", "C"])
92+
expected = pd.DataFrame([[1.0, 1.0], [1.0, 1.0]], columns=["A", "B"])
93+
tm.assert_frame_equal(result, expected)
94+
95+
# ensure the expected is the same when applying with numpy array
96+
result = np.heaviside(df1, df2.values)
8897
tm.assert_frame_equal(result, expected)
8998

9099

@@ -98,23 +107,35 @@ def test_binary_input_aligns_index(request, dtype):
98107
)
99108
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"]).astype(dtype)
100109
df2 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "c"]).astype(dtype)
101-
result = np.heaviside(df1, df2)
102-
expected = np.heaviside(
103-
np.array([[1, 3], [3, 4], [np.nan, np.nan]]),
104-
np.array([[1, 3], [np.nan, np.nan], [3, 4]]),
110+
with tm.assert_produces_warning(FutureWarning):
111+
result = np.heaviside(df1, df2)
112+
# Expected future behaviour:
113+
# expected = np.heaviside(
114+
# np.array([[1, 3], [3, 4], [np.nan, np.nan]]),
115+
# np.array([[1, 3], [np.nan, np.nan], [3, 4]]),
116+
# )
117+
# # TODO(FloatArray): this will be Float64Dtype.
118+
# expected = pd.DataFrame(expected, index=["a", "b", "c"], columns=["A", "B"])
119+
expected = pd.DataFrame(
120+
[[1.0, 1.0], [1.0, 1.0]], columns=["A", "B"], index=["a", "b"]
105121
)
106-
# TODO(FloatArray): this will be Float64Dtype.
107-
expected = pd.DataFrame(expected, index=["a", "b", "c"], columns=["A", "B"])
122+
tm.assert_frame_equal(result, expected)
123+
124+
# ensure the expected is the same when applying with numpy array
125+
result = np.heaviside(df1, df2.values)
108126
tm.assert_frame_equal(result, expected)
109127

110128

129+
@pytest.mark.filterwarnings("ignore:Calling a ufunc on non-aligned:FutureWarning")
111130
def test_binary_frame_series_raises():
112131
# We don't currently implement
113132
df = pd.DataFrame({"A": [1, 2]})
114-
with pytest.raises(NotImplementedError, match="logaddexp"):
133+
# with pytest.raises(NotImplementedError, match="logaddexp"):
134+
with pytest.raises(ValueError, match=""):
115135
np.logaddexp(df, df["A"])
116136

117-
with pytest.raises(NotImplementedError, match="logaddexp"):
137+
# with pytest.raises(NotImplementedError, match="logaddexp"):
138+
with pytest.raises(ValueError, match=""):
118139
np.logaddexp(df["A"], df)
119140

120141

@@ -143,3 +164,92 @@ def test_frame_outer_deprecated():
143164
df = pd.DataFrame({"A": [1, 2]})
144165
with tm.assert_produces_warning(FutureWarning):
145166
np.subtract.outer(df, df)
167+
168+
169+
def test_alignment_deprecation():
170+
# https://github.com/pandas-dev/pandas/issues/39184
171+
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
172+
df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})
173+
s1 = pd.Series([1, 2], index=["a", "b"])
174+
s2 = pd.Series([1, 2], index=["b", "c"])
175+
176+
# binary dataframe / dataframe
177+
expected = pd.DataFrame({"a": [2, 4, 6], "b": [8, 10, 12]})
178+
179+
with tm.assert_produces_warning(None):
180+
# aligned -> no warning!
181+
result = np.add(df1, df1)
182+
tm.assert_frame_equal(result, expected)
183+
184+
with tm.assert_produces_warning(FutureWarning):
185+
# non-aligned -> warns
186+
result = np.add(df1, df2)
187+
tm.assert_frame_equal(result, expected)
188+
189+
result = np.add(df1, df2.values)
190+
tm.assert_frame_equal(result, expected)
191+
192+
result = np.add(df1.values, df2)
193+
expected = pd.DataFrame({"b": [2, 4, 6], "c": [8, 10, 12]})
194+
tm.assert_frame_equal(result, expected)
195+
196+
# binary dataframe / series
197+
expected = pd.DataFrame({"a": [2, 3, 4], "b": [6, 7, 8]})
198+
199+
with tm.assert_produces_warning(None):
200+
# aligned -> no warning!
201+
result = np.add(df1, s1)
202+
tm.assert_frame_equal(result, expected)
203+
204+
with tm.assert_produces_warning(FutureWarning):
205+
result = np.add(df1, s2)
206+
tm.assert_frame_equal(result, expected)
207+
208+
with tm.assert_produces_warning(FutureWarning):
209+
result = np.add(s2, df1)
210+
tm.assert_frame_equal(result, expected)
211+
212+
result = np.add(df1, s2.values)
213+
tm.assert_frame_equal(result, expected)
214+
215+
216+
@td.skip_if_no("numba", "0.46.0")
217+
def test_alignment_deprecation_many_inputs():
218+
# https://github.com/pandas-dev/pandas/issues/39184
219+
# test that the deprecation also works with > 2 inputs -> using a numba
220+
# written ufunc for this because numpy itself doesn't have such ufuncs
221+
from numba import float64, vectorize
222+
223+
@vectorize([float64(float64, float64, float64)])
224+
def my_ufunc(x, y, z):
225+
return x + y + z
226+
227+
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
228+
df2 = pd.DataFrame({"b": [1, 2, 3], "c": [4, 5, 6]})
229+
df3 = pd.DataFrame({"a": [1, 2, 3], "c": [4, 5, 6]})
230+
231+
with tm.assert_produces_warning(FutureWarning):
232+
result = my_ufunc(df1, df2, df3)
233+
expected = pd.DataFrame([[3.0, 12.0], [6.0, 15.0], [9.0, 18.0]], columns=["a", "b"])
234+
tm.assert_frame_equal(result, expected)
235+
236+
# all aligned -> no warning
237+
with tm.assert_produces_warning(None):
238+
result = my_ufunc(df1, df1, df1)
239+
tm.assert_frame_equal(result, expected)
240+
241+
# mixed frame / arrays
242+
with tm.assert_produces_warning(FutureWarning):
243+
result = my_ufunc(df1, df2, df3.values)
244+
tm.assert_frame_equal(result, expected)
245+
246+
# single frame -> no warning
247+
with tm.assert_produces_warning(None):
248+
result = my_ufunc(df1, df2.values, df3.values)
249+
tm.assert_frame_equal(result, expected)
250+
251+
# takes indices of first frame
252+
with tm.assert_produces_warning(FutureWarning):
253+
result = my_ufunc(df1.values, df2, df3)
254+
expected = expected.set_axis(["b", "c"], axis=1)
255+
tm.assert_frame_equal(result, expected)
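
The DataFrame / Series cases exercised in the tests above have the same two explicit spellings as the DataFrame / DataFrame case. The following sketch is an illustration rather than part of the commit; it uses only public pandas calls (`Index.equals`, `Series.to_numpy`, `DataFrame.align` with `axis=1`) and avoids passing a non-aligned labelled pair to the ufunc, so it should not depend on which default the installed pandas version applies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
s = pd.Series([1, 2], index=["b", "c"])

# A Series is matched against the columns of the frame, so this is the
# public equivalent of the alignment check for a frame/series pair.
already_aligned = df.columns.equals(s.index)

# Positional: strip the Series labels; values broadcast across the columns.
by_position = np.add(df, s.to_numpy())

# Label-based: align the Series index with the frame's columns first.
left, right = df.align(s, axis=1)
by_label = np.add(left, right)

print(already_aligned)
print(by_position)
print(by_label)
```

In the positional result the Series values are added to columns `a` and `b` in order; in the label-based result only column `b` has a partner on both sides, so `a` and `c` are all NaN.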
