Skip to content

REGR: ufunc with DataFrame input not passing all kwargs #40878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Apr 12, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.4.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.where` not returning a copy in the case of an all True condition (:issue:`39595`)
- Fixed regression in :meth:`DataFrame.replace` raising ``IndexError`` when ``regex`` was a multi-key dictionary (:issue:`39338`)
- Fixed regression in repr of floats in an ``object`` column not respecting ``float_format`` when printed in the console or outputted through :meth:`DataFrame.to_string`, :meth:`DataFrame.to_html`, and :meth:`DataFrame.to_latex` (:issue:`40024`)
- Fixed regression in NumPy ufuncs such as ``np.add`` not passing through all arguments for :class:`DataFrame` (:issue:`40662`)

.. ---------------------------------------------------------------------------

Expand Down
6 changes: 4 additions & 2 deletions pandas/core/arraylike.py
Original file line number Diff line number Diff line change
Expand Up @@ -357,15 +357,17 @@ def reconstruct(result):
# * len(inputs) > 1 is doable when we know that we have
# aligned blocks / dtypes.
inputs = tuple(np.asarray(x) for x in inputs)
result = getattr(ufunc, method)(*inputs)
result = getattr(ufunc, method)(*inputs, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the else clause is reached for a DataFrame when not (len(inputs) > 1 or ufunc.nout > 1).

The result = mgr.apply(getattr(ufunc, method)) there for __call__ also fails to pass along **kwargs

Is it straightforward to add to the paramterised here to exercise that path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think so...will let you know if I run into any issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test, but had to xfail because out is not written to correctly. The written result becomes transposed with the block-wise apply, I'm guessing because of the following relationship:

arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr)
print(df.values)
print(df._mgr.blocks[0].values)

gives

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]

Copy link
Member

@jorisvandenbossche jorisvandenbossche Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, basically when out is specified, I think we should simply not call mgr.apply, see #39275, #39260 for a recent similar case where certain additional keyword arguments cannot be handled on a block-by-block case.

elif self.ndim == 1:
# ufunc(series, ...)
inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
result = getattr(ufunc, method)(*inputs, **kwargs)
else:
# ufunc(dataframe)
if method == "__call__":
if method == "__call__" and not kwargs:
# for np.<ufunc>(..) calls
# kwargs cannot necessarily be handled block-by-block, so only
# take this path if there are no kwargs
mgr = inputs[0]._mgr
result = mgr.apply(getattr(ufunc, method))
else:
Expand Down
38 changes: 38 additions & 0 deletions pandas/tests/frame/test_ufunc.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
from functools import partial

import numpy as np
import pytest

Expand Down Expand Up @@ -60,6 +62,42 @@ def test_binary_input_dispatch_binop(dtype):
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
"func,arg,expected",
[
(np.add, 1, [2, 3, 4, 5]),
(
partial(np.add, where=[[False, True], [True, False]]),
np.array([[1, 1], [1, 1]]),
[0, 3, 4, 0],
),
(np.power, np.array([[1, 1], [2, 2]]), [1, 2, 9, 16]),
(np.subtract, 2, [-1, 0, 1, 2]),
(
partial(np.negative, where=np.array([[False, True], [True, False]])),
None,
[0, -2, -3, 0],
),
],
)
def test_ufunc_passes_args(func, arg, expected, request):
# GH#40662
arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr)
result_inplace = np.zeros_like(arr)
# 1-argument ufunc
if arg is None:
result = func(df, out=result_inplace)
else:
result = func(df, arg, out=result_inplace)

expected = np.array(expected).reshape(2, 2)
tm.assert_numpy_array_equal(result_inplace, expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have any documentation that out is actually a ndarray? this is a very strange result. At the very least document this, and let's open an issue. This should work with out=DataFrame, or simply raise (preferred)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. I thought this was strange. #40662 (comment)

should we defer to 1.2.5 to allow more discussion? or put 1.2.4 release on hold for a day or two?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah let's defer this. i dont' think this is the correct behavior and we should actually fix it (which may mean that we simply do this for 1.3)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving to 1.2.5 (if we do it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work with out=DataFrame, or simply raise (preferred)

This does raise when passing a DataFrame to out (as numpy expects an array as out argument), but what is being tested here is the return value of the ufunc in case of passing an array to out. In that case, out is also being returned, and since out is an ndarray, the return value is also an ndarray.

(to me this seems correct behaviour, and thus I don't think this needs to hold up merging this for the release)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "terrible" behaviour is long-standing (and IMO correct) behaviour on which users rely.

how is this correct in any way? this was a bug from the original impl.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not a bug, that's how numpy functions work: they coerce array-like input to arrays. So whether one of the arguments was a DataFrame vs an ndarray, did not have any effect on allowing an out argument or not

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fine. how about a follow to make this really clear (in docs) on master. I would however also deprecate / remove this as it not intuitive at all (e.g. we do not have an out argument anywhere else)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. we do not have an out argument anywhere else

This keyword is not in a pandas function or method, but in a NumPy function (which has out in many places)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i know, but this is very confusing if someone is doing this (and doesnt; realize it).


expected = pd.DataFrame(expected)
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("dtype_a", dtypes)
@pytest.mark.parametrize("dtype_b", dtypes)
def test_binary_input_aligns_columns(request, dtype_a, dtype_b):
Expand Down