
BUG: numpy functions (eg, np.add) on DataFrames with 'out' parameter no longer work properly #40662


Closed
2 of 3 tasks
quant5 opened this issue Mar 28, 2021 · 8 comments · Fixed by #40878
Labels
Bug · Internals (related to non-user accessible pandas implementation) · Regression (functionality that used to work in a prior pandas version)

@quant5

quant5 commented Mar 28, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

# works: out=bar receives the result
foo = np.array([1, 2, 3, 4]).reshape(2, 2)
bar = np.zeros_like(foo)
np.add(foo, 1, out=bar)
print(bar)

# doesn't work. bar remains an array of zeros
foo = np.array([1, 2, 3, 4]).reshape(2, 2)
df = pd.DataFrame(foo)
bar = np.zeros_like(df)
np.add(df, 1, out=bar)
print(bar)

Problem description

Many numpy functions accept an out parameter as a way of storing the result. In the current pandas version (1.2.3), when the numpy function takes in a DataFrame as input, it no longer stores the result to the out array.

In pandas 1.0.1 (and earlier, on my other machine), the behavior works as expected. (see Expected Output). I am using numpy==1.19 for both machines.

Curiously, note that this problem is not seen for pd.Series. The below code works fine:

foo = np.array([1,2,3,4])
series = pd.Series(foo)
bar = np.zeros_like(series)
np.add(series, 1, out=bar)
print(bar)

One workaround is to instead use np.add(df.values, 1, out=bar) but I am using several packages that require the older behavior.
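
A minimal sketch of that workaround, assuming the caller is able to unwrap the DataFrame before calling the ufunc. It uses `DataFrame.to_numpy()` in place of `.values`, and rewrapping the result as a DataFrame is an optional extra step that the report above does not include:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([1, 2, 3, 4]).reshape(2, 2))

# Unwrap to a plain ndarray so NumPy handles `out` on its own.
values = df.to_numpy()
bar = np.zeros_like(values)
np.add(values, 1, out=bar)

# Optional: restore the labels if a DataFrame is needed downstream.
result = pd.DataFrame(bar, index=df.index, columns=df.columns)
print(result)
```

This only helps where the call site can be changed, which is the limitation noted above for third-party packages.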

Expected Output

>>> print(bar)

[[2 3]
 [4 5]]
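
To check whether a particular environment is affected, something like the following could be used. This is only a sketch: the helper name `ufunc_out_is_honored` is made up for illustration and is not part of pandas or numpy.

```python
import numpy as np
import pandas as pd


def ufunc_out_is_honored():
    """Return True if np.add(df, 1, out=ndarray) writes into the ndarray."""
    df = pd.DataFrame(np.array([1, 2, 3, 4]).reshape(2, 2))
    bar = np.zeros_like(df.to_numpy())
    np.add(df, 1, out=bar)
    # Compare against the result computed on the plain ndarray.
    return bool(np.array_equal(bar, df.to_numpy() + 1))


# Prints the pandas version and whether `out` was filled in.
print(pd.__version__, ufunc_out_is_honored())
```

On an affected install (1.2.3 in this report) the second value should be False, since bar stays all zeros.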

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.1.1.el7.x86_64
Version : #1 SMP Fri Sep 13 22:55:44 UTC 2019
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.2.3
numpy : 1.19.5
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.4.0
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.8.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.13.0
pyxlsb : None
s3fs : 0.5.1
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1

@quant5 quant5 added the Bug and Needs Triage labels Mar 28, 2021
@quant5
Author

quant5 commented Mar 28, 2021

Suggest adding the [Regression] label (appears I am unable to do so).

@attack68
Contributor

I confirm this is a bug on master.

I think the issue is in pandas/core/arraylike.py

Changing lines 350-362 from:

    if self.ndim > 1 and (len(inputs) > 1 or ufunc.nout > 1):
        inputs = tuple(np.asarray(x) for x in inputs)
        result = getattr(ufunc, method)(*inputs)
    elif self.ndim == 1:
        inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
        result = getattr(ufunc, method)(*inputs, **kwargs)

to:

    if self.ndim > 1 and (len(inputs) > 1 or ufunc.nout > 1):
        inputs = tuple(np.asarray(x) for x in inputs)
        result = getattr(ufunc, method)(*inputs, **kwargs)   ## <--- NOTICE KWARGS
    elif self.ndim == 1:
        inputs = tuple(extract_array(x, extract_numpy=True) for x in inputs)
        result = getattr(ufunc, method)(*inputs, **kwargs)

seems to fix this problem.
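
The effect of dropping kwargs can be reproduced without pandas. The sketch below uses a toy class (`Wrapper` and its `forward_kwargs` flag are hypothetical, not pandas code) whose `__array_ufunc__` either forwards or discards the keyword arguments that NumPy passes in, including `out`:

```python
import numpy as np


class Wrapper:
    """Toy container with __array_ufunc__; not the pandas implementation."""

    def __init__(self, data, forward_kwargs):
        self.data = np.asarray(data)
        self.forward_kwargs = forward_kwargs

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap Wrapper inputs to plain ndarrays.
        arrays = tuple(x.data if isinstance(x, Wrapper) else x for x in inputs)
        if self.forward_kwargs:
            # `out` arrives as a tuple in kwargs and is passed through.
            return getattr(ufunc, method)(*arrays, **kwargs)
        # Mirrors the bug: kwargs, and with them `out`, are dropped.
        return getattr(ufunc, method)(*arrays)


src = [[1, 2], [3, 4]]

dropped = np.zeros((2, 2), dtype=int)
np.add(Wrapper(src, forward_kwargs=False), 1, out=dropped)

forwarded = np.zeros((2, 2), dtype=int)
np.add(Wrapper(src, forward_kwargs=True), 1, out=forwarded)

print(dropped)    # still all zeros
print(forwarded)  # holds src + 1
```

NumPy hands `out` to `__array_ufunc__` as a keyword argument, so a branch that calls the ufunc without `**kwargs` computes the result but never writes it into the caller's array.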

@attack68 attack68 added the Regression and Internals labels and removed the Needs Triage label Mar 28, 2021
@rhshadrach rhshadrach added this to the 1.2.4 milestone Mar 28, 2021
@fenguoerbian

Just ran into this bug when using empyrical with DataFrame data. Hope to see the fix in the next release!

@jorisvandenbossche
Member

@attack68 that seems like the correct fix, thanks for the investigation! Want to do a PR for that? (then we can include it for 1.2.4)

@simonjayhawkins
Member

The change in 1.2 that may be the cause of the regression (I'll check shortly) is that np.add(df, 1, out=bar) now returns a DataFrame, whereas in 1.1.5 it returned a numpy array.

In 1.1.5, np.add(df, 1) without specifying out returned a DataFrame.

So it may be that bar should only be allowed to be a DataFrame? (Although that doesn't currently work either.)
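
A quick way to inspect the return types being discussed, as a sketch only. What gets printed depends on the installed pandas version, so no expected output is given here:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([1, 2, 3, 4]).reshape(2, 2))
bar = np.zeros_like(df.to_numpy())

without_out = np.add(df, 1)
with_out = np.add(df, 1, out=bar)

# Type of the return value without `out`.
print(type(without_out).__name__)
# Type of the return value with `out`, and whether it is the `out` array itself.
print(type(with_out).__name__, with_out is bar)
```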

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Apr 11, 2021
@mzeitlin11
Member

I think at least for the sake of consistency, passing kwargs for both the 1d and 2d cases is an improvement. I guess the return value in a case like this might be a more complicated issue (but just passing **kwargs doesn't change that behavior).

@simonjayhawkins
Member

In pandas 1.0.1 (and earlier, on my other machine), the behavior works as expected. (see Expected Output). I am using numpy==1.19 for both machines.

first bad commit: [c6eb725] Implement DataFrame.__array_ufunc__ (#36955)

@jreback jreback modified the milestones: 1.2.4, 1.2.5 Apr 12, 2021
@jorisvandenbossche jorisvandenbossche modified the milestones: 1.2.5, 1.2.4 Apr 12, 2021
@jorisvandenbossche
Member

jorisvandenbossche commented Apr 12, 2021

The change in 1.2 that may be the cause of the regression (I'll check shortly) is that np.add(df, 1, out=bar) now returns a DataFrame, whereas in 1.1.5 it returned a numpy array.

As mentioned on https://github.com/pandas-dev/pandas/pull/40878/files#r611582186, the reason it started to return a DataFrame is exactly that out was ignored, so it's a side effect of the actual bug/regression. (EDIT: I was testing in the wrong env. The two are not related (you can both return a DataFrame and update the out ndarray), so ignore my previous comment.)

8 participants