comment on numpy/pandas interaction when calling amin() on a pandas DataFrame #7080

Closed
paciorek opened this issue May 8, 2014 · 1 comment

Comments

paciorek commented May 8, 2014

I noticed the following in tracking down a problem a user of ours encountered. Not sure this is a bug per se and with the caveat that I'm not a Python expert, but I thought I would mention it.

This is occurring with pandas 0.12.0 but given the nature of the issue is probably present in more recent versions.

If a user calls numpy's amin() [or amax()] on a pandas DataFrame, then within amin() the following lines of code will often be executed:

amin = a.min
amin(axis = axis, out = out)

The problem is that if the first argument to amin() is a pandas DataFrame, then 'a.min' inside amin() is the pandas DataFrame min method, which does not take 'out' as an argument. This is for numpy 1.8.0.
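The failure can be reproduced without pandas. The following is a minimal sketch, assuming a hypothetical stand-in class `FrameLike` whose `min` method has no `out` parameter, and a hypothetical `amin_sketch` that mirrors the two lines quoted above. Neither is actual numpy or pandas code.

```python
class FrameLike:
    """Hypothetical stand-in for an object whose min() has no 'out' parameter."""

    def __init__(self, values):
        self.values = list(values)

    def min(self, axis=0, skipna=True):
        # axis and skipna are ignored in this toy version
        return min(self.values)


def amin_sketch(a, axis=None, out=None):
    """Hypothetical mirror of the delegation pattern quoted above."""
    amin = a.min
    # 'out' is passed by keyword, so a method without that parameter raises
    return amin(axis=axis, out=out)


try:
    amin_sketch(FrameLike([3, 1, 2]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The TypeError is raised by the stand-in's own `min` method, which is why the traceback points into the delegating function rather than at the user's call.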

The issue was partly disguised in earlier numpy versions (e.g. 1.6.1) because 'out' was not passed as a named argument; instead, the line above was
amin(axis,out)
so in many cases the call to amin() would return a result.
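To illustrate why the positional form hides the problem, here is a small sketch. It assumes a hypothetical stand-in class `FrameLike` whose `min` method takes `axis` and `skipna` as its first two parameters, and a hypothetical `amin_positional_sketch` that mirrors the older call style; the names and the signature are illustrative assumptions, not numpy or pandas code.

```python
class FrameLike:
    """Hypothetical stand-in; the (axis, skipna) signature is an assumption."""

    def __init__(self, values):
        self.values = list(values)
        self.seen_skipna = "unset"

    def min(self, axis=0, skipna=True):
        # Record what arrived in the second positional slot
        self.seen_skipna = skipna
        return min(self.values)


def amin_positional_sketch(a, axis=None, out=None):
    """Hypothetical mirror of the older, positional delegation."""
    amin = a.min
    # 'out' is passed positionally and binds to whatever parameter comes second
    return amin(axis, out)


frame = FrameLike([3, 1, 2])
result = amin_positional_sketch(frame)
print(result, frame.seen_skipna)
```

No exception is raised: the value meant for 'out' is silently bound to the method's second parameter, so the call returns a result even though the arguments do not mean what the caller intended.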

Perhaps users shouldn't be calling amin() or amax() on pandas dataframes but nothing prevents them from doing so, resulting in what is a fairly subtle bug in their code and (at least in later numpy versions) an error message that requires a bit of investigation of the code in amin() to understand.

-Chris


Chris Paciorek

Statistical Computing Consultant
Statistical Computing Facility and Econometrics Laboratory

Office: 495 Evans Hall
Mailing Address:
Department of Statistics
367 Evans Hall
University of California, Berkeley
Berkeley, CA 94720 USA

Email: [email protected]
Voice: 510-842-6670
Fax: 510-642-7892
Skype: cjpaciorek
WWW: www.stat.berkeley.edu/~paciorek
Permanent forward: [email protected]

Contributor

jreback commented May 8, 2014

in numpy >= 1.8.0, numpy started passing additional kwargs to the called method (e.g. for the primitives min, max, sum, etc.) if the called object supported it (e.g. a Series)

meaning

np.sum(s) ends up calling s.sum(**kwargs) (which is actually a good thing)

pandas 0.12 didn't handle this and would complain if numpy passed additional args.

numpy 1.8.0 came out in Oct 2013; pandas 0.12 in July 2013. at the time we didn't have tests for this, so it wasn't caught.

This is fixed in 0.13.0 (0.13.1 is the current release, and 0.14 is coming shortly).
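As an illustration of how a method can tolerate extra keywords from a delegating caller, here is a minimal sketch. It is an assumption for illustration, not the actual pandas change: `SeriesLike` and `sum_sketch` are hypothetical names, and the method simply accepts and ignores keywords it does not use.

```python
class SeriesLike:
    """Hypothetical stand-in for an object that tolerates extra keywords."""

    def __init__(self, values):
        self.values = list(values)

    def sum(self, axis=None, **kwargs):
        # Extra keywords such as out= or dtype= are accepted and ignored
        return sum(self.values)


def sum_sketch(a, axis=None, dtype=None, out=None):
    """Hypothetical delegating function that forwards keywords to a.sum()."""
    return a.sum(axis=axis, dtype=dtype, out=out)


total = sum_sketch(SeriesLike([1, 2, 3]))
print(total)
```

Ignoring unknown keywords keeps the call working, at the cost of silently dropping options such as out= that the caller may have expected to take effect.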

here was the original issue: #4435

jreback closed this as completed May 8, 2014