nanmin no longer works with series #8383

Closed
rstoneback opened this issue Sep 24, 2014 · 25 comments · Fixed by #19753
Labels
API Design Compat pandas objects compatability with Numpy or Python functions
Milestone

Comments

@rstoneback

Using NumPy's nanmin on a pandas Series used to work as expected. Now, however, a Series is returned with each element set to the minimum, rather than just the minimum itself. Not sure if this is a NumPy or pandas issue.

np.nanmin(pds.Series([1,2,3,4]))
Out[14]: 
0    1
1    1
2    1
3    1
dtype: int64

np.min(pds.Series([1,2,3,4]))
Out[15]: 1
@jreback
Contributor

jreback commented Sep 24, 2014

It's a numpy issue, I think. It doesn't respect an array-like that isn't an ndarray subclass.

Using s.min() is better/faster anyhow,
or np.nanmin(s.values)
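
A minimal sketch of the two workarounds suggested above, using the Series from the original report. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Workaround 1: use the Series method directly.
min_via_pandas = s.min()

# Workaround 2: hand NumPy the underlying ndarray, so there is no
# Series object for the result to be wrapped back into.
min_via_numpy = np.nanmin(s.values)

print(min_via_pandas, min_via_numpy)
```

Both calls reduce to a single scalar regardless of how np.nanmin treats the Series itself.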

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Usage Question labels Sep 24, 2014
@jreback
Contributor

jreback commented Sep 24, 2014

you should always post the versions of python/pandas/numpy you are using FYI

closing as a numpy issue

@jreback jreback closed this as completed Sep 24, 2014
@tacaswell
Contributor

cross posted as numpy/numpy#5114

I would argue that this is a pandas issue as it was changes on this side that broke things, not changes on the numpy side.

@jreback
Contributor

jreback commented Sep 24, 2014

@tacaswell disagree, if np.min acts one way and np.nanmin acts another?

This has to do with how numpy is or IS NOT following ndarray-like objects, e.g. calling (or not) __array_wrap__

@rstoneback
Author

I reported the same issue to numpy and the immediate response was that it is likely a pandas issue.


@jreback
Contributor

jreback commented Sep 24, 2014

@rstoneback I know, read my comments. It is a numpy abusing the API issue. We didn't change anything. They added the function and it has a different behavior/API than other functions.

In any event, you should simply use .min(), and pass skipna=False if you don't want to skip NAs (the default is skipna=True).
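
For reference, a small sketch of how the skipna argument of Series.min behaves in current pandas. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# With the default arguments, NaN entries are skipped.
default_min = s.min()

# With skipna=False, a NaN anywhere in the data propagates to the result.
strict_min = s.min(skipna=False)

print(default_min, strict_min)
```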

@rstoneback
Author

My last comment about reporting the issue to numpy didn't post for some time (I replied to github email). I've switched to using the pandas functions.

@rstoneback
Author

Using the series min function works great. Thanks for the quick responses.

Cheers.

@dalejung
Contributor

np.min works because it implicitly calls Series.min. If you remove the pd.Series.min you get the same behavior as nanmin.

s = pd.Series(range(10))
np.nanmin(s)

In the above example, __array_wrap__ actually gets a 0 ndim np scalar but we elongate it by maintaining index. I guess we're supposed to have a check for scalar types in __array_wrap__?

http://nbviewer.ipython.org/gist/dalejung/56b73c3cfd71b02ec414 also an ndarray subclass example that replicates the same issue we're having.
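
The "check for scalar types" idea can be sketched without touching pandas internals. The Labeled class and its wrap_result method below are hypothetical stand-ins, not pandas code and not the eventual fix: they only show a wrapping hook that returns 0-dimensional results as plain scalars instead of broadcasting them over the index.

```python
import numpy as np

class Labeled:
    """Hypothetical Series-like container (illustration only)."""

    def __init__(self, values, index):
        self.values = np.asarray(values)
        self.index = list(index)

    def wrap_result(self, result):
        # Plays the role that __array_wrap__ plays in the discussion:
        # decide whether a NumPy result should be re-wrapped.
        result = np.asarray(result)
        if result.ndim == 0:
            # A reduction produced a scalar: return it unwrapped.
            return result.item()
        # Element-wise results keep the original labels.
        return Labeled(result, self.index)

obj = Labeled([1, 2, 3, 4], index="abcd")
reduced = obj.wrap_result(np.min(obj.values))
elementwise = obj.wrap_result(obj.values * 2)
```

Here reduced is a plain Python int, while elementwise is still a Labeled object with the same index.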

@jreback
Contributor

jreback commented Sep 25, 2014

You can try changing __array_wrap__ to not wrap scalars.
It's probably easier just to define nanmin = min.

@dalejung
Contributor

Yeah, I'm gonna see about not wrapping scalars. Want to poke around and figure out the history on why a.min() exists.

@jreback
Contributor

jreback commented Sep 26, 2014

@dalejung if you think the best soln here is to simply add nanmin=min or change __array_wrap__, pls ping and we can reopen this issue (or if not, can defer to numpy to 'fix')

@jreback
Contributor

jreback commented Sep 26, 2014

actually, will reopen and leave for 0.15.1

@jreback jreback reopened this Sep 26, 2014
@jreback jreback added this to the 0.15.1 milestone Sep 26, 2014
@dalejung
Contributor

dalejung commented Oct 8, 2014

@jreback When you say nanmin=min what do you mean? I don't think np.nanmin == obj.nanmin if that's what you meant.

@jreback
Contributor

jreback commented Oct 8, 2014

numpy does this check on array-likes, something like:

if hasattr(arr, 'min'):
    return arr.min()

(and nanmin does this too)
so if you define nanmin = min then it will get called automatically when you do
np.nanmin(series) just like np.min(series) does now
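
The method-dispatch pattern described here can be sketched as follows. This is a hypothetical illustration of the idea, not NumPy's actual implementation; min_with_dispatch and HasMin are made-up names.

```python
import numpy as np

def min_with_dispatch(obj):
    """Prefer the object's own min() method when it is not an ndarray."""
    method = getattr(obj, "min", None)
    if callable(method) and not isinstance(obj, np.ndarray):
        # Array-like with its own reduction: defer to it.
        return method()
    # Otherwise convert to an ndarray and reduce that.
    return np.asarray(obj).min()

class HasMin:
    def min(self):
        return "called HasMin.min"

result_custom = min_with_dispatch(HasMin())
result_list = min_with_dispatch([3, 1, 2])
```

Under this pattern, any function that skips the method lookup (as nanmin was reported to do) ends up on the ndarray path and the wrapping hook instead.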

@dalejung
Contributor

dalejung commented Oct 8, 2014

@jreback check out http://nbviewer.ipython.org/gist/dalejung/1c378f7d2b149e2f3a40

I don't think that behavior is uniform, or at least I'm not sure where in the numpy code obj.nanmin gets called.

@jreback
Contributor

jreback commented Oct 8, 2014

Yeah, it might be that their implementation is messed up too. I don't understand why they do it for min (and other funcs) in the first place anyhow.

@jreback
Contributor

jreback commented Oct 8, 2014

@dalejung you might be able to intercept in (the return values) __array_wrap__ and NOT wrap it if it returns a scalar, though I think we'd need some testing on this.

@dalejung
Contributor

dalejung commented Oct 8, 2014

@jreback yeah I agree. If we could've opted out of the np.sum(obj) == obj.sum() that would've been great. Even with the __array_wrap__ fix you get this weird API mismatch because obj.sum() sums across the stat axis and np.sum(obj.values) gives you a scalar regardless of the ndim. You'd expect np.nansum(df) == np.sum(df).

either way, the __array_wrap__ stuff should fix at least a subset of issues. The overall api stuff would still be open. not sure if it'll be resolvable in the near future
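
A short sketch of the mismatch described above, assuming a small made-up DataFrame: the pandas method reduces along one axis, while reducing the raw ndarray with no axis collapses everything to one number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# DataFrame.sum reduces along one axis and returns per-column sums.
column_sums = df.sum()

# Reducing the underlying ndarray with no axis gives a single scalar.
total = np.sum(df.values)

print(column_sums.tolist(), total)
```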

@tacaswell
Contributor

This is possibly related to the keepdims kwarg (numpy/numpy#4619).

Don't have time right now to run down if it is actually related, just dropping bread crumbs.

@jreback
Contributor

jreback commented Oct 8, 2014

@tacaswell maybe you can explain why np.min (and friends), though NOT np.nanmin attempts to call the passed_object.min() (since it exists)?

I conceptually get it, looking more for what problem does this solve in numpy (e.g. why was it added).
And maybe why nanmin DOES NOT do this?

I guess this might boil down to sub-class vs ndarray-like ?

@topper-123
Contributor

This issue has been solved:

>>> np.nanmin(pds.Series([1,2,3,4]))
1

I'm on numpy 1.13.1 and pandas 0.22.

@jreback
Contributor

jreback commented Feb 10, 2018

Yup, though it's possibly only for certain versions of numpy. @topper-123 can you add some tests and see?

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Feb 10, 2018
@topper-123
Contributor

This is solved only for numpy >= 1.13. Where would tests for this be placed?

@jreback
Contributor

jreback commented Feb 10, 2018

test_nanops
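
A rough sketch of what such a regression check might look like, assuming numpy >= 1.13 as noted above. The function name is hypothetical and this is not the test that was eventually added to pandas.

```python
import numpy as np
import pandas as pd

def check_nanmin_returns_scalar():
    # Hypothetical check: np.nanmin on a Series should reduce to a
    # scalar that agrees with Series.min, not a Series of repeats.
    s = pd.Series([1, 2, 3, 4])
    result = np.nanmin(s)
    assert np.ndim(result) == 0
    assert result == s.min()
    return result

checked = check_nanmin_returns_scalar()
```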

6 participants