
BUG: skipna parameter in series.any() returns wrong result #23109

Closed
kaybhutani opened this issue Oct 12, 2018 · 6 comments · Fixed by #24069

@kaybhutani

# import pandas
import pandas as pd

# import numpy
import numpy as np

data = pd.DataFrame({'A': [1, 2, 3, 4, 0, np.nan, 3],
                     'B': [3, 1, 4, 5, 0, np.nan, 5]})

data.any(axis=1, skipna=True)

Expected output:
0 True
1 True
2 True
3 True
4 False
5 True
6 True
dtype: bool

Returned output:

0 True
1 True
2 True
3 True
4 False
5 False
6 True
dtype: bool

As written in the documentation: "If an entire row/column is NA, the result will be NA."
But NA isn't returned in either case (with skipna=True or skipna=False).
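A quick way to confirm the behaviour described above is to run the same row-wise reduction with both settings of skipna and check that neither result contains NA. This is a minimal sketch; the values in the comments are what recent pandas versions return for this data, and much older releases may behave differently.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 0, np.nan, 3],
                     'B': [3, 1, 4, 5, 0, np.nan, 5]})

# Row 5 is entirely NaN; compare both settings of skipna.
with_skip = data.any(axis=1, skipna=True)
without_skip = data.any(axis=1, skipna=False)

# Both results are plain bool Series, so no NA can appear in them.
print(with_skip.dtype, without_skip.dtype)  # bool bool

# skipna=True: the all-NaN row reduces over nothing and gives False.
print(with_skip.tolist())     # [True, True, True, True, False, False, True]

# skipna=False: NaN is kept and is truthy, so the all-NaN row gives True.
print(without_skip.tolist())  # [True, True, True, True, False, True, True]
```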

@TomAugspurger
Contributor

As written in documentation, If an entire row/column is NA, the result will be NA

I think the docs at pandas/pandas/core/generic.py (lines 9728 to 9730 in 12a0dc4), which read

skipna : boolean, default True
    Exclude NA/null values. If an entire row/column is NA, the result
    will be NA.

are incorrect. skipna should give the same result as applying the operation to the values with the NAs removed (is that right @jorisvandenbossche?).
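One way to check that claim on the reporter's example is to drop the NAs from each row by hand, reduce what remains, and compare against the built-in skipna=True result. A sketch, assuming a reasonably recent pandas:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 4, 0, np.nan, 3],
                     'B': [3, 1, 4, 5, 0, np.nan, 5]})

# Reduce each row after removing its NA values manually.
# For the all-NaN row this calls .any() on an empty Series.
manual = data.apply(lambda row: row.dropna().any(), axis=1)

# Built-in reduction that skips NA values.
builtin = data.any(axis=1, skipna=True)

print((manual == builtin).all())  # True
```

If the two agree row by row, the returned output in the report is the expected "NAs removed" behaviour and only the docstring is off.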

@kaybhutani
Author

Isn't this a better approach?
If skipna is None (which would be the default), return NA for rows/columns that are entirely NaN.
If skipna is True or False, return True or False respectively for rows/columns that are entirely NA.

Because if the docs are incorrect, there is probably no way to get NA back for all-null rows/columns.
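A minimal sketch of the proposed semantics, for illustration only: any_with_na_policy is a hypothetical helper, not a pandas API, and it assumes pandas 1.0 or later for the nullable "boolean" dtype and pd.NA. Rows that are not entirely NA are reduced over their non-NA values, as skipna=True does today.

```python
import numpy as np
import pandas as pd


def any_with_na_policy(df, skipna=None):
    """Hypothetical row-wise any() following the proposal above (not pandas API).

    skipna=None  -> NA for rows that are entirely NA
    skipna=True  -> True for rows that are entirely NA
    skipna=False -> False for rows that are entirely NA
    """
    # Ordinary rows: reduce over the non-NA values only, then switch to the
    # nullable "boolean" dtype so the result can hold pd.NA.
    result = df.apply(lambda row: row.dropna().any(), axis=1).astype("boolean")
    # Rows where every value is NA get the value chosen by the policy.
    all_na = df.isna().all(axis=1)
    result.loc[all_na] = pd.NA if skipna is None else bool(skipna)
    return result


data = pd.DataFrame({'A': [1, 0, np.nan], 'B': [3, 0, np.nan]})
print(any_with_na_policy(data))               # last row is <NA>
print(any_with_na_policy(data, skipna=True))  # last row is True
```

The cost of this design is that the result is no longer a plain bool Series, which is one reason following the existing reduction semantics is simpler.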

@TomAugspurger
Contributor

TomAugspurger commented Oct 12, 2018 via email

@dsaxton
Member

dsaxton commented Oct 14, 2018

My opinion: I think the problem is the documentation; the result is actually correct. If you ask whether any of an empty set of statements is True, the answer is no. This is consistent with NumPy:

In [1]: import numpy as np

In [2]: np.any([])
Out[2]: False
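The same empty-set convention holds in plain Python, and it is what skipna=True produces for an all-NaN row once the NaNs are dropped. A small illustration:

```python
import numpy as np

# An empty "any" is False: there is no element that is True.
print(any([]), np.any([]))  # False False

# Its dual, an empty "all", is vacuously True.
print(all([]), np.all([]))  # True True

# An all-NaN row with its NaNs removed is an empty array.
row = np.array([np.nan, np.nan])
print(np.any(row[~np.isnan(row)]))  # False
```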

@jorisvandenbossche
Member

Skipna should be the same as the operation on the values with NAs removed (is that right @jorisvandenbossche?).

I think so as well. any/all can be seen as reductions like sum or prod, so we should probably follow their design.
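To see the analogy with sum and prod: with the default skipna=True, each of these reductions on an all-NA Series sees an empty set of values and returns the identity element of its operation. A sketch, assuming pandas 0.22 or later (where sum and prod default to min_count=0):

```python
import numpy as np
import pandas as pd

all_na = pd.Series([np.nan, np.nan])

# Every NaN is skipped, so each reduction runs over an empty set.
print(all_na.sum())   # 0.0   (identity of +)
print(all_na.prod())  # 1.0   (identity of *)
print(all_na.any())   # False (identity of "or")
print(all_na.all())   # True  (identity of "and")
```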

So I think @dsaxton is right that it is only the documentation that is incorrect.

@dsaxton
Member

dsaxton commented Oct 17, 2018

@jorisvandenbossche What would you say is the appropriate fix for this? If the documentation is a general statement about the skipna parameter, maybe it makes sense to just remove the claim that the result will be NA (since it's not true for any, but presumably would be true in other contexts)?

jreback modified the milestone from 0.24.0 to Contributions Welcome on Dec 2, 2018
jamesmyatt added a commit to jamesmyatt/pandas that referenced this issue Dec 3, 2018
… and data are all NA (pandas-dev#23109)

Include examples with NA values and describe treatment of NA with `skipna == False`
jamesmyatt added a commit to jamesmyatt/pandas that referenced this issue Dec 3, 2018
…#23109)

Also include examples with NA values and clarify treatment of NA with `skipna == False`
jreback modified the milestone from Contributions Welcome to 0.24.0 on Dec 4, 2018