ENH: nullables use Kleene logic for any/all reductions #41970
def any(self, *, skipna: bool = True, **kwargs):
    """
    Return whether any element is truthy.
add a versionchanged 1.4
done
):
ser = Series(data, dtype="boolean")
ser = Series(data, dtype="boolean").astype(dtype)
why is this astyped and not the dtype passed to the Series constructor?
Have changed the parameterization to avoid needing to do this
The reason was that something like `Series([True, pd.NA, True], dtype="Int64")` raises, though using just pure booleans doesn't raise, e.g. `Series([True, True, True], dtype="Int64")` is fine. Does this inconsistency make sense, or should I open an issue?
> Does this inconsistency make sense, or should I open an issue?
`pd.Series([True, np.nan, True], dtype="Int64")` works fine, but `pd.Series([True, None, True], dtype="Int64")` also raises `TypeError: object cannot be converted to an IntegerDtype`.
There does seem to be an inconsistency here.
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
([False, pd.NA, False], [[pd.NA, False], [False, False]]),
([True, pd.NA, True], [[True, pd.NA], [True, True]]),
([True, pd.NA, False], [[True, False], [True, False]]),
([0, 0, 0], [[0, 0], [0, 0]]),
why do these return ints rather than True/False?
Thanks, that was a mistake. It actually returns bools; the test just happened to work because the reductions return a scalar and 1 == True, etc.
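For anyone reading along, a minimal plain-Python sketch of why an integer expectation can slip through a scalar comparison. The names `expected`, `result`, and `same_value_and_type` are hypothetical and are not taken from the pandas test suite.

```python
# Hypothetical stand-ins: the value a test expected vs. what a reduction returns.
expected = 1
result = True

# Loose equality passes because bool is a subclass of int in Python.
assert result == expected
# The types still differ, which a plain == comparison never notices.
assert type(result) is not type(expected)


def same_value_and_type(a, b):
    """Stricter check: equal values AND identical types."""
    return type(a) is type(b) and a == b
```

A stricter helper along these lines would have flagged the int-vs-bool mismatch in the parametrization.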
lgtm. @jorisvandenbossche if you can give a quick glance.
thanks @mzeitlin11
Thanks @mzeitlin11!
This gives consistency for nullable dtype treatment across `.any`/`.all` and the groupby versions, which already used Kleene logic. The diff looks big, but it is just moving `BooleanArray.any` and `BooleanArray.all` to `BaseMaskedArray`.
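As a rough illustration of the Kleene semantics described here, below is a pure-Python sketch that uses `None` as a stand-in for `pd.NA`. The functions `kleene_any` and `kleene_all` are hypothetical helpers written for this example, not the pandas implementation; they are only meant to reproduce the expectations in the test parametrization quoted in this conversation.

```python
NA = None  # assumption: None stands in for pd.NA in this sketch


def kleene_any(values, skipna=True):
    """Kleene 'any': True wins; otherwise NA if a missing value is kept."""
    if any(v is not NA and bool(v) for v in values):
        return True
    if not skipna and any(v is NA for v in values):
        return NA  # result is unknown: the missing value could be True
    return False


def kleene_all(values, skipna=True):
    """Kleene 'all': False wins; otherwise NA if a missing value is kept."""
    if any(v is not NA and not bool(v) for v in values):
        return False
    if not skipna and any(v is NA for v in values):
        return NA  # result is unknown: the missing value could be False
    return True


# Mirrors the first parametrized case: [any, all] for skipna=False, then skipna=True.
data = [False, NA, False]
results = [
    [kleene_any(data, skipna=False), kleene_all(data, skipna=False)],
    [kleene_any(data, skipna=True), kleene_all(data, skipna=True)],
]
```

With `skipna=True` the missing values are simply ignored, so the result is always a plain boolean; `NA` can only come back when `skipna=False` and the known values do not already decide the answer.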