
ENH: nullables use Kleene logic for any/all reductions #41970


Merged: 14 commits merged on Sep 21, 2021


mzeitlin11 (Member)

This makes the treatment of nullable dtypes consistent between .any/.all and their groupby counterparts, which already use Kleene logic. The diff looks big, but it just moves BooleanArray.any and BooleanArray.all to BaseMaskedArray.
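To make the Kleene semantics concrete, here is a minimal pure-Python sketch of how any/all can work on an array stored as values plus a missing-value mask, which is why the methods can live on a shared base class and serve integer data as well as boolean data. MaskedArraySketch and the NA sentinel are hypothetical stand-ins for BaseMaskedArray and pd.NA, not the pandas source: a single True (for any) or False (for all) decides the result, and only otherwise does a missing value turn the result into NA when skipna=False.

```python
# Minimal sketch of Kleene-logic any/all on a values-plus-mask array.
# Hypothetical: MaskedArraySketch and NA stand in for pandas' BaseMaskedArray
# and pd.NA; this is an illustration, not the pandas implementation.
from typing import Sequence


class _NAType:
    """Stand-in for pd.NA (the unknown value in three-valued logic)."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


class MaskedArraySketch:
    def __init__(self, data: Sequence, mask: Sequence[bool]) -> None:
        if len(data) != len(mask):
            raise ValueError("data and mask must have the same length")
        self._data = list(data)  # stored values (arbitrary where masked)
        self._mask = list(mask)  # True marks a missing value

    def _valid(self) -> list:
        # Values at non-missing positions only.
        return [v for v, m in zip(self._data, self._mask) if not m]

    def any(self, *, skipna: bool = True):
        # One truthy valid value decides the result, even if NA is present.
        if any(self._valid()):
            return True
        # Otherwise a missing value might have been True: NA under Kleene logic.
        if not skipna and any(self._mask):
            return NA
        return False

    def all(self, *, skipna: bool = True):
        # One falsy valid value decides the result, even if NA is present.
        if not all(self._valid()):
            return False
        # Otherwise a missing value might have been False: NA under Kleene logic.
        if not skipna and any(self._mask):
            return NA
        return True


# Boolean data: [True, <NA>, False]
b = MaskedArraySketch([True, False, False], [False, True, False])
print(b.any(skipna=False), b.all(skipna=False))  # True False

# Integer data, like Int64 [0, <NA>, 0]; the value stored under the mask is ignored
i = MaskedArraySketch([0, 1, 0], [False, True, False])
print(i.any(), i.any(skipna=False))  # False <NA>
```

Because the methods only look at truthiness of the valid values and at the mask, nothing in them is specific to booleans, which matches the motivation for hoisting them out of BooleanArray.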

mzeitlin11 added the labels NA - MaskedArrays (Related to pd.NA and nullable extension arrays) and Reduction Operations (sum, mean, min, max, etc.) on Jun 12, 2021

def any(self, *, skipna: bool = True, **kwargs):
"""
Return whether any element is truthy.
Contributor

add a versionchanged 1.4

Member Author

done

):
ser = Series(data, dtype="boolean")
ser = Series(data, dtype="boolean").astype(dtype)
Member

why is this astyped and not the dtype passed to the Series constructor?

Member Author

I have changed the parameterization to avoid needing to do this.

The reason was that something like Series([True, pd.NA, True], dtype="Int64") raises, whereas pure booleans don't: Series([True, True, True], dtype="Int64") works fine. Does this inconsistency make sense, or should I open an issue?

Member

Does this inconsistency make sense, or should I open an issue?

pd.Series([True, np.nan, True], dtype="Int64") works fine, but pd.Series([True, None, True], dtype="Int64") also raises TypeError: object cannot be converted to an IntegerDtype.

There does seem to be an inconsistency here.

github-actions bot (Contributor) commented Sep 4, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label on Sep 4, 2021
mzeitlin11 removed the Stale label on Sep 4, 2021
([False, pd.NA, False], [[pd.NA, False], [False, False]]),
([True, pd.NA, True], [[True, pd.NA], [True, True]]),
([True, pd.NA, False], [[True, False], [True, False]]),
([0, 0, 0], [[0, 0], [0, 0]]),
Contributor

why do these return ints rather than True/False?

Member Author

Thanks, that was a mistake: it actually returns bools. The test only happened to pass because the reductions return a scalar, and 0 == False (just as 1 == True).
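The pitfall described above comes from Python's bool being a subclass of int, so a plain == comparison cannot tell 0 from False. A small sketch of a stricter comparison that would have flagged the int expectations; strict_equal is a hypothetical helper for illustration, not something from the pandas test suite:

```python
# Why an expected value of 0 passed against a bool result: bool subclasses int.
def strict_equal(result, expected) -> bool:
    """Hypothetical helper: equal in value *and* in exact type."""
    return type(result) is type(expected) and result == expected


result = False  # what the reduction actually returns for [0, 0, 0]
expected = 0    # what the [0, 0, 0] test case originally listed

print(result == expected)              # True: 0 == False, so the test passed
print(isinstance(result, int))         # True: bool is a subclass of int
print(strict_equal(result, expected))  # False: int vs bool is caught
print(strict_equal(result, False))     # True
```

Comparing by identity against the True/False singletons would distinguish them as well, since an int is never the same object as a bool.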

jreback added this to the 1.4 milestone on Sep 9, 2021
jreback (Contributor) commented Sep 9, 2021

lgtm. @jorisvandenbossche if you can give a quick glance.

jreback merged commit 0524ef8 into pandas-dev:master on Sep 21, 2021
jreback (Contributor) commented Sep 21, 2021

thanks @mzeitlin11

mzeitlin11 deleted the enh/nullable_kleene_any_all branch on September 21, 2021, 16:51
jorisvandenbossche (Member)

Thanks @mzeitlin11 !

4 participants