ENH: implement Kleene versions of GroupBy any/all kernels for nullable dtypes #37506

arw2019 · 2020-10-29T23:36:28Z

phofl · 2020-10-30T22:24:30Z

I wanted to look into this and tried to understand how Kleene logic is implemented for any/all on a Series. With that implementation,

import pandas as pd

s = pd.Series([False, pd.NA, False], dtype="boolean")
print(s.any(skipna=False))

works and returns pd.NA. But

df = pd.DataFrame({"a": [False, pd.NA, False]}, dtype="boolean")
print(df.any(skipna=False))

returns an empty Series. I could not find a reason why this was not implemented for DataFrames too, for the case where a column has boolean dtype. Should this be implemented for SeriesGroupBy and DataFrameGroupBy?
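
For reference, the three-valued (Kleene) semantics discussed here can be sketched in plain Python, with None standing in for pd.NA. The names kleene_any and kleene_all are hypothetical and not part of pandas; this is only an illustration of the logic, not the pandas implementation.

```python
from typing import Iterable, Optional


def kleene_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued OR reduction; None plays the role of pd.NA."""
    seen_na = False
    for v in values:
        if v is None:
            seen_na = True
        elif v:
            # A single True decides the result, whatever the NAs hide.
            return True
    # No True was seen: the result is unknown if any value was missing.
    return None if seen_na else False


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued AND reduction; None plays the role of pd.NA."""
    seen_na = False
    for v in values:
        if v is None:
            seen_na = True
        elif not v:
            # A single False decides the result, whatever the NAs hide.
            return False
    return None if seen_na else True


print(kleene_any([False, None, False]))  # None (unknown)
print(kleene_any([True, None]))          # True
print(kleene_all([False, None]))         # False
print(kleene_all([True, None]))          # None (unknown)
```

The first call mirrors the Series example above: with no True present and one missing value, the answer cannot be determined, so the result is missing.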

arw2019 · 2020-10-30T22:31:14Z

I haven't gone through the whole path, but I think we may want to do this in the Cython method.

phofl · 2020-10-30T22:34:07Z

For GroupBy that would probably be best. But we have to fix #37505 before doing this, because this raises for

s = pd.Series([False, pd.NA, False], dtype="boolean", index=[0, 0, 1])
grouped = s.groupby(level=0)
print(grouped.any(skipna=False))

too, before even reaching the Cython part. But I meant the regular DataFrame.any() without groupby. This does not work either.

Traceback:

Traceback (most recent call last):
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 2594, in _get_cythonized_result
    vals, inferences = pre_processing(vals)
  File "/home/developer/PycharmProjects/pandas/pandas/core/groupby/groupby.py", line 1368, in objs_to_bool
    vals = vals.astype(bool)
  File "/home/developer/PycharmProjects/pandas/pandas/core/arrays/boolean.py", line 388, in astype
    raise ValueError("cannot convert float NaN to bool")
ValueError: cannot convert float NaN to bool

arw2019 · 2020-10-30T22:43:30Z

Ah, but "boolean" is an extension array, right? I think rewriting the kernel in terms of a mask would fix this, and it is preferable performance-wise. In the meantime we can cast to NumPy before passing to Cython, as was done for std in #37433.
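
A rough sketch of what "rewriting in terms of a mask" could look like, in plain Python rather than Cython. The function name group_any_all_masked, its parameters, and the (values, mask) return layout are assumptions made for illustration; this is not the kernel pandas ships.

```python
from typing import List, Sequence, Tuple


def group_any_all_masked(
    values: Sequence[bool],
    mask: Sequence[bool],
    labels: Sequence[int],
    ngroups: int,
    kind: str = "any",
    skipna: bool = True,
) -> Tuple[List[bool], List[bool]]:
    """Grouped Kleene any/all over a (values, mask) pair.

    mask[i] is True where entry i is missing; labels[i] is the group id
    (negative ids are skipped). Returns (result, result_mask), where
    result_mask[g] is True when group g's result is missing.
    """
    if kind not in ("any", "all"):
        raise ValueError("kind must be 'any' or 'all'")
    # For "any" a True settles the group; for "all" a False does.
    deciding = kind == "any"
    result = [not deciding] * ngroups  # identity: False for any, True for all
    result_mask = [False] * ngroups
    decided = [False] * ngroups
    for val, is_na, lab in zip(values, mask, labels):
        if lab < 0 or decided[lab]:
            continue
        if is_na:
            if not skipna:
                # Unknown so far; a later deciding value can still override.
                result_mask[lab] = True
        elif val == deciding:
            result[lab] = deciding
            result_mask[lab] = False
            decided[lab] = True
    return result, result_mask


# Same data as the failing example: [False, NA, False] with index [0, 0, 1].
# The payload stored under a masked entry is arbitrary (False here).
res, res_mask = group_any_all_masked(
    values=[False, False, False],
    mask=[False, True, False],
    labels=[0, 0, 1],
    ngroups=2,
    kind="any",
    skipna=False,
)
print(res, res_mask)  # [False, False] [True, False]
```

Here group 0 comes out as missing and group 1 as False. Because the loop only reads the boolean payload and the mask, no astype(bool) conversion of missing values is needed, which is the step that raises in the traceback above.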

jorisvandenbossche added the Groupby label and removed the ExtensionArray and Needs Triage labels Oct 30, 2020

mzeitlin11 mentioned this issue Mar 24, 2021

BUG: groupby any/all raising on extension types #40621 (Closed)

mzeitlin11 self-assigned this Apr 6, 2021

mzeitlin11 mentioned this issue Apr 6, 2021

ENH/BUG: Use Kleene logic for groupby any/all #40819 (Merged)

jreback added this to the 1.3 milestone Apr 13, 2021

jreback closed this as completed in #40819 Apr 13, 2021
