BUG: groupby any/all raising on extension types #40621

Closed
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -753,6 +753,8 @@ Groupby/resample/rolling
- Bug in :class:`core.window.ewm.ExponentialMovingWindow` when calling ``__getitem__`` would not retain ``com``, ``span``, ``alpha`` or ``halflife`` attributes (:issue:`40164`)
- :class:`core.window.ewm.ExponentialMovingWindow` now raises a ``NotImplementedError`` when specifying ``times`` with ``adjust=False`` due to an incorrect calculation (:issue:`40098`)
- Bug in :meth:`Series.asfreq` and :meth:`DataFrame.asfreq` dropping rows when the index is not sorted (:issue:`39805`)
- Bug in :class:`SeriesGroupBy` and :class:`DataFrameGroupBy` raising ``ValueError`` when using methods ``any`` and ``all`` with ``ExtensionDtype`` columns holding ``NA`` even with ``skipna=True`` (:issue:`40585`)
-

Reshaping
^^^^^^^^^
7 changes: 5 additions & 2 deletions pandas/core/groupby/groupby.py
@@ -1414,14 +1414,17 @@ def _obj_1d_constructor(self) -> Type[Series]:
return self.obj._constructor

@final
def _bool_agg(self, val_test, skipna):
def _bool_agg(self, val_test, skipna: bool):
"""
Shared func to call any / all Cython GroupBy implementations.
"""

def objs_to_bool(vals: np.ndarray) -> Tuple[np.ndarray, Type]:
def objs_to_bool(vals: ArrayLike) -> Tuple[np.ndarray, Type]:
if is_object_dtype(vals):
vals = np.array([bool(x) for x in vals])
elif isinstance(vals, ExtensionArray):
vals = vals.to_numpy(dtype=bool, na_value=np.nan)
vals = vals.astype(bool)
else:
vals = vals.astype(bool)
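
The branching in objs_to_bool can be tried outside of pandas internals. The following is only a minimal sketch: to_bool_sketch is a hypothetical name, it uses only public pandas and NumPy calls, and it leaves out the second return value and the Cython machinery that the real helper feeds.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.api.types import is_object_dtype


def to_bool_sketch(vals):
    """Hypothetical standalone mirror of the three conversion branches."""
    if is_object_dtype(vals):
        # Object arrays: fall back to Python truthiness, element by element.
        return np.array([bool(x) for x in vals])
    if isinstance(vals, ExtensionArray):
        # Extension arrays: ask for a plain NumPy bool array and supply a
        # fill value for missing entries so the conversion does not raise.
        return vals.to_numpy(dtype=bool, na_value=np.nan)
    # Plain NumPy arrays: a direct cast is enough.
    return vals.astype(bool)


object_vals = to_bool_sketch(np.array(["", "x", None], dtype=object))
nullable_vals = to_bool_sketch(pd.array([0, 1], dtype="Int64"))
numpy_vals = to_bool_sketch(np.array([0, 2]))
with_na = to_bool_sketch(pd.array([pd.NA, 1], dtype="Int64"))

print(object_vals, nullable_vals, numpy_vals, with_na.dtype)
```

The sketch only shows which branch handles which input; how missing entries are then skipped is decided later by the mask passed to the grouped reduction, which is not reproduced here.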

29 changes: 29 additions & 0 deletions pandas/tests/groupby/test_function.py
@@ -76,6 +76,35 @@ def test_groupby_bool_aggs(agg_func, skipna, vals):
tm.assert_frame_equal(result, exp_df)


@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
@pytest.mark.parametrize("skipna", [True, False])
def test_bool_aggs_dup_column_labels(bool_agg_func, skipna):
# 21668
df = DataFrame([[True, True]], columns=["a", "a"])
Contributor

Can you add skipna as a parameter in these tests?

Member Author

I have added skipna parameterization to test_bool_aggs_dup_column_labels (this test was only moved here to keep the any/all tests together).

For the nullable dtype test added below, the problem is that the skipna=False case does not currently give the correct output (#37506).

grp_by = df.groupby([0])
result = getattr(grp_by, bool_agg_func)(skipna=skipna)

expected = df
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
@pytest.mark.parametrize("dtype", ["Int64", "Float64", "boolean"])
@pytest.mark.parametrize("group_by_frame", [True, False])
def test_bool_aggs_ea_skipna(bool_agg_func, dtype, group_by_frame):
# GH-40585
df = DataFrame({"grp": [1, 1], "val": pd.array([pd.NA, 1], dtype=dtype)})
if group_by_frame:
grouped = df.groupby("grp")
expected = DataFrame({"val": [True]}, index=Index([1], name="grp"))
else:
grouped = df["val"].groupby(df["grp"])
expected = Series([True], index=Index([1], name="grp"), name="val")

result = grouped.agg(bool_agg_func, skipna=True)
tm.assert_equal(result, expected)
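
The scenario covered by test_bool_aggs_ea_skipna can also be reproduced with the public API alone. This is a minimal sketch that assumes a pandas version containing this fix. The variable names are illustrative, and it calls any and all directly rather than going through agg as the test does.

```python
import pandas as pd

# One group whose nullable column holds a missing value and a truthy value.
df = pd.DataFrame(
    {"grp": [1, 1], "val": pd.array([pd.NA, 1], dtype="Int64")}
)

grouped = df.groupby("grp")["val"]

# With skipna=True the NA entry is ignored, so only the value 1 is reduced.
any_result = grouped.any(skipna=True)
all_result = grouped.all(skipna=True)

print(any_result)
print(all_result)
```

The result dtype may differ between pandas versions, so comparing values (for example via tolist()) is more portable than comparing dtypes.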


def test_max_min_non_numeric():
# #2700
aa = DataFrame({"nn": [11, 11, 22, 22], "ii": [1, 2, 3, 4], "ss": 4 * ["mama"]})
11 changes: 0 additions & 11 deletions pandas/tests/groupby/test_groupby.py
@@ -1978,17 +1978,6 @@ def test_groupby_duplicate_index():
tm.assert_series_equal(result, expected)


@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
def test_bool_aggs_dup_column_labels(bool_agg_func):
# 21668
df = DataFrame([[True, True]], columns=["a", "a"])
grp_by = df.groupby([0])
result = getattr(grp_by, bool_agg_func)()

expected = df
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize(
"idx", [Index(["a", "a"]), MultiIndex.from_tuples((("a", "a"), ("a", "a")))]
)