Commit fa3ba91

Backport PR #42085: BUG: groupby any/all raising with pd.NA object data (#42164)
Co-authored-by: Matthew Zeitlin
1 parent 50c2dcf commit fa3ba91

3 files changed

+32
-1
lines changed

doc/source/whatsnew/v1.3.0.rst

+1
@@ -269,6 +269,7 @@ Other enhancements
 - :meth:`read_csv` and :meth:`read_json` expose the argument ``encoding_errors`` to control how encoding errors are handled (:issue:`39450`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` use Kleene logic with nullable data types (:issue:`37506`)
 - :meth:`.GroupBy.any` and :meth:`.GroupBy.all` return a ``BooleanDtype`` for columns with nullable data types (:issue:`33449`)
+- :meth:`.GroupBy.any` and :meth:`.GroupBy.all` raising with ``object`` data containing ``pd.NA`` even when ``skipna=True`` (:issue:`37501`)
 - :meth:`.GroupBy.rank` now supports object-dtype data (:issue:`38278`)
 - Constructing a :class:`DataFrame` or :class:`Series` with the ``data`` argument being a Python iterable that is *not* a NumPy ``ndarray`` consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when ``data`` is a NumPy ``ndarray`` (:issue:`40908`)
 - Add keyword ``sort`` to :func:`pivot_table` to allow non-sorting of the result (:issue:`39143`)

pandas/core/groupby/groupby.py

+5-1
@@ -1519,7 +1519,11 @@ def _bool_agg(self, val_test, skipna):

         def objs_to_bool(vals: ArrayLike) -> tuple[np.ndarray, type]:
             if is_object_dtype(vals):
-                vals = np.array([bool(x) for x in vals])
+                # GH#37501: don't raise on pd.NA when skipna=True
+                if skipna:
+                    vals = np.array([bool(x) if not isna(x) else True for x in vals])
+                else:
+                    vals = np.array([bool(x) for x in vals])
             elif isinstance(vals, BaseMaskedArray):
                 vals = vals._data.astype(bool, copy=False)
             else:
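
The object-dtype branch above can be illustrated without pandas. The following is a minimal sketch, not the library's code: `_NAType`, `NA`, and `is_missing` are hypothetical stand-ins for `pd.NA` and `isna`, and a plain list replaces the NumPy array. It shows why the old one-liner raised (truth-testing NA is ambiguous) and how the `skipna` branch sidesteps that with a placeholder `True`. The placeholder is presumably harmless because missing positions are excluded elsewhere in the aggregation; that step is not part of this diff.

```python
import math


class _NAType:
    """Hypothetical stand-in for pd.NA: truth-testing it raises TypeError."""

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def is_missing(x):
    """Simplified scalar stand-in for pandas' isna (assumption, not the real API)."""
    return x is None or x is NA or (isinstance(x, float) and math.isnan(x))


def objs_to_bool(vals, skipna):
    """Mirror the patched object-dtype branch on a plain list."""
    if skipna:
        # Missing entries get a placeholder True, so bool() is never called on them.
        return [True if is_missing(x) else bool(x) for x in vals]
    # Without skipna every element is truth-tested, so NA raises TypeError.
    return [bool(x) for x in vals]
```

With `skipna=True`, a list such as `[NA, 0, 1]` converts cleanly; with `skipna=False`, the same list raises `TypeError` on the first element, matching the pre-patch behaviour.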

pandas/tests/groupby/test_any_all.py

+26
@@ -152,3 +152,29 @@ def test_masked_bool_aggs_skipna(bool_agg_func, dtype, skipna, frame_or_series):

     result = obj.groupby([1, 1]).agg(bool_agg_func, skipna=skipna)
     tm.assert_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "bool_agg_func,data,expected_res",
+    [
+        ("any", [pd.NA, np.nan], False),
+        ("any", [pd.NA, 1, np.nan], True),
+        ("all", [pd.NA, pd.NaT], True),
+        ("all", [pd.NA, False, pd.NaT], False),
+    ],
+)
+def test_object_type_missing_vals(bool_agg_func, data, expected_res, frame_or_series):
+    # GH#37501
+    obj = frame_or_series(data, dtype=object)
+    result = obj.groupby([1] * len(data)).agg(bool_agg_func)
+    expected = frame_or_series([expected_res], index=[1], dtype="bool")
+    tm.assert_equal(result, expected)
+
+
+@pytest.mark.filterwarnings("ignore:Dropping invalid columns:FutureWarning")
+@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
+def test_object_NA_raises_with_skipna_false(bool_agg_func):
+    # GH#37501
+    ser = Series([pd.NA], dtype=object)
+    with pytest.raises(TypeError, match="boolean value of NA is ambiguous"):
+        ser.groupby([1]).agg(bool_agg_func, skipna=False)
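
The expected values in the parametrized test follow from treating missing entries as absent: `any` over no remaining values is `False`, and `all` over no remaining values is `True`. A pure-Python sketch of that reasoning is below. It is an assumption-laden model of the semantics, not how pandas computes the result: `_NAType`, `NA`, `NAT`, `is_missing`, and `bool_agg` are hypothetical names, with `NAT` standing in for `pd.NaT`.

```python
import math


class _NAType:
    """Hypothetical stand-in for pd.NA: truth-testing it raises TypeError."""

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


NA = _NAType()
NAT = object()  # hypothetical stand-in for pd.NaT


def is_missing(x):
    """Simplified scalar missing-value check (assumption, not pandas' isna)."""
    return x is NA or x is NAT or x is None or (
        isinstance(x, float) and math.isnan(x)
    )


def bool_agg(values, func, skipna=True):
    """Reduce one group with any/all, optionally dropping missing values first."""
    reducer = {"any": any, "all": all}[func]
    if skipna:
        return reducer(bool(x) for x in values if not is_missing(x))
    # Every element is truth-tested here, so NA raises TypeError.
    return reducer(bool(x) for x in values)


# The same four cases as the parametrized test, with the stand-in sentinels.
cases = [
    ("any", [NA, float("nan")], False),
    ("any", [NA, 1, float("nan")], True),
    ("all", [NA, NAT], True),
    ("all", [NA, False, NAT], False),
]
for func, data, expected in cases:
    assert bool_agg(data, func) is expected
```

Calling `bool_agg([NA], "any", skipna=False)` raises `TypeError` in this model, which corresponds to the second test's `pytest.raises` check.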
