
Commit f4136c0

PERF: groupby.any/all with object dtype (#50623)
1 parent 18865cf

2 files changed (+9 -14 lines)


doc/source/whatsnew/v2.0.0.rst (+1)

```diff
@@ -753,6 +753,7 @@ Removal of prior version deprecations/changes
 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 - Performance improvement in :meth:`.DataFrameGroupBy.median` and :meth:`.SeriesGroupBy.median` and :meth:`.GroupBy.cumprod` for nullable dtypes (:issue:`37493`)
+- Performance improvement in :meth:`.DataFrameGroupBy.all`, :meth:`.DataFrameGroupBy.any`, :meth:`.SeriesGroupBy.all`, and :meth:`.SeriesGroupBy.any` for object dtype (:issue:`50623`)
 - Performance improvement in :meth:`MultiIndex.argsort` and :meth:`MultiIndex.sort_values` (:issue:`48406`)
 - Performance improvement in :meth:`MultiIndex.size` (:issue:`48723`)
 - Performance improvement in :meth:`MultiIndex.union` without missing values and without duplicates (:issue:`48505`, :issue:`48752`)
```
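For context, here is a minimal usage sketch of the methods named in the whatsnew entry, applied to an object-dtype column. The frame, the column names `key` and `val`, and the values are invented for illustration. With the default `skipna=True`, `None` and `pd.NA` are ignored within each group, which is the code path this commit touches; the returned values should be the same before and after the change, since only the internal conversion strategy differs.

```python
import pandas as pd

# Illustrative data only: an object-dtype column mixing truthy/falsy values
# with two kinds of missing values (None and pd.NA).
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b"],
        "val": pd.Series([1, None, 0, pd.NA], dtype=object),
    }
)

grouped = df.groupby("key")["val"]

# skipna=True is the default, so None and pd.NA are skipped in each group:
# group "a" reduces over [1], group "b" reduces over [0].
any_result = grouped.any()
all_result = grouped.all()

print(any_result)
print(all_result)
```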

pandas/core/groupby/groupby.py (+8 -14)

```diff
@@ -1774,22 +1774,16 @@ def _bool_agg(self, val_test: Literal["any", "all"], skipna: bool):
         """
 
         def objs_to_bool(vals: ArrayLike) -> tuple[np.ndarray, type]:
-            if is_object_dtype(vals.dtype):
+            if is_object_dtype(vals.dtype) and skipna:
                 # GH#37501: don't raise on pd.NA when skipna=True
-                if skipna:
-                    func = np.vectorize(
-                        lambda x: bool(x) if not isna(x) else True, otypes=[bool]
-                    )
-                    vals = func(vals)
-                else:
-                    vals = vals.astype(bool, copy=False)
-
-                vals = cast(np.ndarray, vals)
+                mask = isna(vals)
+                if mask.any():
+                    # mask on original values computed separately
+                    vals = vals.copy()
+                    vals[mask] = True
             elif isinstance(vals, BaseMaskedArray):
-                vals = vals._data.astype(bool, copy=False)
-            else:
-                vals = vals.astype(bool, copy=False)
-
+                vals = vals._data
+            vals = vals.astype(bool, copy=False)
             return vals.view(np.int8), bool
 
         def result_to_bool(
```
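The core of the `objs_to_bool` change can be reproduced outside pandas. The sketch below is a standalone approximation, not the pandas code itself: `objs_to_bool_vectorized` and `objs_to_bool_masked` are hypothetical names, they take a plain NumPy object array instead of pandas' `ArrayLike`, and they call the public `pd.isna` in place of the internal `isna` import. It checks that the removed per-element approach and the added mask-and-cast approach yield the same boolean array.

```python
import numpy as np
import pandas as pd


def objs_to_bool_vectorized(vals: np.ndarray) -> np.ndarray:
    # Hypothetical helper mirroring the removed lines: np.vectorize calls the
    # Python lambda once per element, mapping missing values to True.
    func = np.vectorize(lambda x: bool(x) if not pd.isna(x) else True, otypes=[bool])
    return func(vals)


def objs_to_bool_masked(vals: np.ndarray) -> np.ndarray:
    # Hypothetical helper mirroring the added lines: one vectorized isna()
    # call builds the mask, then a single bulk cast converts to bool.
    mask = pd.isna(vals)
    if mask.any():
        # Copy only when needed, so the caller's array is left untouched.
        vals = vals.copy()
        # Placeholder for missing values; it only has to make the cast succeed
        # (bool(pd.NA) would raise TypeError).
        vals[mask] = True
    return vals.astype(bool, copy=False)


vals = np.array([1, 0, None, pd.NA, "x", "", np.nan], dtype=object)
print(objs_to_bool_vectorized(vals))
print(objs_to_bool_masked(vals))
```

NumPy's documentation notes that `np.vectorize` is provided for convenience rather than performance and is essentially a Python-level loop, which is the per-element cost the commit removes. Per the inline comment, the missing-value mask is computed separately from the original values, so the `True` placeholder only needs to keep `astype(bool)` from raising.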
