Commit b0140bf

PERF: Improve performance when accessing GroupBy.groups (#53088)
* PERF: Improve performance when accessing GroupBy.groups
* Update v2.1.0.rst
* Fix
1 parent 13c9922 commit b0140bf

2 files changed, +9 -2 lines changed

doc/source/whatsnew/v2.1.0.rst (+1)

@@ -285,6 +285,7 @@ Performance improvements
 - Performance improvement accessing :attr:`arrays.IntegerArrays.dtype` & :attr:`arrays.FloatingArray.dtype` (:issue:`52998`)
 - Performance improvement in :class:`Series` reductions (:issue:`52341`)
 - Performance improvement in :func:`concat` when ``axis=1`` and objects have different indexes (:issue:`52541`)
+- Performance improvement in :meth:`.DataFrameGroupBy.groups` (:issue:`53088`)
 - Performance improvement in :meth:`DataFrame.loc` when selecting rows and columns (:issue:`53014`)
 - Performance improvement in :meth:`Series.corr` and :meth:`Series.cov` for extension dtypes (:issue:`52502`)
 - Performance improvement in :meth:`Series.to_numpy` when dtype is a numpy float dtype and ``na_value`` is ``np.nan`` (:issue:`52430`)

pandas/core/groupby/ops.py (+8 -2)

@@ -697,8 +697,14 @@ def groups(self) -> dict[Hashable, np.ndarray]:
         if len(self.groupings) == 1:
             return self.groupings[0].groups
         else:
-            to_groupby = zip(*(ping.grouping_vector for ping in self.groupings))
-            index = Index(to_groupby)
+            to_groupby = []
+            for ping in self.groupings:
+                gv = ping.grouping_vector
+                if not isinstance(gv, BaseGrouper):
+                    to_groupby.append(gv)
+                else:
+                    to_groupby.append(gv.groupings[0].grouping_vector)
+            index = MultiIndex.from_arrays(to_groupby)
             return self.axis.groupby(index)
 
     @final
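
The change above swaps a row-by-row zip of the grouping vectors for a column-wise MultiIndex.from_arrays call. The sketch below is a rough illustration using only public pandas API, not the internal BaseGrouper code path. The sample keys a and b and the frame df are made up for the example. It shows that both constructions yield the same index entries and that GroupBy.groups is keyed by tuples when grouping on several columns.

```python
import pandas as pd

# Hypothetical sample grouping keys (not taken from the commit).
a = ["x", "x", "y", "y"]
b = [1, 2, 1, 1]

# Old approach: build Python tuples row by row, then let Index infer a MultiIndex.
old_style = pd.Index(list(zip(a, b)))

# New approach: hand the per-key arrays to MultiIndex.from_arrays directly,
# which avoids materialising one tuple per row.
new_style = pd.MultiIndex.from_arrays([a, b])

# Both paths describe the same sequence of (a, b) entries.
same_entries = list(old_style) == list(new_style)

# Public entry point that ends up using the multi-key branch of `groups`.
df = pd.DataFrame({"a": a, "b": b, "v": [10, 20, 30, 40]})
groups = df.groupby(["a", "b"]).groups
group_keys = set(groups.keys())
y1_labels = list(groups[("y", 1)])
```

Only the way the intermediate index is built differs between the two approaches, so the resulting groups mapping is the same.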
