Commit c65fce1
Backport PR #53195 on branch 2.0.x (PERF: Performance regression in Groupby.apply with group_keys=True) (#53202)
Backport PR #53195: PERF: Performance regression in Groupby.apply with group_keys=True

Co-authored-by: Patrick Hoefler <[email protected]>
1 parent e33e8ae commit c65fce1

2 files changed, +15 -10 lines changed
doc/source/whatsnew/v2.0.2.rst

+1
@@ -13,6 +13,7 @@ including other versions of pandas.
 
 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed performance regression in :meth:`GroupBy.apply` (:issue:`53195`)
 - Fixed regression in :func:`read_sql` dropping columns with duplicated column names (:issue:`53117`)
 - Fixed regression in :meth:`DataFrame.loc` losing :class:`MultiIndex` name when enlarging object (:issue:`53053`)
 - Fixed regression in :meth:`DataFrame.to_string` printing a backslash at the end of the first row of data, instead of headers, when the DataFrame doesn't fit the line width (:issue:`53054`)

pandas/core/reshape/concat.py

+14 -10
@@ -446,7 +446,7 @@ def __init__(
                 keys = type(keys).from_tuples(clean_keys, names=keys.names)
             else:
                 name = getattr(keys, "name", None)
-                keys = Index(clean_keys, name=name)
+                keys = Index(clean_keys, name=name, dtype=getattr(keys, "dtype", None))
 
         if len(objs) == 0:
             raise ValueError("All objects passed were None")
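
The hunk above forwards the dtype of the original `keys` when the key index is rebuilt. As a rough illustration of what that changes (not taken from the pandas test suite), the sketch below rebuilds an object-dtype `Index` from a plain list with and without the `dtype` argument. The variable names `inferred` and `preserved` are made up for this example, and `clean_keys` here is only a stand-in for the filtered key list built in `__init__`.

```python
import pandas as pd

# An Index whose dtype differs from what pandas would infer from its values.
keys = pd.Index([1, 2, 3], dtype=object, name="k")

# Stand-in for the filtered key list built in __init__ (assumption: nothing was dropped).
clean_keys = list(keys)

# Old behaviour: the dtype is re-inferred from the values.
inferred = pd.Index(clean_keys, name=keys.name)

# New behaviour: the dtype of the original keys is passed through.
preserved = pd.Index(
    clean_keys, name=keys.name, dtype=getattr(keys, "dtype", None)
)

print(inferred.dtype, preserved.dtype)

# A plain list has no dtype attribute, so getattr falls back to None
# and Index infers the dtype as before.
list_keys = ["a", "b"]
fallback = pd.Index(list_keys, dtype=getattr(list_keys, "dtype", None))
```

The `getattr(..., None)` fallback keeps the call valid when `keys` is a list or another sequence without a `dtype` attribute.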
@@ -743,15 +743,19 @@ def _make_concat_multiindex(indexes, keys, levels=None, names=None) -> MultiInde
 
         for hlevel, level in zip(zipped, levels):
             to_concat = []
-            for key, index in zip(hlevel, indexes):
-                # Find matching codes, include matching nan values as equal.
-                mask = (isna(level) & isna(key)) | (level == key)
-                if not mask.any():
-                    raise ValueError(f"Key {key} not in level {level}")
-                i = np.nonzero(mask)[0][0]
-
-                to_concat.append(np.repeat(i, len(index)))
-            codes_list.append(np.concatenate(to_concat))
+            if isinstance(hlevel, Index) and hlevel.equals(level):
+                lens = [len(idx) for idx in indexes]
+                codes_list.append(np.repeat(np.arange(len(hlevel)), lens))
+            else:
+                for key, index in zip(hlevel, indexes):
+                    # Find matching codes, include matching nan values as equal.
+                    mask = (isna(level) & isna(key)) | (level == key)
+                    if not mask.any():
+                        raise ValueError(f"Key {key} not in level {level}")
+                    i = np.nonzero(mask)[0][0]
+
+                    to_concat.append(np.repeat(i, len(index)))
+                codes_list.append(np.concatenate(to_concat))
 
         concat_index = _concat_indexes(indexes)
 
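
To see why the new branch in `_make_concat_multiindex` can skip the per-key lookup, the following sketch compares the two ways of computing the codes for a single level. It is a standalone approximation of the logic in the diff, not the pandas internals themselves: the helper names `codes_per_key` and `codes_fast` are invented for this example, and it assumes the keys are identical to a level with unique values.

```python
import numpy as np
import pandas as pd


def codes_per_key(hlevel, level, indexes):
    """General path: find each key's position in `level` with a full scan."""
    to_concat = []
    for key, index in zip(hlevel, indexes):
        # Treat matching NaN values as equal, as in the original loop.
        mask = (pd.isna(level) & pd.isna(key)) | (level == key)
        if not mask.any():
            raise ValueError(f"Key {key} not in level {level}")
        i = np.nonzero(mask)[0][0]
        to_concat.append(np.repeat(i, len(index)))
    return np.concatenate(to_concat)


def codes_fast(hlevel, indexes):
    """Fast path: when hlevel equals level, key j sits at position j."""
    lens = [len(idx) for idx in indexes]
    return np.repeat(np.arange(len(hlevel)), lens)


level = pd.Index(["a", "b", "c"])
hlevel = level  # the keys are exactly the level values
indexes = [pd.RangeIndex(2), pd.RangeIndex(3), pd.RangeIndex(1)]

slow = codes_per_key(hlevel, level, indexes)
fast = codes_fast(hlevel, indexes)
print(slow, fast)
```

The per-key path scans the whole level once for every key, so its cost grows with the number of keys times the level length, while the fast path is a single `np.repeat`. That would matter most when there are many groups, which is presumably the case the `GroupBy.apply` regression hit.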