Commit 007bebc

Merge pull request #11147 from behzadnouri/grby-transf
PERF: improves performance in SeriesGroupBy.transform
2 parents 8c9192c + 5de00b0 commit 007bebc

2 files changed, +13 -7 lines changed

doc/source/whatsnew/v0.17.0.txt

+1 -1
@@ -1021,7 +1021,7 @@ Performance Improvements
 - Development support for benchmarking with the `Air Speed Velocity library <https://github.com/spacetelescope/asv/>`_ (:issue:`8316`)
 - Added vbench benchmarks for alternative ExcelWriter engines and reading Excel files (:issue:`7171`)
 - Performance improvements in ``Categorical.value_counts`` (:issue:`10804`)
-- Performance improvements in ``SeriesGroupBy.nunique`` and ``SeriesGroupBy.value_counts`` (:issue:`10820`, :issue:`11077`)
+- Performance improvements in ``SeriesGroupBy.nunique`` and ``SeriesGroupBy.value_counts`` and ``SeriesGroupby.transform`` (:issue:`10820`, :issue:`11077`)
 - Performance improvements in ``DataFrame.drop_duplicates`` with integer dtypes (:issue:`10917`)
 - 4x improvement in ``timedelta`` string parsing (:issue:`6755`, :issue:`10426`)
 - 8x improvement in ``timedelta64`` and ``datetime64`` ops (:issue:`6755`)

pandas/core/groupby.py

+12 -6
@@ -2512,13 +2512,19 @@ def _transform_fast(self, func):
         if isinstance(func, compat.string_types):
             func = getattr(self,func)

-        values = func().values
-        counts = self.size().fillna(0).values
-        values = np.repeat(values, com._ensure_platform_int(counts))
-        if any(counts == 0):
-            values = self._try_cast(values, self._selected_obj)
+        ids, _, ngroup = self.grouper.group_info
+        mask = ids != -1
+
+        out = func().values[ids]
+        if not mask.all():
+            out = np.where(mask, out, np.nan)
+
+        obs = np.zeros(ngroup, dtype='bool')
+        obs[ids[mask]] = True
+        if not obs.all():
+            out = self._try_cast(out, self._selected_obj)

-        return self._set_result_index_ordered(Series(values))
+        return Series(out, index=self.obj.index)

     def filter(self, func, dropna=True, *args, **kwargs):
         """
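
The core of the change above can be illustrated outside pandas. The following is a minimal NumPy sketch, not the pandas implementation: the function names `broadcast_by_ids` and `broadcast_by_repeat` are hypothetical, and the inputs are assumed to be a per-group result array plus an integer array of group ids (with -1 marking rows that belong to no group, as in the diff). It contrasts indexing the per-group result by group id with the earlier repeat-then-reorder style.

```python
import numpy as np


def broadcast_by_ids(group_values, ids):
    """Broadcast one value per group back to every row via its group id.

    ids[i] is the group number of row i, or -1 if row i is in no group.
    (Hypothetical helper; mirrors the `func().values[ids]` step in the diff.)
    """
    group_values = np.asarray(group_values)
    ids = np.asarray(ids)
    mask = ids != -1
    # Fancy indexing returns rows in their original order.
    # An id of -1 silently picks the last group, so it is masked out below.
    out = group_values[ids]
    if not mask.all():
        out = np.where(mask, out, np.nan)
    return out


def broadcast_by_repeat(group_values, ids):
    """Repeat-based variant: repeat each value by group size, then reorder.

    Assumes every id is >= 0 (np.bincount rejects negative values).
    """
    ids = np.asarray(ids)
    counts = np.bincount(ids, minlength=len(group_values))
    repeated = np.repeat(group_values, counts)  # rows in group-sorted order
    order = np.argsort(ids, kind="stable")      # original positions, grouped
    out = np.empty(len(ids), dtype=repeated.dtype)
    out[order] = repeated                       # scatter back to row order
    return out


group_means = [10.0, 20.0, 30.0]
ids = [0, 1, 0, 2, 1]
print(broadcast_by_ids(group_means, ids))
print(broadcast_by_repeat(group_means, ids))
```

Both functions give the same result when every row has a group. The id-indexed version needs no sort or scatter step, which is presumably where the speedup named in the commit title comes from.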
