PERF: use groupby.transform fast path for DataFrame -> Series aggregations #45387

Merged: 3 commits, Jan 23, 2022

Member

@lukemanley lukemanley commented Jan 15, 2022

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

The groupby.transform fast path is currently taken only for DataFrame -> DataFrame operations. This PR also uses the fast path for DataFrame -> Series operations, i.e. when the function reduces each group to one value per column. The performance impact is most noticeable for wide DataFrames. From what I can see, almost all existing benchmarks use tall DataFrames with only a few columns, so I added a benchmark with a wide DataFrame to cover this case.

import numpy as np
from pandas import DataFrame

n = 1000
df = DataFrame(
    np.random.randn(n, n),
    index=np.random.choice(range(10), n),
)

%timeit df.groupby(level=0).transform(lambda x: np.max(x, axis=0))

562 ms ± 7.93 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   # <- main
76.2 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- PR
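
To illustrate what a DataFrame -> Series operation means here, the sketch below uses a small frame with made-up data (not the benchmark from this PR). The function passed to transform reduces each group's DataFrame to a Series with one value per column, and transform broadcasts that Series back over the rows of the group. The comparison against the built-in "max" transform is only a sanity check on the result; the snippet does not show which internal path pandas takes.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: 6 rows in 3 groups, 3 columns (made-up data).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, 3)),
    index=[0, 0, 1, 1, 2, 2],
    columns=["a", "b", "c"],
)

# The lambda maps each group's DataFrame to a Series (one max per column).
result = df.groupby(level=0).transform(lambda g: g.max())

# Reference result using the built-in "max" transform.
expected = df.groupby(level=0).transform("max")

# The output keeps the shape of the input, with every row of a group
# holding that group's column-wise maximum.
same_shape = result.shape == df.shape
same_values = np.allclose(result.to_numpy(), expected.to_numpy())
print(same_shape, same_values)
```

With a user-defined function like this, the wider the frame, the more it matters whether the function is called once per group or once per column per group, which is the case the wide-DataFrame timing above targets.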

@jreback jreback added the Groupby and Performance (Memory or execution speed performance) labels Jan 16, 2022
@jreback jreback added this to the 1.5 milestone Jan 16, 2022
Contributor

@jreback jreback left a comment

great, can you add a whatsnew for 1.5 in the performance section

@jreback jreback merged commit 90e512f into pandas-dev:main Jan 23, 2022
Contributor

jreback commented Jan 23, 2022

thanks @lukemanley

@lukemanley lukemanley deleted the groupby-transform-fast-path branch March 2, 2022 01:13
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Labels: Groupby, Performance (Memory or execution speed performance)
2 participants