PERF: Improve RollingGroupby.count #36872

Merged: 2 commits, Oct 5, 2020

mroeschke
Member

In [1]: import pandas as pd
   ...:
   ...: # Generate sample df
   ...: df = pd.DataFrame({'column1': range(600), 'group': 5 * ['l' + str(i) for i in range(120)]})
   ...:
   ...: # Sort by group so the grouped results can be assigned back to df positionally
   ...: df = df.sort_values('group', kind='mergesort').reset_index(drop=True)

In [2]: %timeit df['mean']=df.groupby('group').rolling(3,min_periods=1)['column1'].mean().values
5.59 ms ± 310 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit df['sum']=df.groupby('group').rolling(3,min_periods=1)['column1'].sum().values
   ...:
5.34 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [4]: %timeit df['count']=df.groupby('group').rolling(3,min_periods=1)['column1'].count().values
   ...:
4.97 ms ± 51.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
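
For anyone who wants to rerun the comparison outside IPython, the session above can be sketched as a standalone script. This is a minimal sketch, not part of the PR: the helper names `rolling_agg` and `best_ms` are hypothetical, it uses the standard-library `timeit` module in place of `%timeit`, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same sample frame as in the session above: 120 groups of 5 rows each.
df = pd.DataFrame(
    {"column1": range(600), "group": 5 * ["l" + str(i) for i in range(120)]}
)
# Stable sort by group so the grouped results line up with df row order.
df = df.sort_values("group", kind="mergesort").reset_index(drop=True)


def rolling_agg(name):
    """Run one grouped rolling aggregation: 'mean', 'sum' or 'count'.

    Hypothetical helper, introduced only for this sketch.
    """
    rolling = df.groupby("group").rolling(3, min_periods=1)["column1"]
    return getattr(rolling, name)().values


def best_ms(name, number=20, repeat=3):
    """Best per-call time in milliseconds over `repeat` runs of `number` calls."""
    times = timeit.repeat(lambda: rolling_agg(name), number=number, repeat=repeat)
    return min(times) / number * 1e3


timings = {name: best_ms(name) for name in ("mean", "sum", "count")}
counts = rolling_agg("count")

for name, ms in timings.items():
    print(f"{name:>5}: {ms:.2f} ms per call")
```

With the fix in place, `count` should land in the same range as `mean` and `sum`, as in the timings above; on pandas 1.1.0 the linked issue reports `count` as the slower of the three.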

@mroeschke added the Performance (memory or execution speed performance) and Window (rolling, ewma, expanding) labels on Oct 5, 2020
@mroeschke added this to the 1.2 milestone on Oct 5, 2020
@jreback merged commit f3d193f into pandas-dev:master on Oct 5, 2020
@mroeschke deleted the perf/groupby_rolling_count branch on October 5, 2020, 16:52
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Oct 13, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Labels
Performance (memory or execution speed performance), Window (rolling, ewma, expanding)
Development

Successfully merging this pull request may close these issues.

BUG: Groupby rolling count slower than rolling mean & sum (v1.1.0)
2 participants