REF: Groupby.pad/backfill operate blockwise #43478

Merged
merged 3 commits into from
Sep 10, 2021

jbrockmendel
Member

We get a decent-but-not-whopping perf boost (roughly 16% on the benchmark below), with a much bigger boost expected in the follow-up, which de-duplicates an argsort call that we currently repeat inside the cython function.

More importantly, we're getting close to having everything operate blockwise, at which point we can get rid of _get_cythonized_result and send everything through the _cython_operation path.

import pandas as pd
import numpy as np

np.random.seed(23446365)
arr = np.random.randn(10**5, 10)
mask = arr < -1
arr[mask] = np.nan

df = pd.DataFrame(arr)

gb = df.groupby(df.index % 7)

%timeit res = gb.pad()
28.7 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- master
24 ms ± 1.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <- PR
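
For readers unfamiliar with the operation being benchmarked, here is a minimal pure-NumPy sketch of a grouped forward fill over a 2D block. It is not the pandas Cython implementation: the function name `grouped_ffill_2d` and its signature are hypothetical, and the loop is written for clarity rather than speed. It illustrates the two ideas mentioned above under those assumptions: the whole 2D block is handled in one call, and the argsort of the group labels is computed once and shared by every column.

```python
import numpy as np


def grouped_ffill_2d(values, labels):
    """Forward-fill NaNs within each group for every column of a 2D array.

    Hypothetical illustration only; not pandas's internal API.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)

    # A single stable argsort of the group labels, reused for all columns.
    # Stability keeps the original row order inside each group.
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]

    out = values.copy()
    n_rows, n_cols = values.shape
    for col in range(n_cols):
        last_valid = -1  # row of the most recent non-NaN value in this group
        for i in range(n_rows):
            row = order[i]
            if i > 0 and sorted_labels[i] != sorted_labels[i - 1]:
                last_valid = -1  # new group: never fill across the boundary
            if np.isnan(values[row, col]):
                if last_valid != -1:
                    out[row, col] = values[last_valid, col]
            else:
                last_valid = row
    return out


example = np.array(
    [
        [1.0, np.nan],
        [np.nan, 2.0],
        [np.nan, np.nan],
        [4.0, np.nan],
    ]
)
group_labels = np.array([0, 1, 0, 1])
filled = grouped_ffill_2d(example, group_labels)
```

In this example rows 0 and 2 form one group and rows 1 and 3 the other, so the NaN at row 2, column 0 is filled from row 0, while the NaN at row 1, column 0 stays NaN because its group has no earlier valid value.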

jreback added the Groupby, Performance (memory or execution speed performance), and Refactor (internal refactoring of code) labels on Sep 9, 2021
jreback added this to the 1.4 milestone on Sep 9, 2021
@jreback
Contributor

jreback commented Sep 9, 2021

lgtm

@jbrockmendel
Member Author

rebased + green

@jreback jreback merged commit 22aa73c into pandas-dev:master Sep 10, 2021
@jbrockmendel jbrockmendel deleted the perf-gb-fillna branch September 10, 2021 21:39