Skip to content

CLN/PERF: group quantile #43510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 12, 2021
Merged

Conversation

mzeitlin11
Copy link
Member

Inspired by idea from @jbrockmendel in #43489 (comment).

Didn't end up seeing a huge perf improvement though, for example results on a wide frame:

arr = np.random.randn(10 ** 5, 100)
df = pd.DataFrame(arr)
gb = df.groupby(df.index % 3)
gb.quantile(0.5)
# this pr: 1.18 s ± 9.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# master: 1.55 s ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Based on a quick profile, almost all time is spent in lexsort on master or with this pr - seems lexsorting the entire block at once has comparable performance to going column by column (I guess more improvement would show up on a really wide frame).

Regardless, though, I think moving this out of cython is an improvement (since gives no perf benefit for them to be in cython when just numpy functions, also saves ~80kb on compiled code).

@mzeitlin11 mzeitlin11 added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Clean Groupby labels Sep 11, 2021
@jreback jreback added this to the 1.4 milestone Sep 11, 2021
def blk_func(values: ArrayLike) -> ArrayLike:
mask = isna(values)
vals, inference = pre_processor(values)

ncols = 1
if vals.ndim == 2:
ncols = vals.shape[0]
shaped_labels = np.broadcast_to(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the broadcast necessary here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was labels is 1-dimensional, but values can be 2. I couldn't figure out a way to make lexsort naturally broadcast in that case. I think this broadcast is relatively cheap, though, since np.broadcast_to uses views, so it shouldn't increase memory usage (and this op doesn't show up in a profile)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broadcast itself is definitely cheap. In some cases I've later found that operations done on these broadcasted ndarrays are surprisingly slow. But if you've profiled it and its not an issue, then no complaints here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. The profile showed that all time is spent in lexsort, so definitely possible slow access on the broadcasted array is slowing it down. Will check how lexsort timing compares with this broadcast vs a contiguous array

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a copy after the broadcast doesn't seem to affect lexsort time in a profile

@jreback jreback merged commit e0ef148 into pandas-dev:master Sep 12, 2021
@jreback
Copy link
Contributor

jreback commented Sep 12, 2021

thanks @mzeitlin11

@mzeitlin11 mzeitlin11 deleted the lexsort_gb_quantile branch September 12, 2021 15:11
AlexeyGy pushed a commit to AlexeyGy/pandas that referenced this pull request Sep 13, 2021
AlexeyGy pushed a commit to AlexeyGy/pandas that referenced this pull request Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Clean Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants