CLN/PERF: group quantile #43510
Conversation
def blk_func(values: ArrayLike) -> ArrayLike:
    mask = isna(values)
    vals, inference = pre_processor(values)

    ncols = 1
    if vals.ndim == 2:
        ncols = vals.shape[0]
    shaped_labels = np.broadcast_to(
why is the broadcast necessary here?
The problem was that labels is 1-dimensional, but values can be 2-dimensional. I couldn't figure out a way to make lexsort broadcast naturally in that case. I think this broadcast is relatively cheap, though: np.broadcast_to returns views, so it shouldn't increase memory usage (and this op doesn't show up in a profile).
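As a minimal sketch of the memory claim (the shapes here are invented for illustration, not taken from the PR), np.broadcast_to returns a read-only, zero-strided view, so expanding the 1-D labels to match a 2-D values block allocates no new data:

```python
import numpy as np

# Hypothetical 1-D group labels and a 2-D values block; the shapes
# here are made up purely for illustration.
labels = np.array([0, 0, 1, 1, 2], dtype=np.intp)
ncols = 3

# np.broadcast_to returns a read-only view, not a copy: the new
# leading dimension gets stride 0, so every row aliases the same
# underlying 5-element buffer.
shaped_labels = np.broadcast_to(labels, (ncols, len(labels)))

assert shaped_labels.shape == (3, 5)
assert shaped_labels.strides[0] == 0   # zero-strided: no data copied
assert shaped_labels.base is not None  # still a view on `labels`
```

The zero stride is also why downstream access patterns can behave differently from a contiguous array, which is the concern raised below.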
The broadcast itself is definitely cheap. In some cases, though, I've later found that operations done on these broadcast ndarrays are surprisingly slow. But if you've profiled it and it's not an issue, then no complaints here.
Oh, interesting. The profile showed that all the time is spent in lexsort, so it's definitely possible that slow access on the broadcast array is slowing it down. I'll check how lexsort timing compares with this broadcast vs. a contiguous array.
Adding a copy after the broadcast doesn't seem to affect lexsort time in a profile.
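A rough way to set up that comparison (shapes and sizes are invented, not from the PR's benchmarks): the copy changes the memory layout but not the sort result, so the two lexsort calls can be timed against each other with any profiler or %timeit:

```python
import numpy as np

rng = np.random.default_rng(0)
vals = rng.standard_normal((4, 10_000))       # invented block shape
labels = rng.integers(0, 50, size=10_000)

shaped = np.broadcast_to(labels, vals.shape)  # zero-copy view
copied = np.ascontiguousarray(shaped)         # materialized copy

# np.lexsort uses the *last* key as the primary key, so this sorts
# each row by group label first, then by value within the label.
order_view = np.lexsort((vals, shaped))
order_copy = np.lexsort((vals, copied))

# Identical ordering either way; only the memory access pattern
# differs, which is exactly what a timing comparison would measure.
assert np.array_equal(order_view, order_copy)
```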
thanks @mzeitlin11
Inspired by an idea from @jbrockmendel in #43489 (comment).
Didn't end up seeing a huge perf improvement, though; for example, results on a wide frame:
Based on a quick profile, almost all time is spent in lexsort both on master and with this PR. Lexsorting the entire block at once seems to have performance comparable to going column by column (I'd guess more improvement would show up on a really wide frame). Regardless, I think moving this out of Cython is an improvement: there's no perf benefit to keeping these in Cython when they're just numpy functions, and it saves ~80 KB of compiled code.