-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: pd.concat EA-backed indexes and sort=True #49178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
num_pos = ~str_pos & ~null_pos | ||
str_argsort = np.argsort(values[str_pos]) | ||
num_argsort = np.argsort(values[num_pos]) | ||
str_locs = str_pos.nonzero()[0].take(str_argsort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is nonzero doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is converting the boolean mask to positional indices within the larger array. we then sort those positional indices via the argsort of the string subset. This is all to be able to call take
on the original input so we can avoid converting everything to ndarray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so False=0 and True is nonzero is the catch here? If yes, could you add a short comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. I added a comment explaining the operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @mroeschke
Just a merge conflict otherwise LGTM |
doc/source/whatsnew/v2.0.0.rst
file if fixing a bug or adding a new feature.algos.safe_sort
is currently converting EA-backed indexes to ndarrays which can cause a perf hit. It can be significant if the index contains pd.NA asnp.argsort
will raise which gets caught in the try-catch. This PR avoids the numpy conversion for EA-backed indexes.Note: One test which relied on the numpy conversion was updated.
I updated an existing ASV that was just added yesterday to cover this case as well: