-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
PERF: GroupBy.any/all operate blockwise instead of column-wise #42841
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
02e5318
PERF: Groupby any/all blockwise
jbrockmendel af4815b
flesh out asv
jbrockmendel 1d41741
future-proofing check
jbrockmendel ef81c0a
troubleshoot
jbrockmendel 70c4b77
troubleshoot
jbrockmendel 755a9c0
troubleshoot
jbrockmendel 2ccaf19
revert accidental com,mit
jbrockmendel 67cd22e
troubleshoot
jbrockmendel 6b73254
fix check
jbrockmendel 56940e7
Merge branch 'master' into perf-gb
jbrockmendel f5ee7f6
skip invalid asv parameter combinations
jbrockmendel 8cddf6d
Merge branch 'master' into perf-gb
jbrockmendel File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -388,10 +388,10 @@ def group_fillna_indexer(ndarray[int64_t] out, ndarray[intp_t] labels, | |
|
||
@cython.boundscheck(False) | ||
@cython.wraparound(False) | ||
def group_any_all(int8_t[::1] out, | ||
const int8_t[::1] values, | ||
def group_any_all(int8_t[:, ::1] out, | ||
const int8_t[:, :] values, | ||
const intp_t[::1] labels, | ||
const uint8_t[::1] mask, | ||
const uint8_t[:, :] mask, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See question below about if |
||
str val_test, | ||
bint skipna, | ||
bint nullable) -> None: | ||
|
@@ -426,9 +426,9 @@ def group_any_all(int8_t[::1] out, | |
-1 to signify a masked position in the case of a nullable input. | ||
""" | ||
cdef: | ||
Py_ssize_t i, N = len(labels) | ||
Py_ssize_t i, j, N = len(labels), K = out.shape[1] | ||
intp_t lab | ||
int8_t flag_val | ||
int8_t flag_val, val | ||
|
||
if val_test == 'all': | ||
# Because the 'all' value of an empty iterable in Python is True we can | ||
|
@@ -448,21 +448,27 @@ def group_any_all(int8_t[::1] out, | |
with nogil: | ||
for i in range(N): | ||
lab = labels[i] | ||
if lab < 0 or (skipna and mask[i]): | ||
if lab < 0: | ||
continue | ||
|
||
if nullable and mask[i]: | ||
# Set the position as masked if `out[lab] != flag_val`, which | ||
# would indicate True/False has not yet been seen for any/all, | ||
# so by Kleene logic the result is currently unknown | ||
if out[lab] != flag_val: | ||
out[lab] = -1 | ||
continue | ||
for j in range(K): | ||
if skipna and mask[i, j]: | ||
continue | ||
|
||
if nullable and mask[i, j]: | ||
# Set the position as masked if `out[lab] != flag_val`, which | ||
# would indicate True/False has not yet been seen for any/all, | ||
# so by Kleene logic the result is currently unknown | ||
if out[lab, j] != flag_val: | ||
out[lab, j] = -1 | ||
continue | ||
|
||
val = values[i, j] | ||
|
||
# If True and 'any' or False and 'all', the result is | ||
# already determined | ||
if values[i] == flag_val: | ||
out[lab] = flag_val | ||
# If True and 'any' or False and 'all', the result is | ||
# already determined | ||
if val == flag_val: | ||
out[lab, j] = flag_val | ||
|
||
|
||
# ---------------------------------------------------------------------- | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a contiguity guarantee for
labels
here, would that also be the case for other groupby algos? Just wondering because I think the others don't specifylabels
contiguity, so might be able to gain small speedup by doing that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think thats the case, yes. merits double-checking, especially since some builds are failing