PERF: reduce overhead in groupby _cython_operation #40317
Conversation
-    if isna_compat(arr, fill_value):
-        arr.fill(fill_value)
+    if arr.dtype.kind not in ("u", "i", "b"):
+        arr.fill(np.nan)
maybe_fill is only used in _cython_operation, and thus always called with an ndarray and with np.nan as fill value, so I simplified the function a bit based on those assumptions.
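Put together, the simplified helper would look roughly like this (a sketch reconstructed from the diff above; the docstring is mine, and keeping the trailing return arr is an assumption based on the original signature):

```python
import numpy as np

def maybe_fill(arr: np.ndarray):
    """
    Fill the array with NaN in place, unless it has an unsigned integer,
    signed integer, or boolean dtype (those cannot hold NaN).
    """
    if arr.dtype.kind not in ("u", "i", "b"):
        arr.fill(np.nan)
    return arr
```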
            # we use iNaT for the missing value on ints
            # so pre-convert to guard this condition
            if (values == iNaT).any():
                values = ensure_float64(values)
            else:
                values = ensure_int_or_float(values)
        elif is_numeric and not is_complex_dtype(values):
            values = ensure_float64(ensure_float(values))
The ensure_float was needed before to ensure that a nullable float EA was converted to a float ndarray. But now EAs already take a different code path above.
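As a rough illustration of the ndarray-only logic that remains (a sketch, not the PR's code: the helper name and the use of plain astype instead of the pandas ensure_* helpers are my assumptions):

```python
import numpy as np

iNaT = np.iinfo(np.int64).min  # sentinel pandas uses for missing int64-backed values

def coerce_numeric_values(values: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for the conversion step in _cython_operation.
    # ExtensionArrays (e.g. nullable Float64) never reach this point: they
    # are handled on a separate code path earlier in the function.
    if values.dtype.kind == "i":
        # iNaT marks missing values on ints, so pre-convert to float64
        # if the sentinel is present
        if (values == iNaT).any():
            return values.astype(np.float64)
        return values
    if values.dtype.kind == "f":
        # plain float ndarray: only ensure it is float64; the extra
        # ensure_float step is no longer needed
        return values.astype(np.float64, copy=False)
    return values
```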
can you also report how the non-AM perf is affected?
For the benchmark, it's not really affected. The overhead I am optimizing here is only around 1% for the BlockManager case. It's only when doing those checks 1000x instead of 1x that this becomes a noticeable bottleneck (the benchmark has 1000 columns in a single block). I assume that for a small Series, the overhead might also be noticeable for non-AM. But trying this, I don't really see much difference, and that's because for a single column the dominant cost is the actual factorization (which for 1000 columns is also only done once). Using the code from above:
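As a hypothetical illustration of that single-column comparison (not the actual snippet or timings from this thread):

```python
import numpy as np
import pandas as pd

# Hypothetical single-column check: with one column, factorizing the
# group labels dominates, so the dtype-check overhead is hard to see.
N = 1000
ser = pd.Series(np.random.randn(N))
labels = np.random.randint(0, 100, size=N)

# e.g. in IPython: %timeit ser.groupby(labels).sum()
ser.groupby(labels).sum()
```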
(some failing tests related to (unsupported) operations on a categorical column that I still need to address)
This is all passing now.
pandas/core/dtypes/missing.py
@@ -556,12 +556,12 @@ def infer_fill_value(val):
     return np.nan


-def maybe_fill(arr, fill_value=np.nan):
+def maybe_fill(arr: np.ndarray):
-> np.ndarray? (or just remove the return statement, I guess)
        dtype = values.dtype
        if is_numeric:
            # never an invalid op for those dtypes, so return early as fastpath
            return

        if is_categorical_dtype(dtype) or is_sparse(dtype):
if we're micro-optimizing, these can be replaced with isinstance(dtype, FooDtype) checks
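A sketch of what that suggestion would look like (the helper name is hypothetical; CategoricalDtype and SparseDtype are the public pandas dtype classes):

```python
from pandas import CategoricalDtype, SparseDtype

def disallow_invalid_ops(dtype) -> None:
    # Direct isinstance checks avoid the function-call overhead of the
    # is_categorical_dtype / is_sparse helpers.
    if isinstance(dtype, (CategoricalDtype, SparseDtype)):
        raise NotImplementedError(f"{dtype} dtype not supported")
```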
I was planning to do a follow-up PR with several other dtype check optimizations, so will include it there
LGTM except for the one requested annotation, cc @jreback
lgtm, surprised this has any speedup...
once you can fix the annotation go ahead and merge
The BaseGrouper._cython_operation method has lots of checking of the dtype etc., which gives quite some overhead compared to the actual aggregation function (for small data). This is a first set of small changes to cut down that overhead. Using the GroupManyLabels benchmark we have … gives …
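The benchmark snippet and the timings are not quoted here; roughly, a GroupManyLabels-style setup looks like the following (a sketch only, with the sizes and the sum aggregation as assumptions):

```python
import numpy as np
import pandas as pd

# A GroupManyLabels-style setup: a wide, single-block DataFrame with many
# columns and relatively few rows, so the per-column overhead in
# _cython_operation (dtype checks etc.) is magnified.
N, ncols = 1000, 1000
df = pd.DataFrame(np.random.randn(N, ncols))
labels = np.random.randint(0, 100, size=N)

# e.g. in IPython: %timeit df.groupby(labels).sum()
df.groupby(labels).sum()
```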