ENH: Add separate numba kernels for groupby aggregations #53731
Conversation
pandas/core/groupby/groupby.py
Outdated
@@ -1484,6 +1484,7 @@ def _numba_agg_general(
         func: Callable,
         dtype_mapping: dict[np.dtype, Any],
         engine_kwargs: dict[str, bool] | None,
+        is_grouped_kernel: bool = False,
Technically this can be removed now, since all the existing numba kernels have been ported to have a groupby variant.
If you'd rather have me split this PR up by kernel, it would be necessary (and would be removed after all the PRs land).
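To illustrate what a "grouped" kernel buys here: it consumes group labels directly, so unlike the shared window kernels there is no need to sort/`take` the values into window order first. This is a hypothetical pure-Python sketch of the idea, not the pandas implementation (the real kernels are numba-jitted and handle dtypes, NaNs, and min_periods):

```python
import numpy as np

def grouped_sum(values: np.ndarray, labels: np.ndarray, ngroups: int) -> np.ndarray:
    """Sum `values` per group, reading group labels directly (no pre-sort)."""
    out = np.zeros(ngroups, dtype=values.dtype)
    for i in range(len(values)):
        # each value is routed straight to its group's accumulator
        out[labels[i]] += values[i]
    return out

vals = np.array([1.0, 2.0, 3.0, 4.0])
labs = np.array([0, 1, 0, 1])
# groups 0 and 1 are summed in a single pass over the unsorted input
totals = grouped_sum(vals, labs, 2)
```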
Force-pushed from 6cf8316 to f29ea3e
While there is a nice speedup here, I am a little concerned about proliferating implementations of reductions, in the spirit of #53261
The problem is that the numba kernels are basically useless without this. I did some benchmarking a while back, and numba was ~5x slower than Cython before this change. With these kernels, at least, I think we can share the Kahan summation part for
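For context on the part the comment proposes sharing: Kahan (compensated) summation tracks the low-order bits lost when adding a small number to a large running total. A minimal standalone sketch (illustrative only, not the pandas kernel code):

```python
import numpy as np

def kahan_sum(values: np.ndarray) -> float:
    """Compensated summation: more accurate than naive float accumulation."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp          # re-inject the previously lost bits
        t = total + y
        comp = (t - total) - y  # what got lost in this addition
        total = t
    return total
```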
Gentle ping
* ENH: non float64 result support in numba groupby
* refactor & simplify
* fix CI
* maybe green?
* skip unsupported ops in other bench as well
* updates from code review
* remove commented code
* update whatsnew
* debug benchmarks
* Skip min/max benchmarks
pyright_reportGeneralTypeIssues.json
Outdated
@@ -18,6 +18,9 @@
         "pandas/_testing/__init__.py",
         "pandas/_testing/_hypothesis.py",
         "pandas/_testing/_io.py",
+        # Numba does not like typing.cast
is there any prospect of this being fixed in numba? i.e. a condition under which we can remove this?
can you add something like "# TODO(numba#5474): can be removed once numba#5474 is addressed"
Looks like we lost the ability to add comments in the pyright json config file.
Is this one of the use cases where there is a prospect of parallelizing the Cython version too?
In the near future, no, since shipping OpenMP in wheels is a PITA. numba seems to have at least partially figured out how to solve this problem: I think you can use the tbb backend if installing with pip, and on conda-forge you can choose between tbb/openmp.
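As a concrete sketch of what that comment describes (not from the PR itself): in a pip environment, numba's threading layer can be switched to TBB via an optional dependency and an environment variable, sidestepping the OpenMP-in-wheels problem.

```shell
# Config sketch: install numba with the TBB threading layer (pip environment)
pip install numba tbb
# tell numba to prefer TBB over the default workqueue/OpenMP layers
export NUMBA_THREADING_LAYER=tbb
```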
no objection in principle, will defer to matt since he's more involved in maintaining this code
pandas/core/_numba/kernels/mean_.py
Outdated
if nobs == 0 == min_periods:
    result = 0.0
Is this hit? Why default to 0.0 instead of nan?
I don't think so, my bad. I copied it from the sliding window code.
This is updated now.
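To make the fix above concrete: for an empty group, the mean should come out as NaN rather than 0.0 (0.0 was a leftover from the sliding-window code, where `min_periods` governs the minimum window size). A hypothetical sketch of the corrected finalization step, with illustrative names not taken from the PR:

```python
import numpy as np

def finalize_mean(total: float, nobs: int, min_periods: int) -> float:
    """Turn an accumulated (total, nobs) pair into a mean result."""
    if nobs == 0 or nobs < min_periods:
        # no observations (or too few): emit NaN, never a spurious 0.0
        return np.nan
    return total / nobs
```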
@mroeschke (I'd like to get this in, if not the apply stuff, at least for 2.1)
LGTM, but still hoping one day we can have one aggregation kernel that can apply to both window and groupby (and standard) functions 🤞
Thanks @lithomas1
ENH: Add separate numba kernels for groupby aggregations (#53731)

* ENH: Add separate numba kernels for groupby aggregations
* add whatsnew
* fixes from pre-commit
* fix window tests
* fix tests?
* cleanup
* is_grouped_kernel=True in groupby
* typing
* fix typing?
* fix now
* Just ignore
* remove unnecessary code
* remove comment
This PR adds separate groupby numba kernels, since the old shared kernels with Window required an expensive sort beforehand.
e.g. look at this snakeviz profile (pay attention to the `take` call; for reference, `column_looper` is the part doing the actual aggregation).

Benchmarks
Compared to Cython, we are still ~2x slower, but I think I can get back some more perf (would be a follow-up PR).
If `parallel=True` is passed, though, we can be around ~2x faster than our Cython impl.

(This is on an Intel 2019 MBP, Intel Core i7-9750H w/ 16 GB RAM.)
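For readers wanting to try this: the numba engine is opted into per aggregation via `engine="numba"`, and `engine_kwargs` are forwarded to the jit decorator, which is where `parallel=True` comes in. A sketch (the data here is made up; the `try/except` fallback is mine, since the numba engine raises `ImportError` when numba isn't installed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": np.repeat(np.arange(1000), 10),
    "val": np.random.randn(10_000),
})

try:
    # engine_kwargs are passed through to numba's jit; parallel=True is
    # where the reported ~2x-over-Cython speedup comes from
    result = df.groupby("key")["val"].mean(
        engine="numba", engine_kwargs={"parallel": True}
    )
except ImportError:
    # numba not installed: fall back to the default Cython engine
    result = df.groupby("key")["val"].mean()
```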
cc @jbrockmendel if interested.
(You and I talked about parallelism in pandas a couple weeks back).