-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Groupby agg support multiple funcs numba #53486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -223,24 +223,26 @@ def apply(self, func, *args, **kwargs) -> Series: | |
|
||
@doc(_agg_template_series, examples=_agg_examples_doc, klass="Series") | ||
def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs): | ||
if maybe_use_numba(engine): | ||
return self._aggregate_with_numba( | ||
func, *args, engine_kwargs=engine_kwargs, **kwargs | ||
) | ||
|
||
relabeling = func is None | ||
columns = None | ||
if relabeling: | ||
columns, func = validate_func_kwargs(kwargs) | ||
kwargs = {} | ||
|
||
if isinstance(func, str): | ||
if maybe_use_numba(engine): | ||
# Not all agg functions support numba, only propagate numba kwargs | ||
# if user asks for numba | ||
kwargs["engine"] = engine | ||
kwargs["engine_kwargs"] = engine_kwargs | ||
return getattr(self, func)(*args, **kwargs) | ||
|
||
elif isinstance(func, abc.Iterable): | ||
# Catch instances of lists / tuples | ||
# but not the class list / tuple itself. | ||
func = maybe_mangle_lambdas(func) | ||
kwargs["engine"] = engine | ||
kwargs["engine_kwargs"] = engine_kwargs | ||
ret = self._aggregate_multiple_funcs(func, *args, **kwargs) | ||
if relabeling: | ||
# columns is not narrowed by mypy from relabeling flag | ||
|
@@ -255,6 +257,11 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs) | |
if cyfunc and not args and not kwargs: | ||
return getattr(self, cyfunc)() | ||
|
||
if maybe_use_numba(engine): | ||
return self._aggregate_with_numba( | ||
func, *args, engine_kwargs=engine_kwargs, **kwargs | ||
) | ||
|
||
if self.ngroups == 0: | ||
# e.g. test_evaluate_with_empty_groups without any groups to | ||
# iterate over, we have no output on which to do dtype | ||
|
@@ -1387,14 +1394,15 @@ class DataFrameGroupBy(GroupBy[DataFrame]): | |
|
||
@doc(_agg_template_frame, examples=_agg_examples_doc, klass="DataFrame") | ||
def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs): | ||
if maybe_use_numba(engine): | ||
return self._aggregate_with_numba( | ||
func, *args, engine_kwargs=engine_kwargs, **kwargs | ||
) | ||
|
||
relabeling, func, columns, order = reconstruct_func(func, **kwargs) | ||
func = maybe_mangle_lambdas(func) | ||
|
||
if maybe_use_numba(engine): | ||
# Not all agg functions support numba, only propagate numba kwargs | ||
# if user asks for numba | ||
kwargs["engine"] = engine | ||
kwargs["engine_kwargs"] = engine_kwargs | ||
|
||
op = GroupByApply(self, func, args=args, kwargs=kwargs) | ||
result = op.agg() | ||
if not is_dict_like(func) and result is not None: | ||
|
@@ -1416,6 +1424,17 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs) | |
result.columns = columns # type: ignore[assignment] | ||
|
||
if result is None: | ||
# Remove the kwargs we inserted | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So we need to go through There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, that checks for str/list/dict aggregations, and call back into the agg function itself. It'll return None, if a callable is passed, though (what this block handles). |
||
# (already stored in engine, engine_kwargs arguments) | ||
if "engine" in kwargs: | ||
del kwargs["engine"] | ||
del kwargs["engine_kwargs"] | ||
# at this point func is not a str, list-like, dict-like, | ||
# or a known callable(e.g. sum) | ||
if maybe_use_numba(engine): | ||
return self._aggregate_with_numba( | ||
func, *args, engine_kwargs=engine_kwargs, **kwargs | ||
) | ||
# grouper specific aggregations | ||
if self.grouper.nkeys > 1: | ||
# test_groupby_as_index_series_scalar gets here with 'not self.as_index' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the method signatures with by adding
engine
andengine_kwargs
params instead of mangling kwargs?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obj
here can be a Series/DF/Resampler/ too, since the code is shared with there, so it wouldn't be possible to addengine
/engine_kwargs
there (unless we supported numba there too), sinceagg
there is part of the public API.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obj
here can be a Series/DF/Resampler/ too, since the code is shared with there, so it wouldn't be possible to addengine
/engine_kwargs
there (unless we supported numba there too), sinceagg
there is part of the public API.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more we evolve groupby, the more I realize trying to share this code between Series/DataFrame and groupby was a mistake of mine. I've been thinking we should refactor and separate off groupby entirely (but it can still live in
core.apply
). I'm okay with having a bit of a kludge here and will separate/cleanup after.