-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Groupby.agg with non-reducing UDF #51445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Because apply and agg share code paths in various cases, and apply is allowed any UDF, we currently allow it in agg in some places as well. This should be addressed by #49673. My current view is that we should allow UDFs that don't return scalars in agg, but still treat it as an aggregation. An example where this is useful:
Replacing |
thanks for clarifying. should we close this or classify it as "closed by #49673" or something else? |
Removing the block highlighted in the OP, I'm not seeing any tests fail on main. This logic seems to be repeated after the for loop iterating over slices: pandas/pandas/core/groupby/generic.py Lines 1354 to 1356 in 3f3102b
The only difference is the lack of passing I'm guessing the case after the for loop might have been useful when failure was allowed, but should not longer be necessary. But it makes sense to me to have one of the checks remain to have apply and agg return the same result in the empty case. |
The code in the OP used to be shared by SeriesGroupBy._python_agg_general and DataFrameGroupBy._python_agg_general until last week, now is only used by DFGB. The SGB case actually did need it, but it got refactored out to be more explicit. So removing it from DFGB._python_agg_general doesn't break any tests, but plausibly could. I wouldn't mind ripping it out anyway. The |
In GroupBy._python_agg_general we have a check
The only test that fails if we disable this check is test_evaluate_with_empty_groups, which is passing
lambda x: x
as its ostensibly-aggregating function. Without this special-casing, we return object dtype instead of float64 dtype in this case. I'm not sure what our policy is on requiring actually-aggregating functions in agg (cc @rhshadrach?) but this feels sketchy.The text was updated successfully, but these errors were encountered: