API: Should apply be smart? #39209
Comments
Just for reference, +1 in general for making |
And it's worth discussing if we should provide alternative APIs for one of |
-1 on any additional api |
@TomAugspurger DataFrame.transform may also work on frames:
It seems to me that support for column and/or frame ops in apply/agg/transform/filter should be consistent. If we are to support both (and I think we should), it should be through user-passed arguments (e.g. by_column=True/False) or separate methods rather than inference (try-A-then-try-B, as in the transform code above). This would expand the API, but my intent here is to make maintenance easier. IMO, try-A-then-try-B can be a source of regressions and performance degradation, makes the code harder to reason about, and makes error messages less helpful. @mroeschke: Great idea, here are the two that I see:
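To make the contrast with inference concrete, here is a minimal sketch of what an explicit switch could look like. The helper name apply_udf and its by_column parameter are hypothetical and exist only for illustration; they are not part of pandas.

```python
import pandas as pd


def apply_udf(df, func, by_column):
    """Hypothetical helper: the caller states how func is applied; nothing is inferred."""
    if by_column:
        # func receives each column as a Series.
        return pd.DataFrame({name: func(col) for name, col in df.items()})
    # func receives the whole DataFrame.
    return func(df)


df = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6]})

# Center each column on its own mean.
per_column = apply_udf(df, lambda s: s - s.mean(), by_column=True)

# Center the whole frame on the mean of the column means.
whole_frame = apply_udf(df, lambda frame: frame - frame.mean().mean(), by_column=False)
```

Because the mode is chosen by the caller, a failure inside func surfaces directly instead of silently triggering a second attempt with a different input type.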
|
I'd like to add my opinion to this discussion. I am familiar with R, and I understand pandas originated from R (or was at least inspired by it). R has function The function can be a reducer, transformer, extender... It confused novices and was hard to debug. So a package developer named Hadley Wickham came along and developed all sorts of functions that provide essentially the same functionality but throw errors if the output is not in the specified format... He made something like I'd like to give +1 for making +1 for explicitly specifying whether the function will be applied to a DataFrame or to a Series. It is quite frustrating to predict it! |
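A rough Python analogue of the "throw an error if the output is not in the specified format" idea might look like the sketch below. The helper strict_agg is hypothetical (not a pandas function); it only assumes that a reducer must return a scalar for every column.

```python
import pandas as pd


def strict_agg(df, func):
    """Hypothetical helper: apply func to each column and require a scalar result."""
    results = {}
    for name, col in df.items():
        value = func(col)
        if not pd.api.types.is_scalar(value):
            raise TypeError(
                f"expected a scalar for column {name!r}, got {type(value).__name__}"
            )
        results[name] = value
    return pd.Series(results)


df = pd.DataFrame({"C": [1, 2, 3], "D": [4, 5, 6]})

totals = strict_agg(df, lambda s: s.sum())  # a genuine reducer passes

try:
    strict_agg(df, lambda s: s)  # a transformer is rejected instead of being accepted silently
    rejected = False
except TypeError:
    rejected = True
```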
pandas already has |
these already exist .agg and .transform |
Yes, if usage of Currently, no. we can use |
Here is an example that

```python
import pandas

df = pandas.DataFrame(
    {
        "A": ["a", "a", "b", "b", "b"],
        "B": [0, 1, 1, 1, 0],
        "C": [1, 2, 3, 4, 5],
        "D": [1, 2, 3, 4, 5],
    }
)
df.agg(lambda x: x)
##    A  B  C  D
## 0  a  0  1  1
## 1  a  1  2  2
## 2  b  1  3  3
## 3  b  1  4  4
## 4  b  0  5  5
```

Compare the above with the result below

```python
df.groupby(["A", "B"]).agg(lambda x: x)
##           C       D
## A B
## a 0       1       1
##   1       2       2
## b 0       5       5
##   1  [3, 4]  [3, 4]
```

One thing I do not understand is why Isn't it just that applying |
From #34998 (comment)
IMO, groupby.apply should not be a one-stop-shop for all your UDF needs. If users want to use a reducer, use agg; if they want to use a transformer, use transform; if they want to use a filter, use filter. The role of apply, to me, should be for UDFs that don't fit into these roles. In this sense, I think apply should not go out of its way to generate more convenient/sensible output for these classes of UDFs.
The result of a groupby is a mapping from each group to a Python object. Whatever behavior is decided for apply, it should be well-defined and entire with no more assumptions than that. Here is a naive attempt at doing so.
apply (not groupby) to control if the groups are added to the index, columns, or neither.
Perhaps numpy-specific support should be added in. Probably things need to be added to support UDFs that reach groupby.apply via resample/window, but I'm not familiar enough with that code to reason about it.
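As an illustration of the division of roles described above, the following sketch routes a reducer, a transformer, and a filter through agg, transform, and filter respectively, and then builds the "mapping from each group to a Python object" by iterating over the groups. The sample data and variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "b", "b"], "C": [1, 2, 3, 4, 5]})
grouped = df.groupby("A")["C"]

# Reducer: one value per group.
reduced = grouped.agg(lambda s: s.max() - s.min())

# Transformer: result has the same index as the input.
transformed = grouped.transform(lambda s: s - s.mean())

# Filter: whole groups are kept or dropped based on a boolean per group.
kept = df.groupby("A").filter(lambda g: len(g) > 2)

# A groupby viewed as a plain mapping from group key to an arbitrary Python object.
as_mapping = {key: list(group["C"]) for key, group in df.groupby("A")}
```

In each of the first three calls the expected shape of the result follows from the method name, so nothing has to be inferred from what the UDF happens to return.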