BUG: Unexpected side effects within agg function #44813
Comments
It seems that pandas tries to pass both Series and DataFrame objects to the function given to `agg`. For most single user-defined functions (as opposed to `np.min`, `np.std`, ...), `Resampler._groupby_and_aggregate()` is invoked. It first tries to apply the function column by column; if that fails, it applies the function to the DataFrame of each group. In your example, `f()` works on a Series as well as on a DataFrame, and because of the priority in `_groupby_and_aggregate()` it is actually invoked column by column (the type annotation has no effect). `g()`, on the other hand, only accepts a DataFrame as valid input, so the output is a single column filled with 2.
pandas/pandas/core/resample.py Lines 418 to 460 in 3397585
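The fallback order described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual pandas implementation: `aggregate_one_group` is a hypothetical helper, the sample frame is made up, and the real code path goes through the groupby machinery rather than a plain loop over columns.

```python
import pandas as pd


def aggregate_one_group(frame: pd.DataFrame, func):
    """Hypothetical helper mimicking the fallback order; not pandas internals."""
    try:
        # First attempt: call func on each column, so func receives a Series.
        return pd.Series({col: func(frame[col]) for col in frame.columns})
    except (AttributeError, KeyError):
        # func relied on DataFrame-only access (e.g. x["A"]),
        # so retry with the whole group as a DataFrame.
        return func(frame)


def f(x):
    return 2


def g(x):
    x["A"].sum()  # raises KeyError when x is a Series with no "A" label
    return 2


# Assumed sample data standing in for one resampled group.
group = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

res_f = aggregate_one_group(group, f)  # one value per column
res_g = aggregate_one_group(group, g)  # one value for the whole group
print(res_f)
print(res_g)
```

With `f`, the column-wise attempt succeeds, so each column gets its own result. With `g`, the column-wise attempt raises and the whole-frame call produces a single value per group, which is why the two outputs end up with different shapes.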
Maybe it would be useful to adjust the invocation logic of `agg` so that it uses the information from type annotations.
Using type annotations to inform the execution is a decent option, although I believe it would be a first in the pandas API. Perhaps a better idea is an
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
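The original snippet is not preserved here. A minimal sketch consistent with the description that follows might look like this; the sample data and index are assumptions, and only the names `f`, `g`, `A`, `B` and the `df.resample("D").agg(...)` calls are taken from the issue text.

```python
import pandas as pd

# Assumed sample data: four rows spread over two days (not from the original issue).
index = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 12:00", "2021-01-02 00:00", "2021-01-02 12:00"]
)
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}, index=index)


def f(x: pd.DataFrame) -> int:
    return 2


def g(x: pd.DataFrame) -> int:
    x["A"].sum()  # the result is discarded on purpose
    return 2


out_f = df.resample("D").agg(lambda x: f(x))
out_g = df.resample("D").agg(lambda x: g(x))

# Compare the container type and shape of the two results.
print(type(out_f).__name__, out_f.shape)
print(type(out_g).__name__, out_g.shape)
```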
Issue Description
The code should be fairly self-explanatory, but the idea is that I've defined two functions, `f` and `g`. Both take in a DataFrame and return the number 2, but `g` first sums the DataFrame's `A` column and discards the result. This action inexplicably alters the output of `df.resample("D").agg`. When we don't sum the `A` column, we get back a full DataFrame with both `A` and `B` columns. When we do sum the `A` column, we get back a Series.
Expected Behavior
The output of `df.resample("D").agg(lambda x: f(x))` and `df.resample("D").agg(lambda x: g(x))` should be exactly the same, since both functions return the same thing.
Installed Versions