BUG: dataframe.groupby('some_column').timedelta.sum() gives wrong results when timedelta contains NaT (pandas=1.3.0) #42659
Comments
Please provide something which is copy and pasteable (see https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). Also, please format your code.
Updated.
I also encountered this problem, it seems quite a major bug. @phofl Here is a full copy-pastable example:

import pandas as pd
df = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [pd.Timedelta('1d'), pd.Timedelta('2d'), pd.Timedelta('3d'), pd.NaT],
})
print('\ndf:')
print(df)
print('\ndf.dtypes:')
print(df.dtypes)
print('\ndf.groupby(\'a\').b.sum():')
print(df.groupby('a').b.sum())

Output of
Changing milestone to 1.3.5.
@phofl @simonjayhawkins, thanks for handling this request. Do you think this problem will be fixed, or is it considered to be a bad practice to use groupby & mean/sum on
A PR addressing this would be welcome.
Good to fix but not necessary for 1.3.x.
Best guess is a
[y] I have checked that this issue has not already been reported.
[y] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
this is a tiny example
Problem description
dataframe.groupby('some_column').timedelta.sum() returns wrong results when the timedelta column contains NaT.
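One possible way to check whether a given installation is affected (a sketch, not part of the original report): sum each group separately with Series.sum, which skips NaT by default, and compare that against the groupby reduction described above. The DataFrame reuses the values from the example posted in the comments; the names `expected` and `actual` are only illustrative.

```python
import pandas as pd

# Same data as the reproduction in the comments: group 2 contains a NaT.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [pd.Timedelta("1d"), pd.Timedelta("2d"), pd.Timedelta("3d"), pd.NaT],
})

# Reference values: sum each group on its own with Series.sum,
# which skips NaT by default (skipna=True).
expected = {int(key): grp["b"].sum() for key, grp in df.groupby("a")}

# Value under test: the groupby reduction reported in this issue.
actual = df.groupby("a")["b"].sum()

# Report per group whether the two computations agree.
for key, want in expected.items():
    got = actual.loc[key]
    status = "ok" if got == want else "MISMATCH"
    print(f"group {key}: groupby sum = {got}, per-group sum = {want} -> {status}")
```

If any group prints MISMATCH, the groupby result disagrees with the plain per-group sum, which is the symptom reported here.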
Expected Output
Output of
pd.show_versions()
[pandas 1.3.0]