BUG: not possible to 'cumsum' Timedelta with named aggregation #41720

Closed
2 of 3 tasks
yohplala opened this issue May 29, 2021 · 7 comments · Fixed by #50033
Labels: Apply, Bug, Groupby, Needs Tests, Timedelta

Comments

@yohplala

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import random

# Dataset
ts = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:30'),
                       pd.Timestamp('2021/01/01 00:45'),
                       pd.Timestamp('2021/01/01 02:00'),
                       pd.Timestamp('2021/01/01 03:50'),
                       pd.Timestamp('2021/01/01 05:00')])
length = len(ts)
random.seed(1)
value = random.sample(range(1, length+1), length)
df = pd.DataFrame({'value': value, 'ts': ts})
df['td'] = df['ts'] - df['ts'].shift(1, fill_value=ts[0]-pd.Timedelta('1h'))
df['amount'] = df['value']*10
df.loc[:2,'grps'] = 'a'
df.loc[2:,'grps'] = 'b'
In [16]: df
Out[16]: 
   value                  ts              td  amount grps
0      2 2021-01-01 00:30:00 0 days 01:00:00      20    a
1      1 2021-01-01 00:45:00 0 days 00:15:00      10    a
2      5 2021-01-01 02:00:00 0 days 01:15:00      50    b
3      4 2021-01-01 03:50:00 0 days 01:50:00      40    b
4      3 2021-01-01 05:00:00 0 days 01:10:00      30    b
# cumsum and Timedelta work when used through pd.Series
df['td'].cumsum()

# cumsum and Timedelta do not work when used in groupby with named aggregation
agg_rules = {'td':('td', 'cumsum')}
res = df.groupby('grps').agg(**agg_rules)

Error message that is returned

res = df.groupby('grps').agg(**agg_rules)
Traceback (most recent call last):

  File "<ipython-input-14-c0f4606d675a>", line 2, in <module>
    res = df.groupby('grps').agg(**agg_rules)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 945, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 582, in aggregate
    return agg_dict_like(obj, arg, _axis), True

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in agg_dict_like
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in <dictcomp>
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 247, in aggregate
    ret = self._aggregate_multiple_funcs(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 315, in _aggregate_multiple_funcs
    results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 241, in aggregate
    return getattr(self, func)(*args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 2497, in cumsum
    return self._cython_transform("cumsum", **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 996, in _cython_transform
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

Problem description

cumsum works on a Timedelta column when called directly on the pd.Series, but it raises DataError when the same operation is requested through groupby with named aggregation.
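To check whether an installed pandas version is affected, a small probe along the following lines may help. It is only a sketch: `probe_named_cumsum` is a hypothetical helper name, and the frame is a trimmed-down stand-in for the dataset above (same `td` values and groups, built directly in minutes).

```python
import pandas as pd

# Trimmed-down stand-in for the reported dataset: same td values and groups.
df = pd.DataFrame(
    {
        "td": pd.to_timedelta([60, 15, 75, 110, 70], unit="min"),
        "grps": ["a", "a", "b", "b", "b"],
    }
)


def probe_named_cumsum(frame):
    """Hypothetical helper: try the named aggregation and report the outcome.

    Returns (True, None) if the call succeeds, otherwise
    (False, <exception class name>).
    """
    try:
        frame.groupby("grps").agg(td=("td", "cumsum"))
    except Exception as exc:  # the report shows DataError on pandas 1.2.4
        return False, type(exc).__name__
    return True, None


# Plain Series cumsum, which the report says works.
series_result = df["td"].cumsum()

ok, error_name = probe_named_cumsum(df)
print(pd.__version__, ok, error_name)
```

The probe catches any exception on purpose, since the exception type raised for this case may differ between pandas versions.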

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-53-generic
Version : #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.2.4
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.0
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : 0.6.3
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.51.2

@yohplala added the Bug and Needs Triage labels May 29, 2021
@phofl
Member

phofl commented May 29, 2021

The error is raised because cumsum in groupby sets numeric_only=True by default. I think the correct way to call this is:

agg_rules = {'td': 'cumsum'}
res = df.groupby('grps').agg(agg_rules, **{"numeric_only": False})

but the error persists because the kwargs are not passed through.

@mroeschke added the Apply, Groupby, and Timedelta labels and removed the Needs Triage label Aug 21, 2021
@WillAyd
Member

WillAyd commented Dec 2, 2022

Looks like this is fixed on main. Could use a test.

@WillAyd added the Needs Tests label Dec 2, 2022
@seanjedi
Contributor

seanjedi commented Dec 2, 2022

I can take this and work on writing a test.

@seanjedi
Contributor

seanjedi commented Dec 2, 2022

take

@seanjedi
Contributor

seanjedi commented Dec 3, 2022

@WillAyd Are the results of the two examples above supposed to be the same?
I am trying to create a test using some of the same logic as the sample above, but when I check the results, they are different.
Is this a bug?

@WillAyd
Member

WillAyd commented Dec 3, 2022

No, they aren't the same: the first is just a normal cumulative sum over the whole column, whereas the latter is a groupby, so it should respect the groupings.
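The difference can be seen side by side in a minimal sketch. It assumes a pandas version in which groupby cumsum accepts Timedelta data (the 1.2.4 release from the report does not by default), and it rebuilds only the `td` and `grps` columns of the original frame rather than the full dataset.

```python
import pandas as pd

# Same td values and group labels as the frame in the report.
df = pd.DataFrame(
    {
        "td": pd.to_timedelta([60, 15, 75, 110, 70], unit="min"),
        "grps": ["a", "a", "b", "b", "b"],
    }
)

# Plain cumulative sum: runs over the whole column and ignores groups.
plain = df["td"].cumsum()

# Grouped cumulative sum: the running total restarts at each new group.
grouped = df.groupby("grps")["td"].cumsum()

comparison = pd.DataFrame({"grps": df["grps"], "plain": plain, "grouped": grouped})
print(comparison)
```

The two columns agree inside group a and diverge from the first row of group b, where the grouped total restarts at 1h15 instead of continuing to 2h30.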

@seanjedi
Contributor

seanjedi commented Dec 3, 2022

@WillAyd Added a new test here: #50033
I'm sure there will be a lot of things I can improve, as this is my first test with pandas 😅
