BUG: not possible to 'cumsum' Timedelta with named aggregation #41720

yohplala · 2021-05-29T16:25:49Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
import random

# Dataset
ts = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:30'),
                       pd.Timestamp('2021/01/01 00:45'),
                       pd.Timestamp('2021/01/01 02:00'),
                       pd.Timestamp('2021/01/01 03:50'),
                       pd.Timestamp('2021/01/01 05:00')])
length = len(ts)
random.seed(1)
value = random.sample(range(1, length+1), length)
df = pd.DataFrame({'value': value, 'ts': ts})
df['td'] = df['ts'] - df['ts'].shift(1, fill_value=ts[0]-pd.Timedelta('1h'))
df['amount'] = df['value']*10
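# Rows 0-1 end up in group 'a' and rows 2-4 in group 'b'
# (.loc slicing is inclusive, so row 2 is first set to 'a', then overwritten with 'b').
df.loc[:2, 'grps'] = 'a'
df.loc[2:, 'grps'] = 'b'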

In [16]: df
Out[16]: 
   value                  ts              td  amount grps
0      2 2021-01-01 00:30:00 0 days 01:00:00      20    a
1      1 2021-01-01 00:45:00 0 days 00:15:00      10    a
2      5 2021-01-01 02:00:00 0 days 01:15:00      50    b
3      4 2021-01-01 03:50:00 0 days 01:50:00      40    b
4      3 2021-01-01 05:00:00 0 days 01:10:00      30    b

# cumsum and Timedelta work when used through pd.Series
df['td'].cumsum()

# cumsum and Timedelta do not work when used in groupby with named aggregation
agg_rules = {'td':('td', 'cumsum')}
res = df.groupby('grps').agg(**agg_rules)

Error message that is returned

res = df.groupby('grps').agg(**agg_rules)
Traceback (most recent call last):

  File "<ipython-input-14-c0f4606d675a>", line 2, in <module>
    res = df.groupby('grps').agg(**agg_rules)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 945, in aggregate
    result, how = aggregate(self, func, *args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 582, in aggregate
    return agg_dict_like(obj, arg, _axis), True

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in agg_dict_like
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/aggregation.py", line 768, in <dictcomp>
    results = {key: obj._gotitem(key, ndim=1).agg(how) for key, how in arg.items()}

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 247, in aggregate
    ret = self._aggregate_multiple_funcs(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 315, in _aggregate_multiple_funcs
    results[base.OutputKey(label=name, position=idx)] = obj.aggregate(func)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 241, in aggregate
    return getattr(self, func)(*args, **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 2497, in cumsum
    return self._cython_transform("cumsum", **kwargs)

  File "/home/pierre/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 996, in _cython_transform
    raise DataError("No numeric types to aggregate")

DataError: No numeric types to aggregate

Problem description

cumsum works on a Timedelta column when called directly on a Series, but it raises DataError when invoked through groupby with named aggregation.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-53-generic
Version : #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 1.2.4
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.1
setuptools : 52.0.0.post20210125
Cython : 0.29.23
pytest : 6.2.3
hypothesis : None
sphinx : 4.0.1
blosc : None
feather : None
xlsxwriter : 1.3.8
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.0
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.9.0
fastparquet : 0.6.3
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.51.2

phofl · 2021-05-29T17:52:56Z

The error is raised because cumsum in groupby defaults to numeric_only=True. I think the correct way to call this is:

agg_rules = {'td': 'cumsum'}
res = df.groupby('grps').agg(agg_rules, **{"numeric_only": False})

However, the error persists because the kwargs are not passed through.

WillAyd · 2022-12-02T03:19:35Z

Looks like this is fixed on main. It could use a test.

seanjedi · 2022-12-02T23:19:45Z

I can take this and work on making a test

seanjedi · 2022-12-02T23:20:32Z

take

seanjedi · 2022-12-03T01:17:15Z

@WillAyd Are the results of the two examples above supposed to be the same?
I am trying to create a test using some of the same logic as the sample above, but when I check the results, they are different.
Is this a bug?

WillAyd · 2022-12-03T01:21:52Z

No, they aren't the same: the first is just a normal cumulative sum over the whole column, whereas the latter is a groupby, so it should respect the groupings.
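To make the difference concrete, here is a minimal pure-Python sketch using the 'td' and 'grps' values from the example frame. It is only an illustration of the two semantics, not how pandas implements either operation; the helper name grouped_cumsum is hypothetical.

```python
from datetime import timedelta
from itertools import accumulate

# Values copied from the 'td' and 'grps' columns of the example frame.
td = [
    timedelta(hours=1),
    timedelta(minutes=15),
    timedelta(hours=1, minutes=15),
    timedelta(hours=1, minutes=50),
    timedelta(hours=1, minutes=10),
]
grps = ["a", "a", "b", "b", "b"]

# Plain cumulative sum: one running total over the whole column.
plain = list(accumulate(td))


def grouped_cumsum(values, keys):
    """Hypothetical helper: running total kept separately per group key."""
    running = {}
    out = []
    for value, key in zip(values, keys):
        running[key] = running.get(key, timedelta(0)) + value
        out.append(running[key])
    return out


# Grouped cumulative sum: the total restarts when the group changes.
grouped = grouped_cumsum(td, grps)

for key, p, g in zip(grps, plain, grouped):
    print(key, p, g)
```

The two results agree on the rows of group 'a' and diverge from row 2 onward, where the grouped total restarts for group 'b'.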

seanjedi · 2022-12-03T01:36:37Z

@WillAyd Added a new test here: #50033
I'm sure there are a lot of things I can improve, as this is my first test with pandas 😅

yohplala added the Bug and Needs Triage labels May 29, 2021

mroeschke added the Apply, Groupby, and Timedelta labels and removed the Needs Triage label Aug 21, 2021

WillAyd added the Needs Tests label Dec 2, 2022

github-actions bot assigned seanjedi Dec 2, 2022

seanjedi mentioned this issue Dec 3, 2022 in "TST: adding new test for groupby cumsum with named aggregate" #50033 (merged)

rhshadrach closed this as completed in #50033 Dec 8, 2022

omor1 mentioned this issue Mar 21, 2024 in "BUG: IndexError with pandas.DataFrame.cumsum where dtype=timedelta64[ns]" #57956 (closed)
