Cumsum not available to column of list-likes in groupby #40488

Closed
@rben01

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': [[10], [11], [12]]})

# ok:
>>> df['b'].cumsum()
0            [10]
1        [10, 11]
2    [10, 11, 12]
Name: b, dtype: object

# error:
>>> df.groupby('a')['b'].cumsum()
DataError: No numeric types to aggregate

Problem description

Taking the cumsum of a column of list-likes works on a plain Series, where it cumulatively concatenates the lists. The same operation should also work on a groupby object, with the concatenation restarting for each group.

Expected Output

a
0        [10]
1    [10, 11]
2        [12]
Name: b, dtype: object

Workaround

This is only a partial workaround because it loses the index name, which is normally kept when doing a groupby.

df.groupby('a')['b'].apply(lambda col: col.cumsum())
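
As an alternative that does not depend on groupby supporting object columns, the per-group running concatenation can be sketched in plain Python. The helper name grouped_list_cumsum is hypothetical (it is not part of pandas), and the sketch assumes every value in the column is list-like and every key is hashable and non-missing.

```python
import pandas as pd


def grouped_list_cumsum(df, key, col):
    """Hypothetical helper (not a pandas API): cumulatively concatenate
    the list-likes in `col`, restarting for each distinct value of `key`."""
    running = {}  # group key -> list accumulated so far for that group
    out = []
    for k, value in zip(df[key], df[col]):
        # Build a new list each time so earlier rows are never mutated.
        acc = running.get(k, []) + list(value)
        running[k] = acc
        out.append(acc)
    # Reuse the original index and column name so the result aligns with df.
    return pd.Series(out, index=df.index, name=col)


df = pd.DataFrame({'a': [1, 1, 2], 'b': [[10], [11], [12]]})
result = grouped_list_cumsum(df, 'a', 'b')
print(result)
```

For the example frame this yields [10], [10, 11], [12] on the original index. Unlike groupby, the sketch does not drop rows with missing keys, so those would need to be filtered out beforehand.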

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : b5958ee1999e9aead1938c0bba2b674378807b3d
python           : 3.8.7.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 20.4.0
Version          : Darwin Kernel Version 20.4.0: Fri Mar  5 03:57:04 PST 2021; root:xnu-7195.101.1~4/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.5
numpy            : 1.19.4
pytz             : 2020.5
dateutil         : 2.8.1
pip              : None
setuptools       : 50.3.1.post0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.3
numexpr          : 2.7.2
odfpy            : None
openpyxl         : 3.0.5
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : 1.6.0
sqlalchemy       : 1.3.20
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : None
