DataFrameGroupBy.cumsum(axis=0) fails when grouped on axis=1 #21127

kunalgosar · 2018-05-18T23:30:47Z

Code Sample, a copy-pastable example if possible

In [3]: df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [4, 5, 6, 7],
   ...:                   'col3': [8, 9, 10, 11], 'col4': [12, 13, 14, 15],
   ...:                   'col5': [0, 0, 0, 0]})
   ...:                   

In [4]: df.groupby(by=[1, 2, 3, 1, 2], axis=1).cumsum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/dev/pandas/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4872         blocks = form_blocks(arrays, names, axes)
-> 4873         mgr = BlockManager(blocks, axes)
   4874         mgr._consolidate_inplace()

~/dev/pandas/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check)
   3281         if do_integrity_check:
-> 3282             self._verify_integrity()
   3283 

~/dev/pandas/pandas/core/internals.py in _verify_integrity(self)
   3492             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3493                 construction_error(tot_items, block.shape[1:], self.axes)
   3494         if len(self.items) != tot_items:

~/dev/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4842     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4843         passed, implied))
   4844 

ValueError: Shape of passed values is (4, 5), indices imply (4, 4)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-4-11b5865f27e2> in <module>()
----> 1 df.groupby(by=[1, 2, 3, 1, 2], axis=1).cumsum()

~/dev/pandas/pandas/core/groupby/groupby.py in cumsum(self, axis, *args, **kwargs)
   1926             return self.apply(lambda x: x.cumsum(axis=axis, **kwargs))
   1927 
-> 1928         return self._cython_transform('cumsum', **kwargs)
   1929 
   1930     @Substitution(name='groupby')

~/dev/pandas/pandas/core/groupby/groupby.py in _cython_transform(self, how, numeric_only, **kwargs)
   1036             raise DataError('No numeric types to aggregate')
   1037 
-> 1038         return self._wrap_transformed_output(output, names)
   1039 
   1040     def _cython_agg_general(self, how, alt=None, numeric_only=True,

~/dev/pandas/pandas/core/groupby/groupby.py in _wrap_transformed_output(self, output, names)
   4737 
   4738     def _wrap_transformed_output(self, output, names=None):
-> 4739         return DataFrame(output, index=self.obj.index)
   4740 
   4741     def _wrap_agged_blocks(self, items, blocks):

~/dev/pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    346                                  dtype=dtype, copy=copy)
    347         elif isinstance(data, dict):
--> 348             mgr = self._init_dict(data, index, columns, dtype=dtype)
    349         elif isinstance(data, ma.MaskedArray):
    350             import numpy.ma.mrecords as mrecords

~/dev/pandas/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    457             arrays = [data[k] for k in keys]
    458 
--> 459         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    460 
    461     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

~/dev/pandas/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   7321     axes = [_ensure_index(columns), _ensure_index(index)]
   7322 
-> 7323     return create_block_manager_from_arrays(arrays, arr_names, axes)
   7324 
   7325 

~/dev/pandas/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4875         return mgr
   4876     except ValueError as e:
-> 4877         construction_error(len(arrays), arrays[0].shape, axes, e)
   4878 
   4879 

~/dev/pandas/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4841         raise ValueError("Empty data passed with indices specified.")
   4842     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4843         passed, implied))
   4844 
   4845 

ValueError: Shape of passed values is (4, 5), indices imply (4, 4)

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: d7c6e01
python: 3.6.5.final.0
python-bits: 64
OS: Darwin
OS-release: 17.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0rc2+30.gd7c6e0130
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.2
scipy: 1.0.1
pyarrow: 0.9.0
xarray: 0.10.3
IPython: 6.3.1
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: 0.8.0
psycopg2: None
jinja2: 2.10
s3fs: 0.1.4
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-05-19T02:22:01Z

Just to confirm, what is your expected output here?

kunalgosar · 2018-05-20T19:58:03Z

I would expect this not to error and to have similar behavior to that when grouping by rows. It should just compute the cumulative sum across columns for each group, or if this does not make sense, at least throw a more descriptive exception specifying that this operation is not supported on axis=1.

It looks like all the cumulative functions fail when grouping by axis=1 (cumsum, cummax, cummin, cumprod).

When grouping on axis=0, cumsum(axis=0) and cumsum(axis=1) both work fine. But on axis=1 only cumsum(axis=1) works.
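
One possible reading of this expectation can be sketched without `groupby(axis=1)`: split the columns by label, take the cumulative sum down the rows of each piece, and put the pieces back together. The names `labels`, `pieces`, and `expected` are illustrative, and this is only an assumption about the intended semantics, not what pandas implements.

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [4, 5, 6, 7],
                   'col3': [8, 9, 10, 11], 'col4': [12, 13, 14, 15],
                   'col5': [0, 0, 0, 0]})
labels = [1, 2, 3, 1, 2]  # one group label per column, as in the report

pieces = []
for label in sorted(set(labels)):
    # columns that belong to this group
    cols = [c for c, l in zip(df.columns, labels) if l == label]
    # cumulative sum down the rows (axis=0) within the group
    pieces.append(df[cols].cumsum(axis=0))

# reassemble and restore the original column order
expected = pd.concat(pieces, axis=1)[list(df.columns)]
print(expected)
```

Under this reading the column grouping has no effect on the values, because a cumulative sum down the rows treats each column independently; the result matches `df.cumsum()`.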

gfyoung · 2018-05-21T06:58:45Z

@kunalgosar : Indeed, this does look weird. Patch is more than welcome!

msmarchena · 2018-05-24T13:47:38Z

> It looks like all the cumulative functions fail when grouping by axis=1 (cumsum, cummax, cummin, cumprod).

All of these functions default to axis=0; with axis=1 they work as expected:

df.groupby(by=[1, 2, 3, 1, 2], axis=1).cumsum(axis=1)
   col1  col2  col3  col4  col5
0     0     4     8    12     4
1     1     5     9    14     5
2     2     6    10    16     6
3     3     7    11    18     7

but

 df.groupby(by=[1, 2, 3, 1, 2], axis=1).cumsum()

gives

ValueError: Shape of passed values is (4, 5), indices imply (4, 4)

> When grouping on axis=0, cumsum(axis=0) and cumsum(axis=1) both work fine. But on axis=1 only cumsum(axis=1) works.

My understanding is that groupby is applied to the rows. Since the df has only 4 rows, `by` should contain only 4 labels; otherwise, it doesn't make sense.

For instance, using

df0=df.groupby(by=[1, 2, 3, 1]).cumsum() #default axis=0 in groupby() and cumsum()

gives

   col1  col2  col3  col4  col5
0     0     4     8    12     0
1     1     5     9    13     0
2     2     6    10    14     0
3     3    11    19    27     0

and this result is as expected. But

df0=df.groupby(by=[1, 2, 3, 1]).cumsum(axis=1)

gives

   col1  col2  col3  col4  col5
0     0     4    12    24    24
1     1     6    15    28    28
2     2     8    18    32    32
3     3    10    21    36    36

It returns the cumsum over columns, which is weird. The same logic applies to cummax, cummin and cumprod. To me it makes sense to have an error in your case. But maybe I'm missing something.
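
The last output can be reproduced without any grouping, which may explain why it looks odd: a cumulative sum along each row never crosses row boundaries, so row groups cannot influence it. A minimal sketch, where `row_cumsum` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [4, 5, 6, 7],
                   'col3': [8, 9, 10, 11], 'col4': [12, 13, 14, 15],
                   'col5': [0, 0, 0, 0]})

# plain cumulative sum across the columns of each row, no groupby involved
row_cumsum = df.cumsum(axis=1)
print(row_cumsum)
```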

rhshadrach · 2024-03-02T21:22:48Z

axis=1 is now deprecated; closing.

gfyoung added Groupby Bug labels May 21, 2018

jreback mentioned this issue Aug 2, 2019: BUG: grouby(axis=1) cannot select column names #27700 (merged)

mroeschke added Enhancement, Error Reporting (Incorrect or improved errors from pandas), Groupby and removed Bug, Groupby labels Jun 19, 2021

rhshadrach closed this as completed Mar 2, 2024
