BUG: SeriesGroupBy.transform should raise with axis=1 #36321

arw2019 · 2020-09-13T02:57:47Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

The dropna=True case runs:

In [7]: df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd'))
   ...: gb = df.groupby('B', dropna=True, axis=1)
   ...: gb['B'].transform(len)
Out[7]: Series([], Name: B, dtype: int64)

but with dropna=False, the call to transform raises a ValueError with a somewhat unfriendly message:

In [9]: df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd'))
   ...: gb = df.groupby('B', dropna=False, axis=1)
   ...: gb['B'].transform(len)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-c7271aa13f24> in <module>
      1 df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd'))
      2 gb = df.groupby('B', dropna=False, axis=1)
----> 3 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    506 
    507         if not isinstance(func, str):
--> 508             return self._transform_general(func, *args, **kwargs)
    509 
    510         elif func not in base.transform_kernel_allowlist:

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
    546 
    547             concatenated = concat(results)
--> 548             result = self._set_result_index_ordered(concatenated)
    549         else:
    550             result = self.obj._constructor(dtype=np.float64)

/workspaces/pandas-arw2019/pandas/core/groupby/groupby.py in _set_result_index_ordered(self, result)
    691             result = result.sort_index(axis=self.axis)
    692 
--> 693         result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
    694         return result
    695 

/workspaces/pandas-arw2019/pandas/core/series.py in set_axis(self, labels, axis, inplace)
   4365     @Appender(generic.NDFrame.set_axis.__doc__)
   4366     def set_axis(self, labels, axis: Axis = 0, inplace: bool = False):
-> 4367         return super().set_axis(labels, axis=axis, inplace=inplace)
   4368 
   4369     @doc(

/workspaces/pandas-arw2019/pandas/core/generic.py in set_axis(self, labels, axis, inplace)
    657         """
    658         self._check_inplace_and_allows_duplicate_labels(inplace)
--> 659         return self._set_axis_nocheck(labels, axis, inplace)
    660 
    661     def _set_axis_nocheck(self, labels, axis: Axis, inplace: bool):

/workspaces/pandas-arw2019/pandas/core/generic.py in _set_axis_nocheck(self, labels, axis, inplace)
    662         # NDFrame.rename with inplace=False calls set_axis(inplace=True) on a copy.
    663         if inplace:
--> 664             setattr(self, self._get_axis_name(axis), labels)
    665         else:
    666             obj = self.copy()

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5384         try:
   5385             object.__getattribute__(self, name)
-> 5386             return object.__setattr__(self, name, value)
   5387         except AttributeError:
   5388             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64 
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    425         if not fastpath:
    426             # The ensure_index call above ensures we have an Index object
--> 427             self._mgr.set_axis(axis, labels)
    428 
    429     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    217 
    218         if new_len != old_len:
--> 219             raise ValueError(
    220                 f"Length mismatch: Expected axis has {old_len} elements, new "
    221                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements
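The last frame of the traceback is a plain length check: after concatenating the per-group results, `_set_result_index_ordered` relabels the result with the original object's axis (`self.obj._get_axis(self.axis)`), and the two lengths disagree (2 vs 4). The sketch below triggers the same manager-level check directly through the public `Series.set_axis`, with no groupby involved. It assumes a pandas version where this check still raises a `ValueError` whose message starts with "Length mismatch"; the helper name `length_mismatch_message` is purely illustrative.

```python
import pandas as pd


def length_mismatch_message(n_values: int = 2, n_labels: int = 4):
    """Relabel an n_values-long Series with n_labels labels.

    Returns the ValueError message pandas raises, or None if the
    relabelling succeeds (i.e. the lengths match).
    """
    # Stands in for the concatenated transform output (2 elements above).
    result = pd.Series(range(n_values))
    # Stands in for the original object's axis labels (4 elements above).
    labels = [f"row{i}" for i in range(n_labels)]
    try:
        # Same length check that fails at the bottom of the traceback.
        result.set_axis(labels)
    except ValueError as err:
        return str(err)
    return None


print(length_mismatch_message())
```

The error is therefore a symptom of the shapes not lining up after the per-group results are concatenated, not a problem with `len` or with `dropna` itself.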

Expected Output

Since there are no missing values in the input, dropna=True and dropna=False should give the same results.

Problem description

(Edited)
As discussed below and in #35751, SeriesGroupBy with axis=1 never makes sense, so it should raise a more explicit error message.
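One way to get a more explicit error is to validate the axis up front, before any group results are computed, instead of letting the call fail later in `_set_result_index_ordered`. The sketch below is hypothetical: `validate_groupby_axis` is not a pandas function, and it only models the idea by taking the grouped object's number of dimensions (`ndim`, 1 for a Series, 2 for a DataFrame) and the requested `axis`. Where such a check would live inside pandas (for example, when a column is selected from a DataFrameGroupBy created with axis=1) is an assumption.

```python
from typing import Union

# Hypothetical lookup, not part of pandas: maps axis aliases to numbers.
_AXIS_NUMBERS = {0: 0, "index": 0, 1: 1, "columns": 1}


def validate_groupby_axis(ndim: int, axis: Union[int, str] = 0) -> int:
    """Return the axis number, raising a clear error for axis=1 on 1-D data.

    Hypothetical guard: `ndim` is the number of dimensions of the object
    being grouped (1 for a Series, 2 for a DataFrame).
    """
    if axis not in _AXIS_NUMBERS:
        raise ValueError(f"No axis named {axis!r}")
    axis_number = _AXIS_NUMBERS[axis]
    # A Series has a single axis, so grouping "along columns" is meaningless.
    if ndim == 1 and axis_number == 1:
        raise ValueError(
            "axis=1 is not supported for SeriesGroupBy: a Series has only "
            "one axis. Group along axis=0 instead."
        )
    return axis_number
```

With a check like this, `validate_groupby_axis(1, 1)` fails immediately with a message that names the actual problem, while `validate_groupby_axis(2, "columns")` still returns 1 for the DataFrame case.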

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 0cdea22
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-47-generic
Version : #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+354.g0cdea2261
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1


arw2019 · 2020-09-13T02:57:55Z

take

rhshadrach · 2020-09-13T12:38:54Z

I think allowing axis=1 for a Series is largely (entirely?) not implemented. Doing pd.Series([1, 2, 3]).groupby([0], axis=1) raises, and I can't find any way of making something like:

df = pd.DataFrame({"A": [0, 0, 7, 1]})
df.groupby([0], axis=1)['A'].sum()

work. I think this "backdoor" to getting a SeriesGroupBy with axis=1 is a bug. See #35443. I'm thinking the only case where it might make sense to do this is when as_index=False.

arw2019 · 2020-09-13T21:21:25Z

I think I agree that it's a bug, except possibly the as_index=False case.

TODOs:

- Should we add tests for this?
- Do you think it's a good idea to throw specific error messages when a user tries this? If so, I'm happy to take a stab at it and see how complicated it gets and whether it's worth the effort.

rhshadrach · 2021-01-21T04:20:09Z

@arw2019 - Looks to me like this was closed along with #37725 (which I think should have been marked as a duplicate of this). Do you agree?

arw2019 · 2021-01-21T05:19:19Z

Yes, agreed. Duplicate of the other issue and can be closed.

rhshadrach · 2021-01-22T02:40:36Z

Thanks @arw2019

arw2019 added the Bug and Needs Triage labels Sep 13, 2020

github-actions bot assigned arw2019 Sep 13, 2020

arw2019 mentioned this issue Sep 13, 2020

BUG: DataFrame.groupby(., dropna=True, axis=0) incorrectly throws ShapeError #35751

Merged

arw2019 changed the title from "BUG: SeriesGroupBy.transform fails for axis=1, dropna=False" to "BUG: SeriesGroupBy.transform should raise with axis=1" Sep 25, 2020

rhshadrach added the Error Reporting, Groupby and Closing Candidate labels and removed the Needs Triage label Jan 21, 2021

rhshadrach removed the Closing Candidate label Jan 22, 2021

rhshadrach closed this as completed Jan 22, 2021
