BUG: SeriesGroupBy.transform should raise with axis=1 #36321

Closed
3 tasks done
arw2019 opened this issue Sep 13, 2020 · 6 comments
Labels: Bug, Error Reporting, Groupby

Comments

Member

arw2019 commented Sep 13, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

The dropna=True case runs without raising, but returns an empty Series:

In [7]: df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd')) 
   ...: gb = df.groupby('B', dropna=True, axis=1) 
   ...: gb['B'].transform(len)                                                                         
Out[7]: Series([], Name: B, dtype: int64)

With dropna=False, however, the call to transform raises a ValueError with a rather unhelpful message:

In [9]: df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd')) 
   ...: gb = df.groupby('B', dropna=False, axis=1) 
   ...: gb['B'].transform(len)                    
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-c7271aa13f24> in <module>
      1 df = pd.DataFrame({"A": [0, 0, 7, 1], "B": [1, 2, 4, 3]}, index=list('abcd'))
      2 gb = df.groupby('B', dropna=False, axis=1)
----> 3 gb['B'].transform(len)

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    506 
    507         if not isinstance(func, str):
--> 508             return self._transform_general(func, *args, **kwargs)
    509 
    510         elif func not in base.transform_kernel_allowlist:

/workspaces/pandas-arw2019/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
    546 
    547             concatenated = concat(results)
--> 548             result = self._set_result_index_ordered(concatenated)
    549         else:
    550             result = self.obj._constructor(dtype=np.float64)

/workspaces/pandas-arw2019/pandas/core/groupby/groupby.py in _set_result_index_ordered(self, result)
    691             result = result.sort_index(axis=self.axis)
    692 
--> 693         result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
    694         return result
    695 

/workspaces/pandas-arw2019/pandas/core/series.py in set_axis(self, labels, axis, inplace)
   4365     @Appender(generic.NDFrame.set_axis.__doc__)
   4366     def set_axis(self, labels, axis: Axis = 0, inplace: bool = False):
-> 4367         return super().set_axis(labels, axis=axis, inplace=inplace)
   4368 
   4369     @doc(

/workspaces/pandas-arw2019/pandas/core/generic.py in set_axis(self, labels, axis, inplace)
    657         """
    658         self._check_inplace_and_allows_duplicate_labels(inplace)
--> 659         return self._set_axis_nocheck(labels, axis, inplace)
    660 
    661     def _set_axis_nocheck(self, labels, axis: Axis, inplace: bool):

/workspaces/pandas-arw2019/pandas/core/generic.py in _set_axis_nocheck(self, labels, axis, inplace)
    662         # NDFrame.rename with inplace=False calls set_axis(inplace=True) on a copy.
    663         if inplace:
--> 664             setattr(self, self._get_axis_name(axis), labels)
    665         else:
    666             obj = self.copy()

/workspaces/pandas-arw2019/pandas/core/generic.py in __setattr__(self, name, value)
   5384         try:
   5385             object.__getattribute__(self, name)
-> 5386             return object.__setattr__(self, name, value)
   5387         except AttributeError:
   5388             pass

/workspaces/pandas-arw2019/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()
     64 
     65     def __set__(self, obj, value):
---> 66         obj._set_axis(self.axis, value)

/workspaces/pandas-arw2019/pandas/core/series.py in _set_axis(self, axis, labels, fastpath)
    425         if not fastpath:
    426             # The ensure_index call above ensures we have an Index object
--> 427             self._mgr.set_axis(axis, labels)
    428 
    429     # ndarray compatibility

/workspaces/pandas-arw2019/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    217 
    218         if new_len != old_len:
--> 219             raise ValueError(
    220                 f"Length mismatch: Expected axis has {old_len} elements, new "
    221                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 2 elements, new values have 4 elements
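
The last frame of the traceback shows where the message originates: the length of the existing axis is compared with the length of the new labels, and a mismatch raises. Below is a minimal pure-Python sketch of that check, reusing the message format shown in the traceback. The function name `set_axis_checked` and the list-based arguments are assumptions made for illustration; this is not the pandas implementation.

```python
def set_axis_checked(old_labels, new_labels):
    """Return new_labels if it can replace old_labels, else raise.

    Hypothetical stand-in for the length check seen in the traceback;
    it is not pandas code.
    """
    old_len = len(old_labels)
    new_len = len(new_labels)
    if new_len != old_len:
        raise ValueError(
            f"Length mismatch: Expected axis has {old_len} elements, new "
            f"values have {new_len} elements"
        )
    return list(new_labels)


# As in the report: the existing axis has 2 elements, the new labels have 4.
try:
    set_axis_checked([0, 1], list("abcd"))
except ValueError as err:
    message = str(err)
```

The message names only the two lengths, which is why it gives no hint that axis=1 is the underlying problem.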
                                                     

Expected Output

Since there are no missing values in the input, dropna=True and dropna=False should give the same result.

Problem description

(Edited)
As discussed below and in #35751, a SeriesGroupBy with axis=1 never makes sense. We should raise a more explicit error message.
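
One way to get a more explicit message is to validate the axis up front instead of letting a length check deep in the internals fail. The sketch below is purely illustrative: `check_series_groupby_axis` and its message wording are hypothetical and do not reflect how pandas handles this.

```python
def check_series_groupby_axis(axis):
    """Raise early if a Series-like groupby is asked to group along columns.

    Hypothetical helper for illustration only; not part of pandas.
    Returns the normalized axis number (always 0) when the axis is valid.
    """
    if axis in (1, "columns"):
        raise ValueError(
            "SeriesGroupBy does not support axis=1; a Series only has axis=0"
        )
    if axis not in (0, "index"):
        raise ValueError(f"No axis named {axis}")
    return 0
```

With a guard like this, the failing call would stop at the point where the user made the mistake, and the message would name the offending argument.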

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 0cdea22
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-47-generic
Version : #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.0.dev0+354.g0cdea2261
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.19.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1

arw2019 added the Bug and Needs Triage labels Sep 13, 2020
Member Author

arw2019 commented Sep 13, 2020

take

Member

rhshadrach commented Sep 13, 2020

I think allowing axis=1 for a Series is largely (entirely?) not implemented. Doing pd.Series([1, 2, 3]).groupby([0], axis=1) raises, and I can't find any way of making something like:

df = pd.DataFrame({"A": [0, 0, 7, 1]})
df.groupby([0], axis=1)['A'].sum()

work. I think this "backdoor" to getting a SeriesGroupBy with axis=1 is a bug. See #35443. I think the only case where it might make sense to do this is when as_index=False.

Member Author

arw2019 commented Sep 13, 2020

I think I agree that it's a bug, except possibly the as_index=False case.

TODOs:

  • should we add tests for this?
  • do you think it's a good idea to raise specific error messages when a user tries this? If so, I'm happy to take a stab at that and see how complicated it gets and whether it's worth the effort

arw2019 changed the title from "BUG: SeriesGroupBy.transform fails for axis=1, dropna=False" to "BUG: SeriesGroupBy.transform should raise with axis=1" Sep 25, 2020
@rhshadrach
Member

@arw2019 - Looks to me like this was closed along with #37725 (which I think should have been marked as a duplicate of this). Do you agree?

rhshadrach added the Error Reporting, Groupby, and Closing Candidate labels and removed the Needs Triage label Jan 21, 2021
Member Author

arw2019 commented Jan 21, 2021

Yes, agreed. Duplicate of the other issue and can be closed.

rhshadrach removed the Closing Candidate label Jan 22, 2021
@rhshadrach
Member

Thanks @arw2019

2 participants