BUG: groupby.apply result doesn't have consistent shape for single vs multiple group cases #42749
Comments
This seems to me to be an example where In this case, the fast_path of However, it seems to me the fast_path / slow_path makes pandas prone to inconsistencies, depending on path chosen, and hard for users to use as what object their transformer gets depends on how it behaves. Instead, could we allow user control of this by adding an argument, e.g.:
vs
This would at least give control, and the user would know whether the UDF provided should accept a DataFrame or Series.
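To make the idea concrete, here is a rough sketch of what explicit control over the UDF input could look like. `transform_groups` and its `pass_frame` flag are hypothetical names made up for illustration; they are not part of pandas and not the argument proposed above. The sketch assumes a unique index and a UDF that returns an object on the group's own index.

```python
import pandas as pd


def transform_groups(df, by, func, pass_frame):
    """Hypothetical helper, not a pandas API.

    pass_frame=True  -> func receives each group as a DataFrame.
    pass_frame=False -> func receives each column of each group as a Series.
    Assumes df has a unique index and func preserves the group's index.
    """
    pieces = []
    for _, group in df.groupby(by):
        values = group.drop(columns=by)  # leave the grouping column out
        if pass_frame:
            out = func(values)
        else:
            out = values.apply(func)  # column by column
        pieces.append(out)
    # Put the rows back in the order of the input frame.
    return pd.concat(pieces).reindex(df.index)


df = pd.DataFrame(
    {"g": ["a", "b", "a"], "x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}
)

by_frame = transform_groups(df, "g", lambda d: d - d.mean(), pass_frame=True)
by_column = transform_groups(df, "g", lambda s: s - s.mean(), pass_frame=False)
```

With a flag like this the caller decides what the UDF receives, instead of the outcome depending on which internal path happens to succeed.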
I ran into this problem as well these days. On the original approach with pandas/pandas/core/groupby/generic.py Line 1109 in 7214f85
The variable I doubt a fix is possible without introducing an inconsistency for the use case where On the approach of using
This is currently true. Also, the function must be able to act column-by-column. What I'm discussing is removing these restrictions. To me, even something like the following makes sense as a transform, and currently works with the fast-path (have to also disable reindexing the columns of the result):
I don't follow why this would need to be done - can you expand on this? The key requirement on the result from the supplied UDF is that it can be either broadcast or aligned to the input group's index. However, thought needs to be put into the logic of whether to broadcast or align.
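For reference, a small sketch of the two cases using the existing `SeriesGroupBy.transform`. The frame and column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1.0, 3.0, 10.0]})

# The UDF returns a scalar per group: the value is broadcast to every row
# of that group.
broadcast = df.groupby("g")["x"].transform(lambda s: s.mean())

# The UDF returns a Series on the group's index: the values are aligned
# row by row.
aligned = df.groupby("g")["x"].transform(lambda s: s - s.mean())
```

In both cases the result carries the index of the input, which is the property a transform has to preserve.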
What you describe seems like a useful extension of
Broadcasting in the sense of aligning to the group's index totally makes sense to me. I was thinking more in terms of the logic of the current implementation, which also requires the columns to be preserved: If the group is a data frame of shape However, with the extension that you describe, only the index would need to be preserved and it would be okay to have different columns. Then my comment about broadcasting would not apply.
[ x ] I have checked that this issue has not already been reported.
[ x ] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
res_multi and res_single have inconsistent shapes.
The output in res_multi is correct; the output in res_single has index level[0] in the columns instead of in the index.
The output seems to become consistent after calling something like the following (though I am not sure how general it is): res_single.swapaxis(level = 1).unstack()
So something like this is a possible fix.
(I believe it should be handled in pandas, because otherwise all downstream code has to handle the single-group case separately.)
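As a stopgap for downstream code, one option is to skip `groupby.apply` and combine the per-group results explicitly. The sketch below is not the code sample from this report; `apply_per_group` is a hypothetical helper and the frames are made up for illustration. It relies on `pd.concat` with a dict, which prepends the dict keys as the outer index level regardless of how many groups there are.

```python
import pandas as pd


def apply_per_group(df, by, func):
    # Hypothetical helper, not a pandas API.
    # Run func on each group and stack the results under the group key.
    # pd.concat with a dict adds the keys as the outer index level,
    # whether there is one group or many.
    return pd.concat({key: func(group) for key, group in df.groupby(by)})


multi = pd.DataFrame({"g": ["a", "a", "b", "b"], "x": [1, 2, 3, 4]})
single = multi[multi["g"] == "a"]

res_multi = apply_per_group(multi, "g", lambda d: d["x"] * 2)
res_single = apply_per_group(single, "g", lambda d: d["x"] * 2)
```

Both results are Series with a two-level index (group key, original row label), so the single-group case needs no special handling.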
Output of pd.show_versions()
INSTALLED VERSIONS
commit : c7f7443
python : 3.8.10.final.0
python-bits : 64
OS : Linux
OS-release : 5.13.5-xanmod1
Version : #0~git20210725.ec5a2ac SMP PREEMPT Sun Jul 25 17:24:47 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.1
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.2
setuptools : 52.0.0.post20210125
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.24.1
pandas_datareader: 0.10.0
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : 1.4.19
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : 0.53.1