DataFrame.GroupBy.apply has unexpected returns in some cases #31063
Comments
It looks like there's a check for this type of case here, https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1268, that sends the code down separate paths depending on the number of unique values. When there is only one, the result is explicitly unstacked here: https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/generic.py#L1306. I'm not sure if this is by design with some other situation in mind, but I agree that the shape of the output shouldn't depend on the cardinality of the thing you're grouping on.
Agree that it seems strange:
|
Though I think here it should be |
Is there currently any workaround for this issue that doesn't require me to modify the internals of pandas? I personally never want my output to be formatted as in pandas/core/groupby/generic.py, lines 1213 to 1227 in 46841f8.
My bad workaround right now is a simple loop, but I really don't like it: it's less Pythonic and much slower when there are many groups. However, modifying the internals of pandas is not an option for me. From my point of view, this makes apply simply unusable for production data pipelines.
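For reference, a loop-based workaround of the kind described above could look roughly like the sketch below. The helper name `apply_per_group` and the column names `key` and `value` are made up for illustration and are not from this thread. The idea is to call the function on each group explicitly and concatenate the pieces, so the result is always a Series with the group key as the outer index level, regardless of how many groups there are.

```python
import pandas as pd


def apply_per_group(df, key, func):
    """Hypothetical helper: apply func to every group, then concatenate.

    Assumes func returns a Series for each group.
    """
    # Iterating over a GroupBy object yields (group_key, sub_frame) pairs.
    pieces = {name: func(group) for name, group in df.groupby(key)}
    # Concatenating a dict of Series uses the dict keys as the outer index level.
    return pd.concat(pieces)


# Hypothetical data with a single unique grouping value.
df = pd.DataFrame({"key": ["a"] * 10, "value": range(10)})
result = apply_per_group(df, "key", lambda g: g["value"])
print(type(result).__name__, result.shape)
```

As noted above, this gives up whatever fast paths apply has, so it is only a stopgap.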
I also ran into this issue. My workaround was to append a call to pipe:

    df.groupby(...).apply(...).pipe(
        lambda df_or_series: df_or_series
        if isinstance(df_or_series, pd.Series)
        else df_or_series.iloc[0, :]
    )
Code Sample, a copy-pastable example if possible
Problem description
This returns a DataFrame of shape (1, 10).
This seems to occur when i) the groupby key happens to have only one unique value and ii) the applied function takes a DataFrame and returns a Series.
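A minimal sketch of the two conditions above, for comparison. This is not the original code sample: the frames, the column names `key` and `value`, and the function `take_value` are all hypothetical. It runs the same apply call on a frame with one unique key and on a frame with two, and prints the type and shape of each result. What gets printed may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# Hypothetical data, not the original sample from this issue.
one_group = pd.DataFrame({"key": ["a"] * 10, "value": range(10)})
two_groups = pd.DataFrame({"key": ["a"] * 5 + ["b"] * 5, "value": range(10)})


def take_value(group):
    # Takes a DataFrame (one group) and returns a Series.
    return group["value"]


res_one = one_group.groupby("key").apply(take_value)
res_two = two_groups.groupby("key").apply(take_value)

# Compare container type and shape for the single-key and multi-key cases.
print(type(res_one).__name__, res_one.shape)
print(type(res_two).__name__, res_two.shape)
```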
Expected Output
I expected it to return a Series of shape (10,).
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-72-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.17.2
pytz : 2019.3
dateutil : 2.8.0
pip : 19.2.3
setuptools : 41.4.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : None
sphinx : 2.2.0
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: 0.8.1
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.13.0
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.9
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.1