Groupby apply result shape depends on internal apply path #31612

fjetter · 2020-02-03T13:11:48Z

Problem description

The shape returned by groupby.apply currently depends on which internal path apply takes (fast apply vs. slow apply), and that choice is opaque to the user. The two simple examples below show this: the fast path returns the group key as the index, while the slow path returns the original index.

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1], 'B': range(3)})

def slow(group):
    # Returning the input object itself forces the slow apply path because of the
    # `result is input` check, cf. https://github.com/pandas-dev/pandas/blob/44782c0809e296a8e57b7f77d963e999c7e0f4a7/pandas/_libs/reduction.pyx#L494
    return group

def fast(group):
    # Returning a copy does not trip that check, so the fast path is taken
    return group.copy()

In [3]: df.groupby("A").apply(slow)
Out[3]:
   A  B
0  0  0
1  0  1
2  1  2

In [4]: df.groupby("B").apply(fast)
Out[4]:
     A  B
B
0 0  0  0
1 1  0  1
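Until the behaviour is made consistent, one way to get a shape that does not depend on the internal path is to assemble the result explicitly. The sketch below assumes you want the group key as the outer index level (the shape the fast path produced above); `apply_with_keys` is a hypothetical helper, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1], "B": range(3)})


def slow(group):
    return group  # returns the input object itself


def fast(group):
    return group.copy()  # returns a new object with the same index


def apply_with_keys(frame, key, func):
    """Hypothetical helper: call func on every group and stack the results
    under the group key, without relying on groupby.apply internals."""
    keys, pieces = [], []
    for name, group in frame.groupby(key):
        keys.append(name)
        pieces.append(func(group))
    # keys= prepends the group key as the outer index level; the inner level
    # keeps whatever index func returned (here the original row labels).
    return pd.concat(pieces, keys=keys, names=[key, None])


res_slow = apply_with_keys(df, "A", slow)
res_fast = apply_with_keys(df, "A", fast)
print(res_slow)
```

Because the concatenation is done explicitly, `slow` and `fast` yield identical frames; the trade-off is losing any optimisation `apply` itself might perform.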

Expected Output

A transparent and consistent output shape. In particular, the behaviour should not be coupled to private, performance-related internals.
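For comparison, a sketch of how the shape can be pinned explicitly through the `group_keys` argument of `DataFrame.groupby`. It assumes pandas 2.0 or later, where `group_keys` is also honoured when the function returns a like-indexed object; the `[["B"]]` selection only keeps the grouping column out of the frames passed to the function.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1], "B": range(3)})


def slow(group):
    return group  # returns the input object itself


def fast(group):
    return group.copy()  # returns a new object with the same index


# group_keys=True: the group key is prepended as an outer index level.
keyed_slow = df.groupby("A", group_keys=True)[["B"]].apply(slow)
keyed_fast = df.groupby("A", group_keys=True)[["B"]].apply(fast)

# group_keys=False: the original row index is kept.
flat_slow = df.groupby("A", group_keys=False)[["B"]].apply(slow)
flat_fast = df.groupby("A", group_keys=False)[["B"]].apply(fast)

# The shape is decided by group_keys, not by which function was applied.
pd.testing.assert_frame_equal(keyed_slow, keyed_fast)
pd.testing.assert_frame_equal(flat_slow, flat_fast)
print(keyed_slow.index.nlevels, flat_slow.index.nlevels)
```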

Related issues

There are many related issues which may or may not share the same root cause. Here is an excerpt:

#12824
#14538
#12977
#28662

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.0.0
numpy : 1.16.5
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200119
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None


mroeschke · 2020-02-03T18:06:01Z

Is this the same issue as #14927?

fjetter · 2020-02-04T08:35:29Z

@mroeschke Indeed, it looks to be the same thing. I added a test case for it in #31613

simonjayhawkins · 2020-05-25T09:36:59Z

> Is this the same issue as #14927?

> @mroeschke Indeed, it looks to be the same thing. I added a test case for it in #31613

closing as duplicate

TomAugspurger · 2020-07-16T13:46:24Z

This is being reverted in #35306 and will be re-fixed in #34998 in pandas 1.2.

simonjayhawkins · 2020-07-16T13:54:52Z

Since #14927 has been reopened, can this remain closed as a duplicate?

TomAugspurger · 2020-07-16T14:12:31Z

Sure if they're the same.

simonjayhawkins · 2020-07-16T14:17:43Z

closing as duplicate, see #31612 (comment)

fjetter mentioned this issue Feb 3, 2020

BUG: Ensure same index is returned for slow and fast path in groupby.apply #31613

Merged


jreback added the Bug and Groupby labels Feb 5, 2020

jreback added this to the 1.0.1 milestone Feb 5, 2020

jorisvandenbossche modified the milestones: 1.0.1, 1.1 Feb 5, 2020

simonjayhawkins added the Duplicate Report label and removed the Bug and Groupby labels May 25, 2020

simonjayhawkins removed this from the 1.1 milestone May 25, 2020

simonjayhawkins closed this as completed May 25, 2020

jreback added this to the No action milestone May 25, 2020

jreback added the Groupby label May 25, 2020

TomAugspurger mentioned this issue Jul 16, 2020

API: User-control of result keys in GroupBy.apply #34998

Merged

TomAugspurger reopened this Jul 16, 2020

simonjayhawkins closed this as completed Jul 16, 2020
