DataFrame.groupby.apply returns different results with copy lambda functions. #14927

Closed
mroeschke opened this issue Dec 20, 2016 · 4 comments · Fixed by #31613 or #34998
Labels: Apply (Apply, Aggregate, Transform, Map), Bug, Groupby


@mroeschke
Member

Code Sample, a copy-pastable example if possible

In [32]: df = pd.DataFrame({"g":[1, 2, 2, 2], "a":[1, 2, 3, 4], "b":[5, 6, 7, 8]})
    ...: 

In [33]: df.groupby('g').apply(lambda x: x)
Out[33]: 
   a  b  g
0  1  5  1
1  2  6  2
2  3  7  2
3  4  8  2

In [34]: df.groupby('g').apply(lambda x: x[:])
Out[34]: 
     a  b  g
g           
1 0  1  5  1
2 1  2  6  2
  2  3  7  2
  3  4  8  2

In [35]: df.groupby('g').apply(lambda x: x.copy(deep=False))
Out[35]: 
   a  b  g
0  1  5  1
1  2  6  2
2  3  7  2
3  4  8  2

In [36]: df.groupby('g').apply(lambda x: x.copy(deep=True))
Out[36]: 
     a  b  g
g           
1 0  1  5  1
2 1  2  6  2
  2  3  7  2
  3  4  8  2

Problem description

xref #9867

Issue #9946 (In[36]) no longer raises a ValueError on master 0.19.1, but I am curious why these 4 cases return 2 different results.

Expected Output

Not sure since I don't have a personal use-case for this operation, but I'd expect these operations to return the same result unless I am missing something with using a deepcopy.
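For readers comparing the four lambdas, the following minimal sketch shows what differs between them outside of `groupby`. It only checks object identity and value equality on the frame from the report; it makes no claim about how pandas detects mutation internally. The names `funcs`, `same_object` and `same_values` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 2, 2, 2], "a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

# The four callables from the report, keyed by a descriptive (made-up) name.
funcs = {
    "identity": lambda x: x,
    "slice": lambda x: x[:],
    "shallow copy": lambda x: x.copy(deep=False),
    "deep copy": lambda x: x.copy(deep=True),
}

# Does the callable hand back the very same Python object it received?
same_object = {name: func(df) is df for name, func in funcs.items()}

# Does the callable hand back a frame with identical values?
same_values = {name: func(df).equals(df) for name, func in funcs.items()}

print(same_object)
print(same_values)
```

Only the identity lambda returns the same object, while all four return equal values. Note that this split does not line up with the two output shapes above (the shallow copy behaves like the identity case there), so object identity alone does not explain the results.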

Output of pd.show_versions()

```

commit: 3ccb501
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-45-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0+228.g3ccb501
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_datareader: None

```

@jreback
Contributor

jreback commented Dec 20, 2016

So these are examples where things seem to be mutated; the only detection is that the reference changed. This is an implementation detail; we have had a couple of reports very similar to this.

#14810 (and some others, but this is hard to search for actually).

If you want to try to figure out how to make this better, by all means.

@mroeschke mroeschke added the Apply Apply, Aggregate, Transform, Map label May 26, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.24.0 milestone Jul 7, 2018
@jorisvandenbossche
Member

I think we should try to deprecate this behaviour of trying to detect whether the original object is mutated (and then doing a transform-like operation) before 1.0.
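Until the detection is removed, one way to get the same shape for all four lambdas is to skip `apply` and concatenate the per-group results explicitly. This is a sketch under that assumption, not the fix adopted in pandas; `apply_per_group` is a hypothetical helper, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({"g": [1, 2, 2, 2], "a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})


def apply_per_group(frame, key, func):
    """Hypothetical helper: call func on each group and always prepend the group key."""
    # Iterating a groupby yields (group key, sub-frame) pairs.
    pieces = {name: func(group) for name, group in frame.groupby(key)}
    # Dict keys become the outer index level, named after the grouping column.
    return pd.concat(pieces, names=[key])


lambdas = [
    lambda x: x,
    lambda x: x[:],
    lambda x: x.copy(deep=False),
    lambda x: x.copy(deep=True),
]
results = [apply_per_group(df, "g", func) for func in lambdas]
print(results[0])
```

Because the group key is attached by `pd.concat` rather than inferred from what the function returns, all four results carry the same two-level index.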

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Nov 6, 2018
@jreback jreback modified the milestones: Contributions Welcome, 1.0.1 Feb 5, 2020
@jorisvandenbossche jorisvandenbossche modified the milestones: 1.0.1, 1.1 Feb 5, 2020
@TomAugspurger
Contributor

This is being reverted in #35306 and will be re-fixed in #34998 in pandas 1.2.

@TomAugspurger TomAugspurger reopened this Jul 16, 2020
@TomAugspurger TomAugspurger modified the milestones: 1.1, 1.2 Jul 16, 2020
@jreback jreback modified the milestones: 1.2, 1.3 Nov 29, 2020
@mroeschke mroeschke added the Bug label May 2, 2021
@simonjayhawkins simonjayhawkins modified the milestones: 1.3, Contributions Welcome Jun 11, 2021
@jreback jreback modified the milestones: Contributions Welcome, 1.5 Mar 30, 2022
@MoritzLaurer

MoritzLaurer commented Mar 7, 2023

I'm still encountering this issue of inconsistent return formats with pandas 1.4.4 and 1.5.
This issue is actually quite frustrating. It makes code with groupby de facto not reproducible and unstable, because groupby seems to randomly return different output formats, which means that my code sometimes runs and sometimes doesn't :(
See also #13056.

E.g. with this code the results are inconsistent and I don't understand why: `df_cl.groupby(by=group).apply(lambda x: x.label.value_counts(sort=True))`
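If the goal is per-group label counts, a sketch that avoids `apply` is to call `value_counts` on the grouped column directly. The frame below is a made-up stand-in for `df_cl`, and the column names `group` and `label` are assumptions based on the snippet above.

```python
import pandas as pd

# Hypothetical stand-in for the commenter's df_cl.
df_cl = pd.DataFrame(
    {
        "group": ["x", "x", "x", "y", "y"],
        "label": ["a", "a", "b", "a", "a"],
    }
)

# Long format: a Series indexed by (group, label).
counts = df_cl.groupby("group")["label"].value_counts(sort=True)

# Wide format, if needed: one row per group, one column per label,
# with 0 where a label never occurs in a group.
wide = counts.unstack(fill_value=0)

print(counts)
print(wide)
```

Here the long or wide shape is chosen explicitly by the caller instead of depending on what each group happens to return.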
