DataFrame Groupby Apply Returning Unexpected Result? #12652

jaradc · 2016-03-17T03:06:59Z

Why doesn't applying the `addColumn` function to a DataFrameGroupBy object return the expected output below?

Code Sample, a copy-pasteable example if possible

import pandas as pd

df = pd.DataFrame({
  ('C', 'julian'): [0.258185, 0.52591899999999991, 0.17491099999999998, 0.94083099999999997, 0.70193700000000003, 0.189361, 0.90364500000000003, 0.56848199999999993, 0.44919799999999993, 0.39054899999999998],
  ('B', 'geoffrey'): [0.27970200000000001, 0.54119799999999996, 0.36436499999999999, 0.73802900000000005, 0.85527000000000009, 0.37441099999999999, 0.87378500000000003, 0.062140000000000001, 0.008404, 0.171458], 
  ('A', 'julian'): [0.20408199999999999, 0.263235, 0.196243, 0.52878500000000006, 0.85351699999999997, 0.23979699999999998, 0.98073399999999999, 0.59194199999999997, 0.81816699999999998, 0.21742399999999998], 
  ('B', 'julian'): [0.79572500000000002, 0.507324, 0.65340799999999999, 0.65416000000000007, 0.803087, 0.94354400000000005, 0.85009699999999988, 0.56629799999999997, 0.28205000000000002, 0.47193299999999999], 
  ('A', 'geoffrey'): [0.073676000000000005, 0.096733, 0.028613, 0.831569, 0.26324999999999998, 0.069519000000000011, 0.29041400000000001, 0.088387000000000007, 0.061483000000000003, 0.42760200000000004], 
  ('C', 'geoffrey'): [0.25811200000000001, 0.75765199999999999, 0.92473300000000003, 0.29447299999999998, 0.26469799999999999, 0.84664699999999993, 0.11871300000000001, 0.87206399999999995, 0.65837000000000001, 0.23442600000000002]},
  columns=pd.MultiIndex.from_tuples([('A','julian'),('A','geoffrey'), ('B','julian'),('B','geoffrey'), ('C','julian'),('C','geoffrey')]))

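def addColumn(grouped):
  # Second-level label shared by every column in this group, e.g. 'julian'
  name = grouped.columns[0][1]
  # Add the row-wise sum as a new ('sum', name) column (this mutates the group in place)
  grouped['sum', name] = grouped.sum(axis=1)
  return grouped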

result = df.groupby(level=1, axis=1).apply(addColumn)

Expected Output

      A         B         C       sum         A         B         C       sum  
   geoffrey  geoffrey  geoffrey  geoffrey    julian    julian    julian    julian  
0  0.073676  0.279702  0.258112  0.611491  0.204082  0.795725  0.258185  1.257992  
1  0.096733  0.541198  0.757652  1.395584  0.263235  0.507324  0.525919  1.296478  
2  0.028613  0.364365  0.924733  1.317710  0.196243  0.653408  0.174911  1.024561  
3  0.831569  0.738029  0.294473  1.864071  0.528785  0.654160  0.940831  2.123776  
4  0.263250  0.855270  0.264698  1.383219  0.853517  0.803087  0.701937  2.358542  
5  0.069519  0.374411  0.846647  1.290578  0.239797  0.943544  0.189361  1.372703  
6  0.290414  0.873785  0.118713  1.282912  0.980734  0.850097  0.903645  2.734476  
7  0.088387  0.062140  0.872064  1.022590  0.591942  0.566298  0.568482  1.726721  
8  0.061483  0.008404  0.658370  0.728257  0.818167  0.282050  0.449198  1.549415  
9  0.427602  0.171458  0.234426  0.833486  0.217424  0.471933  0.390549  1.079906

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.4.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: None
setuptools: 15.0
Cython: 0.22
numpy: 1.9.3
scipy: 0.16.0c1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: None
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.4.2
bs4: 4.1.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8


jreback · 2016-03-17T03:10:38Z

Your post is not copy-pasteable; something is wrong with the formatting.

Further, you REALLY, REALLY should not be mutating inside an .apply. We have to ban this; it is wholly bad practice.
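For reference, here is a minimal sketch of one way to get the layout in the Expected Output without mutating anything inside an .apply. It is only an illustration under stated assumptions, not the approach pandas itself prescribes: the helper name `with_group_sums` is hypothetical, and the frame is a small stand-in with arbitrary values rather than the data from the report. It avoids `groupby(..., axis=1)` entirely and builds a new frame from per-name column blocks.

```python
import numpy as np
import pandas as pd

# Small stand-in for the frame in the report (values are arbitrary).
cols = pd.MultiIndex.from_tuples(
    [("A", "julian"), ("A", "geoffrey"),
     ("B", "julian"), ("B", "geoffrey"),
     ("C", "julian"), ("C", "geoffrey")]
)
df = pd.DataFrame(np.arange(12, dtype=float).reshape(2, 6), columns=cols)


def with_group_sums(frame):
    """Return a new frame with a ('sum', name) column after each name's block.

    Hypothetical helper for illustration; the input frame is left untouched.
    """
    level_values = frame.columns.get_level_values(1)
    pieces = []
    for name in sorted(level_values.unique()):
        # Select the columns whose second-level label equals this name.
        block = frame.loc[:, level_values == name]
        # Row-wise sum for this block, labelled with a two-level column key.
        total = block.sum(axis=1).to_frame()
        total.columns = pd.MultiIndex.from_tuples([("sum", name)])
        pieces.extend([block, total])
    # Assemble a brand-new frame instead of assigning into any group.
    return pd.concat(pieces, axis=1)


result = with_group_sums(df)
print(result)
```

Because each block and its sum are concatenated into a fresh frame, `df` still has its original six columns afterwards, and the column order follows the grouping (all `geoffrey` columns, then all `julian` columns).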

jaradc · 2016-03-17T04:01:50Z

It actually is copy/pasteable. Would you like a screenshot proving that?

Anyway, the question is why this doesn't work, not whether it is considered good or bad practice. It comes from a question on Stack Overflow, and I'm trying to understand:

Is my understanding of groupby apply incorrect, or is this a bug?

jreback · 2016-03-17T16:23:06Z

This is partially a bug; will update soon.

rhshadrach · 2020-09-26T02:46:55Z

I'm seeing the expected output on master. That said, I don't think tests should be added for this. One should not mutate in an apply in the first place.

jreback · 2020-09-26T02:52:10Z

Note that we do have a couple of tests that lock down this behavior. When we want to change it, these kinds of tests are helpful.

rhshadrach · 2020-09-26T13:55:20Z

Alright - I'll mark as such.

jreback added the Groupby and Usage Question labels Mar 17, 2016

jreback mentioned this issue Mar 17, 2016

API: ban mutation within groupby.apply #12653

Closed

jreback added this to the 0.18.1 milestone Mar 17, 2016

jreback added the Bug label Mar 17, 2016

jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016

jreback mentioned this issue Jun 7, 2016

[BUG] Groupby, Apply #13390

Closed

jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Sep 1, 2016

mroeschke added the Apply label and removed the Usage Question label Oct 9, 2019

rhshadrach added the good first issue and Needs Tests labels Sep 26, 2020

mroeschke removed the Apply, Bug, and Groupby labels Apr 23, 2021

mroeschke mentioned this issue May 12, 2021

TST: Add test for old issues #41431

Merged

10 tasks

jreback modified the milestones: Contributions Welcome, 1.3 May 12, 2021

jreback closed this as completed in #41431 May 12, 2021
