Group-by/apply unexpected output with some operations when as_index=False #14547

anandtrex · 2016-10-31T20:01:52Z

A small, complete example of the issue

import pandas as pd
import numpy as np
dd = dict(a=np.arange(9), b=np.repeat(np.arange(3), 3))
df = pd.DataFrame(dd)

# Problematic line below
df.groupby('b', as_index=False).apply(np.std)
# The line below has the same problem
df.groupby('b', as_index=False).std()
# When as_index=False is not passed in, it works as expected.

# The following two lines work exactly as expected
# (an example of group-by/apply working for some operations):
# df.groupby('b', as_index=False).apply(np.mean)
# df.groupby('b', as_index=False).mean()

Expected Output

          a    b
0  0.816497  0.0
1  0.816497  1.0
2  0.816497  2.0

Actual Output

          a    b
0  0.816497  0.0
1  0.816497  0.0
2  0.816497  0.0

When as_index=False is passed into groupby, finding the standard deviation doesn't work as expected. When as_index=True, everything works as expected.

Finding the mean works as expected in both cases.

I have also been able to reproduce the problem on Linux with the same version of pandas.
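Since the report notes that the default as_index=True behaves correctly, one possible workaround (a minimal sketch, not something proposed in this thread) is to group with the default and then move the key back into a column with reset_index(). The ddof=0 argument is an assumption made here so that the result matches np.std, which is what the expected output above was computed with.

```python
import numpy as np
import pandas as pd

# Same frame as in the report above.
df = pd.DataFrame(dict(a=np.arange(9), b=np.repeat(np.arange(3), 3)))

# Group with the default as_index=True, so 'b' becomes the index and is
# not aggregated itself. ddof=0 mirrors np.std's default (an assumption
# made here to match the expected output in the report).
result = df.groupby('b').std(ddof=0).reset_index()

# 'b' is now an ordinary column holding the group keys 0, 1, 2.
print(result)
```

With this approach the column order is b, a and the key column keeps its integer dtype, which differs slightly from the layout shown under Expected Output.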

Output of `pd.show_versions()`

> INSTALLED VERSIONS
> ------------------
> commit: None
> python: 3.5.2.final.0
> python-bits: 64
> OS: Darwin
> OS-release: 15.6.0
> machine: x86_64
> processor: i386
> byteorder: little
> LC_ALL: en_US
> LANG: en_US.UTF8
>
> pandas: 0.18.0
> nose: None
> pip: 8.1.2
> setuptools: 28.5.0
> Cython: None
> numpy: 1.11.0
> scipy: 0.17.0
> statsmodels: None
> xarray: None
> IPython: 5.1.0
> sphinx: 1.3.5
> patsy: None
> dateutil: 2.5.3
> pytz: 2016.3
> blosc: None
> bottleneck: None
> tables: None
> numexpr: None
> matplotlib: 1.5.1
> openpyxl: None
> xlrd: 0.9.4
> xlwt: None
> xlsxwriter: None
> lxml: None
> bs4: 4.4.1
> html5lib: None
> httplib2: None
> apiclient: None
> sqlalchemy: None
> pymysql: None
> psycopg2: None
> jinja2: 2.8
> boto: None

chris-b1 · 2016-10-31T20:52:27Z

Yeah, I don't think as_index=False is that well tested; xref #13217, though that seems to be a separate issue. PR to fix welcome!

jorisvandenbossche · 2016-10-31T21:04:06Z

Note that the two cases (.apply(np.std) and .std()) give different results:

In [59]: df.groupby('b', as_index=False).apply(np.std)
Out[59]: 
          a    b
0  0.816497  0.0
1  0.816497  0.0
2  0.816497  0.0

In [60]: df.groupby('b', as_index=False).std()
Out[60]: 
          b    a
0  0.000000  1.0
1  1.000000  1.0
2  1.414214  1.0

I think the .apply(np.std) case is actually correct. It also applies the function to the 'b' column, so those values are all 0. So it seems to be mainly the 'b' column in the .std() case that is wrong.

jreback · 2016-10-31T21:07:28Z

these have different degrees of freedom

jorisvandenbossche · 2016-10-31T21:09:24Z

@jreback Yes, that explains the 1 vs 0.816, but not the 0.00, 1.00, 1.41 for the b column
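To make the degrees-of-freedom point concrete, here is a small NumPy-only sketch using the values of column 'a' in the first group. np.std defaults to ddof=0, while the pandas .std() methods default to ddof=1, which accounts for 0.816 versus 1. The last line is only an observation added here, not a diagnosis confirmed in this thread: the unexplained 'b' values happen to coincide with the square roots of the group keys 0, 1, 2.

```python
import numpy as np

# Values of column 'a' in the first group (b == 0).
group = np.array([0, 1, 2])

# NumPy's default is the population standard deviation (ddof=0).
population_std = np.std(group)

# The pandas .std() default corresponds to the sample estimate (ddof=1).
sample_std = np.std(group, ddof=1)

print(round(population_std, 6))  # 0.816497
print(sample_std)                # 1.0

# Observation only: square roots of the group keys 0, 1, 2.
key_roots = np.sqrt(np.arange(3))
print(key_roots)
```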

jreback · 2016-11-01T10:02:07Z

this is a dupe of #10355

xieyuheng · 2019-02-14T05:24:21Z

#25315

chris-b1 added Bug Groupby labels Oct 31, 2016

chris-b1 added this to the Next Major Release milestone Oct 31, 2016

jreback closed this as completed Nov 1, 2016

jreback added the Duplicate Report Duplicate issue or pull request label Nov 1, 2016

jxrossel mentioned this issue Nov 1, 2016

BUG: STD modifies groupby target column when as_index=False #10355

Closed

jorisvandenbossche modified the milestones: No action, Next Major Release Nov 1, 2016

ivaniadg mentioned this issue Jun 29, 2017

BUG: apply std to groupby with as_index=False #16799

Closed
