BUG: categorical column dropped from groupby agg result when as_index=False #8770

aimboden · 2014-11-10T09:22:12Z

Hello everyone, updating to pandas 0.15.1 breaks my code because of the following behaviour. When performing an aggregation on a groupby object whose groupers include a categorical, with as_index=False, the categorical column is now dropped from the aggregation result. This may be an unintended consequence of #8585, and it certainly looks like a bug to me.

import pandas as pd
import numpy as np

d = {'Foo': [1, 2, 3, 4, 5, 6, 11],
     'Bar': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB',
             'CC'],
     'Baz': [10, 20, 30, 40, 50, 60, 70]}

df = pd.DataFrame(d)
ls = np.linspace(0, 30, 7)
groups = df.groupby([pd.cut(df['Foo'], ls), 'Bar'], as_index=False)

df2 = groups['Baz'].agg({'Mean Baz': 'mean'})

produces:

>>> pd.__version__
'0.15.1'
>>> df2

	Bar	Mean Baz
0	AA	20
1	BB	45
2	BB	60
3	CC	70

, whereas I expect the following:

>>> pd.__version__
'0.15.0'

	Foo	Bar	Mean Baz
0	(0, 5]	AA	20
1	(0, 5]	BB	45
2	(5, 10]	BB	60
3	(10, 15]	CC	70

jreback · 2014-11-10T12:01:30Z

cc @behzadnouri

I marked this as a bug. But on second thought I think this is correct and the prior behavior was wrong.

You are grouping with 2 anonymous arrays (well, a Series and an array), neither of which exists in the grouped object. By definition, groupby makes the groupers into the result index. But since you are dropping that index (with as_index=False), the groupers are removed.

You can add these groupers as columns to preserve them, or use as_index=True (and then reset_index).

behzadnouri · 2014-11-10T12:36:13Z

yes, this was changed by #8585 in order to:

- have consistency between aggregating methods: for example max and idxmax, where idxmax does not add the grouper to the frame; as in #8582 (groupby with as_index=False, inconsistency between max and idxmax)
- not have a new column with a NaN name added to the frame (when the grouper series has no name)
- avoid breaking groupby ops when the grouper name conflicts with column names in the resulting frame; as in #7115 (BUG: groupby with reducer of max and as_index=False raises)
- be consistent with the api documentation, which says:

as_index=False is effectively “SQL-style” grouped output

As @jreback mentioned, for the old behaviour, you can either do as_index=True followed by reset_index or add the groupers as new columns to the frame, and do groupby by the columns:

In [19]: groups = df.groupby([pd.cut(df['Foo'], ls), 'Bar'], as_index=True)

In [20]: groups['Baz'].agg({'Mean Baz': 'mean'}).reset_index()
Out[20]: 
        Foo Bar  Mean Baz
0    (0, 5]  AA        20
1    (0, 5]  BB        45
2   (5, 10]  BB        60
3  (10, 15]  CC        70

or,

In [21]: df['Foo.1'] = pd.cut(df['Foo'], ls)

In [22]: groups = df.groupby(['Foo.1', 'Bar'], as_index=False)

In [23]: groups['Baz'].agg({'Mean Baz': 'mean'})
Out[23]: 
      Foo.1 Bar  Mean Baz
0    (0, 5]  AA        20
1    (0, 5]  BB        45
2   (5, 10]  BB        60
3  (10, 15]  CC        70
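
For readers on a newer pandas: as far as I know, the dict-renaming form `groups['Baz'].agg({'Mean Baz': 'mean'})` used above was deprecated in 0.20 and raises an error from 1.0 onwards, so the two workarounds need a slightly different spelling there. The following is only a sketch under those assumptions, not the snippet from this thread. It assumes a pandas version that supports the `observed` keyword, and the column name `FooBin` is a hypothetical label chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Foo': [1, 2, 3, 4, 5, 6, 11],
    'Bar': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC'],
    'Baz': [10, 20, 30, 40, 50, 60, 70],
})
ls = np.linspace(0, 30, 7)  # bin edges 0, 5, 10, ..., 30

# Workaround 1: keep the groupers in the index, then reset_index.
# observed=True keeps only the bin/Bar combinations present in the data.
via_reset = (
    df.groupby([pd.cut(df['Foo'], ls), 'Bar'], observed=True)['Baz']
      .mean()
      .rename('Mean Baz')
      .reset_index()
)

# Workaround 2: store the bins as a real column, then group by column names.
# 'FooBin' is a hypothetical column name used only for this sketch.
binned = df.assign(FooBin=pd.cut(df['Foo'], ls))
via_column = (
    binned.groupby(['FooBin', 'Bar'], as_index=False, observed=True)['Baz']
          .mean()
          .rename(columns={'Baz': 'Mean Baz'})
)

print(via_reset)
print(via_column)
```

Both frames should carry the bin column, `Bar`, and `Mean Baz`; neither relies on how as_index=False treats groupers that are not columns of the frame.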

aimboden · 2014-11-10T12:36:51Z

@jreback

If you think the previous behavior was wrong, I can live with that using reset_index.

However, I still think that it is inconsistent. Why would the categorical that we are grouping with be dropped, while the Series that is also used as a grouper ('Bar') is kept?

If we follow through with your logic, then shouldn't the result simply be

	Mean Baz
0	20
1	45
2	60
3	70

, since both groupers would then be dropped from the resulting DataFrame?

jreback · 2014-11-10T12:48:27Z

@Gimli510 no, the 'Bar' is added back, as a named grouper is not excluded. I could see that this is confusing. @behzadnouri ?

aimboden · 2014-11-10T12:49:31Z

@behzadnouri
Thanks for explaining the rationale behind this change, which makes sense regarding the issues encountered. Maybe there is no better way? I still feel that this should be documented.

Maybe the API changes section of the release notes should be updated?
It states that

groupby with as_index=False will not add erroneous extra columns to result

I couldn't find any hint that the addition of the grouper to the resulting DataFrame was an erroneous extra column. The "SQL-Style" grouped output of the documentation is also not very telling to the mere user that I am :/

aimboden · 2014-11-10T12:54:28Z

@jreback but the grouper does have a name in this case, doesn't it?

>>> grouper = pd.cut(df['Foo'], ls)
>>> grouper
0      (0, 5]
1      (0, 5]
2      (0, 5]
3      (0, 5]
4      (0, 5]
5     (5, 10]
6    (10, 15]
Name: Foo, dtype: category
Categories (6, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25] < (25, 30]]
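
The name carried by the grouper can also be checked programmatically, which makes the overlap with the existing 'Foo' column explicit (the kind of name conflict mentioned for #7115). A minimal sketch, assuming a pandas version that supports the `observed` keyword; the label 'Foo bin' is a hypothetical name picked only to avoid the clash, and this does not claim to reproduce the 0.15.x behaviour discussed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Foo': [1, 2, 3, 4, 5, 6, 11],
    'Bar': ['AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'CC'],
    'Baz': [10, 20, 30, 40, 50, 60, 70],
})
ls = np.linspace(0, 30, 7)

grouper = pd.cut(df['Foo'], ls)
has_name = grouper.name is not None          # pd.cut keeps the input Series name
clashes = grouper.name in df.columns         # ...which is also a column of df

# Give the grouper a distinct (hypothetical) name before grouping, so the
# column produced by reset_index cannot be confused with the original 'Foo'.
named = grouper.rename('Foo bin')
out = (
    df.groupby([named, 'Bar'], observed=True)['Baz']
      .mean()
      .reset_index()
)

print(grouper.name, has_name, clashes)
print(list(out.columns))
```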

wesm · 2018-07-06T21:30:00Z

Closing. I welcome anyone who wishes to revisit this with pandas >= 0.23.2

aimboden changed the title ~~BUG: Categorical column dropped from groupby agg result when as_index=False~~ BUG: categorical column dropped from groupby agg result when as_index=False Nov 10, 2014

jreback added Bug Categorical Categorical Data Type Groupby labels Nov 10, 2014

jreback added this to the 0.15.2 milestone Nov 10, 2014

jreback modified the milestones: 0.16.0, 0.15.2 Dec 4, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

wesm closed this as completed Jul 6, 2018

wesm added the Won't Fix label Jul 6, 2018
