BUG: categorical column dropped from groupby agg result when as_index=False #8770
cc @behzadnouri I marked this as a bug. But on second thought I think this is correct and the prior behavior was wrong. You are grouping with 2 anonymous arrays (well, a Series and an array), neither of which exists in the grouped object. By definition groupby will make the groupers into the result index. But since you are dropping it (with You can add these groupers as columns to preserve them, or use
yes, this was changed by #8585 in order to have
As @jreback mentioned, for the old behaviour, you can either do
or,
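The two workarounds mentioned above lost their code in this copy of the thread. A minimal sketch of what they could look like follows; it is not the snippet originally posted. The column names `Foo` and `Bar` come from the thread, while the data values, the `Baz` column, the `Foo_bin` name and the `bins` list are assumptions made up for illustration. `observed=True` is passed explicitly so that only bin/label combinations present in the data are returned.

```python
import pandas as pd

# Hypothetical data: only the column names 'Foo' and 'Bar' come from the thread.
df = pd.DataFrame({
    "Foo": [1, 2, 3, 4, 5, 7, 12],
    "Bar": ["a", "a", "b", "b", "a", "b", "a"],
    "Baz": [10, 20, 30, 40, 50, 60, 70],
})
bins = [0, 5, 10, 15]  # assumed bin edges

# Option 1: store the binned values as a real column, then group by column names.
binned = df.assign(Foo_bin=pd.cut(df["Foo"], bins))
by_column = binned.groupby(["Foo_bin", "Bar"], as_index=False, observed=True)["Baz"].sum()

# Option 2: keep the groupers as the index and turn them into columns afterwards.
grouper = pd.cut(df["Foo"], bins).rename("Foo_bin")
by_index = df.groupby([grouper, df["Bar"]], observed=True)["Baz"].sum().reset_index()

print(by_column)
print(by_index)
```

Both variants should give a frame that carries the categorical bins as an ordinary column, which is the shape the original report was asking for.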
If you think the previous behavior was wrong, I can live with that using However, I still think that it is inconsistent. Why would the categorical that we are grouping with be dropped, while the Series that is also used as a grouper ('Bar') is kept? If we follow through with your logic, then shouldn't the result simply be
, since both groupers would then be dropped from the resulting DataFrame?
@Gimli510 no, the 'Bar' is added back, as a named grouper is not excluded. I could see that this is confusing. @behzadnouri ?
@behzadnouri Maybe the API changes section of the release notes should be updated? It currently reads: "groupby with as_index=False will not add erroneous extra columns to result". I couldn't find any hint that the addition of the grouper to the resulting DataFrame was an erroneous extra column. The "SQL-Style" grouped output section of the documentation is also not very telling to the mere user that I am :/
@jreback but the grouper does have a name in this case, doesn't it?
>>> grouper = pd.cut(df['Foo'], ls)
>>> grouper
0      (0, 5]
1      (0, 5]
2      (0, 5]
3      (0, 5]
4      (0, 5]
5     (5, 10]
6    (10, 15]
Name: Foo, dtype: category
Categories (6, object): [(0, 5] < (5, 10] < (10, 15] < (15, 20] < (20, 25] < (25, 30]]
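The point about the grouper's name can be checked in isolation. The sketch below uses made-up values and bin edges (only the name `Foo` comes from the thread) and shows that `pd.cut` keeps the name when given a Series, whereas cutting a bare array gives an object with no name to carry over.

```python
import pandas as pd

# Hypothetical values; only the Series name 'Foo' mirrors the thread.
foo = pd.Series([1, 7, 12], name="Foo")
bins = [0, 5, 10, 15]  # assumed bin edges

named_grouper = pd.cut(foo, bins)               # Series in, Series out: name is kept
unnamed_grouper = pd.cut(foo.to_numpy(), bins)  # array in: a Categorical without a name

print(named_grouper.name)
print(getattr(unnamed_grouper, "name", None))
```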
Closing. I welcome anyone who wishes to revisit this with pandas >= 0.23.2
Hello everyone, updating to Pandas 0.15.1 breaks my code because of the following behaviour. When performing an aggregation operation on a groupby object containing a categorical, with as_index=False, this column is now dropped from the aggregation result. This may be an unexpected consequence of #8585, and it certainly looks like a bug to me.
produces:
, whereas I expect the following: