BUG: Keep categorical name in groupby #28798

Merged
merged 6 commits into from
Oct 7, 2019

Conversation

dsaxton
Member

@dsaxton dsaxton commented Oct 5, 2019

Fixes an issue where column name information was dropped when grouping by a categorical column. I had to change a couple of existing tests, which I think were incorrect because they implicitly assumed the old behavior was expected. I also confirmed that this fixes the problem from #28787:

In [1]: import pandas as pd
   ...:
   ...: df = pd.DataFrame(data=(('Bob', 2), ('Greg', None), ('Greg', 6)), columns=['Name', 'Items'])
   ...:
   ...: df_cat = df.copy()
   ...: df_cat['Name'] = df_cat['Name'].astype('category')
   ...: df_cat.groupby('Name', observed=True).agg(pd.DataFrame.sum, skipna=True).reset_index()
Out[1]: 
   Name  Items
0   Bob    2.0
1  Greg    6.0
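
For readers who want to check the behavior themselves, here is a minimal sketch of the property this PR is about: after aggregating with a Python callable, the grouped index should still be named after the categorical column, so `reset_index()` restores it under its original name. The frame and the lambda aggregation are illustrative assumptions, not the test added in this PR, and the sketch assumes a pandas version that includes the fix.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the PR's test suite).
df = pd.DataFrame(
    {
        "Name": pd.Categorical(["Bob", "Greg", "Greg"]),
        "Items": [2.0, 4.0, 6.0],
    }
)

# Aggregate with a plain Python callable rather than a built-in string alias.
result = df.groupby("Name", observed=True).agg(lambda s: s.sum())

# With the fix, the categorical grouper keeps its name on the result index.
index_name = result.index.name

# reset_index() then restores the column under its original name.
restored = result.reset_index()
restored_columns = list(restored.columns)

print(index_name)
print(restored_columns)
```

On a version without the fix, the index name may come back as `None`, in which case `reset_index()` would produce a generically named column instead of `Name`.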

@topper-123 topper-123 added Categorical Categorical Data Type Bug Groupby labels Oct 5, 2019
@topper-123 topper-123 added this to the 1.0 milestone Oct 5, 2019
@jreback
Contributor

jreback commented Oct 5, 2019

lgtm. over to @topper-123

@dsaxton
Member Author

dsaxton commented Oct 6, 2019

@topper-123 Not sure if it was related to using the fixture, but I was getting some test failures due to column order after the update, so I added check_like to the assertion.

def test_groupby_cat_preserves_structure(observed):
    # GH 28787
    df = DataFrame({"Name": Categorical(["Bob", "Greg"]), "Item": [1, 2]})
    expected = df.copy()
Contributor

Specify columns= here to fix the ordering rather than using check_like.
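
To illustrate the trade-off behind this suggestion, here is a rough sketch comparing the two approaches. It uses the public `pandas.testing.assert_frame_equal`; the `shuffled` frame is a hypothetical stand-in for a result whose columns came back in a different order, not output from the actual test.

```python
import pandas as pd
import pandas.testing as tm

data = {"Name": pd.Categorical(["Bob", "Greg"]), "Item": [1, 2]}

# Pinning columns= makes the expected column order explicit.
expected = pd.DataFrame(data, columns=["Name", "Item"])

# Hypothetical result with the same data but reordered columns.
shuffled = expected[["Item", "Name"]]

# check_like=True ignores index/column order, so this comparison passes.
tm.assert_frame_equal(shuffled, expected, check_like=True)

# The default (strict) comparison rejects the reordered frame.
try:
    tm.assert_frame_equal(shuffled, expected)
    strict_passes = True
except AssertionError:
    strict_passes = False

print(strict_passes)
```

Fixing the order with `columns=` keeps the assertion strict, so an unintended column reordering would still fail the test, whereas `check_like=True` would let it through.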

@topper-123 topper-123 merged commit 0ffdbe3 into pandas-dev:master Oct 7, 2019
@dsaxton dsaxton deleted the keep-name branch October 7, 2019 13:50
@topper-123
Contributor

Thanks, @dsaxton!

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
* BUG: Keep categorical name in groupby
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020
* BUG: Keep categorical name in groupby
Labels
Bug Categorical Categorical Data Type Groupby
3 participants