Skip to content

groupby value_counts strips categorical information #19416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mwaskom opened this issue Jan 26, 2018 · 1 comment
Closed

groupby value_counts strips categorical information #19416

mwaskom opened this issue Jan 26, 2018 · 1 comment
Labels
Categorical Categorical Data Type Duplicate Report Duplicate issue or pull request Groupby

Comments

@mwaskom
Copy link
Contributor

mwaskom commented Jan 26, 2018

Code Sample, a copy-pastable example if possible

import seaborn as sns
tips = sns.load_dataset("tips")
print(tips["day"].dtype)
counts = tips.groupby("size").day.value_counts().rename("count").reset_index()
print(counts["day"].dtype)
category
object

Problem description

When the categorical variable (day) becomes an index in the groupby and then gets reset, it loses its categorical metadata (and dtype). In the case of the tips dataset, this includes the proper ordering of days.

Interestingly, this isn't the case with the Series.value_counts method, which properly creates a CategoricalIndex with the order metadata.

I don't see an existing issue about this but maybe #9748 is related?

Expected Output

category
category

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.21.1
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.1
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback jreback added Groupby Duplicate Report Duplicate issue or pull request Categorical Categorical Data Type labels Jan 27, 2018
@jreback jreback added this to the No action milestone Jan 27, 2018
@jreback
Copy link
Contributor

jreback commented Jan 27, 2018

thanksj duplicate of #18502

@jreback jreback closed this as completed Jan 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Duplicate Report Duplicate issue or pull request Groupby
Projects
None yet
Development

No branches or pull requests

2 participants