BUG: SeriesGroupBy.value_counts is inconsistent with Series.value_counts when the Series is categorical #38672

Closed
2 of 3 tasks
venaturum opened this issue Dec 24, 2020 · 1 comment · Fixed by #38796
Labels
Bug Categorical Categorical Data Type Groupby

@venaturum
Contributor

venaturum commented Dec 24, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame()
df["col1"] = ["A", "A", "B", "B"]
df["col2"] = pd.Categorical(["1", "2", "1", "1"], categories=["1", "2", "3"])

print(df.groupby("col1").col2.value_counts())

Problem description

Calling value_counts directly on the col2 Series:

df.col2.value_counts()

yields:

1      3
2      1
3      0
Name: col2, dtype: int64

Because col2 is categorical, the unobserved category 3 is reported with a count of 0, as desired.
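
To make the role of the categorical dtype concrete, here is a minimal sketch (not part of the original report) that compares value_counts on the categorical values with the same values cast to plain strings. The variable names are illustrative only.

```python
import pandas as pd

# Same values as col2 in the report; "3" is a declared category that never occurs.
s = pd.Series(pd.Categorical(["1", "2", "1", "1"], categories=["1", "2", "3"]))

# Categorical dtype: every declared category is counted, including unobserved ones.
cat_counts = s.value_counts()

# Plain strings: only values that actually occur are counted.
str_counts = s.astype(str).value_counts()

print(cat_counts.to_dict())
print(str_counts.to_dict())
```

The zero entry for "3" exists only in the categorical case, and it is this entry that is expected to survive the groupby.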

If we group by col1 first:

df.groupby("col1").col2.value_counts()

the zero counts for unobserved categories are dropped (category 3 in both groups, and category 2 in group B):

col1  col2
A     1       1
      2       1
B     1       2
Name: col2, dtype: int64
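
One way to check whether a given pandas installation shows this behaviour is to list the (group, category) pairs that are absent from the grouped result. This is only a sketch: `missing_category_pairs` is a hypothetical helper written for this illustration, not a pandas function.

```python
import itertools

import pandas as pd


def missing_category_pairs(df, by, col):
    """Return the (group, category) pairs absent from the grouped value_counts."""
    result = df.groupby(by)[col].value_counts()
    # Every group crossed with every declared category of the column.
    expected = set(itertools.product(df[by].unique(), df[col].cat.categories))
    return sorted(expected - set(result.index))


df = pd.DataFrame(
    {
        "col1": ["A", "A", "B", "B"],
        "col2": pd.Categorical(["1", "2", "1", "1"], categories=["1", "2", "3"]),
    }
)

missing = missing_category_pairs(df, "col1", "col2")
print(missing)
```

Judging by the output above, an affected version should report ("A", "3"), ("B", "2") and ("B", "3") as missing, while a version that includes the fix should print an empty list.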

Expected Output

The expected output can be achieved with a workaround:

df.groupby("col1").col2.apply(pd.Series.value_counts)

which yields:

col1   
A     2    1
      1    1
      3    0
B     1    2
      3    0
      2    0
Name: col2, dtype: int64
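
Another possible workaround, not mentioned in the report and offered only as a sketch, is to group by both columns and call size(). With observed=False, a categorical grouping key contributes all of its declared categories, so unobserved combinations appear with a count of 0.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["A", "A", "B", "B"],
        "col2": pd.Categorical(["1", "2", "1", "1"], categories=["1", "2", "3"]),
    }
)

# observed=False keeps every declared category of col2 for each col1 group,
# so combinations that never occur are reported with size 0.
counts = df.groupby(["col1", "col2"], observed=False).size()

print(counts)
```

Unlike the apply-based workaround, the rows within each group follow category order rather than descending count, and the second index level keeps the name col2.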

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d42
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.4
numpy : 1.19.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.2.3
setuptools : 41.2.0
Cython : 0.29.21
pytest : 5.3.5
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fsspec : 0.6.0
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.10
tables : None
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@venaturum venaturum added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 24, 2020
@venaturum venaturum changed the title BUG: SeriesGroupBy.value_counts is inconsitent with Series.value_counts when the Series is categorical BUG: SeriesGroupBy.value_counts is inconsistent with Series.value_counts when the Series is categorical Dec 24, 2020
@MarcoGorelli MarcoGorelli added Categorical Categorical Data Type and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 24, 2020
@simonjayhawkins simonjayhawkins added this to the Contributions Welcome milestone Dec 24, 2020
@venaturum
Contributor Author

take

@jreback jreback modified the milestones: Contributions Welcome, 1.3 Dec 30, 2020
NumberPiOso added a commit to NumberPiOso/pandas that referenced this issue Feb 1, 2022
mroeschke pushed a commit that referenced this issue Feb 5, 2022
…olumns (#45625)

* BUG: SeriesGroupBy.value_counts index name missing

Issue  #44324

* TST: Change test to correct categorical naming

Value counts tend to preserve index names #45625
Change test test_sorting_with_different_categoricals to comply
to this change

* REF: Refactor conditionals in value_counts()

* RFT: correct mistake introduced via RFT

In line with 44324

* RFT: Change variable names and comment #38672

* BUG: Update conditional to is None to consider series
phofl pushed a commit to phofl/pandas that referenced this issue Feb 14, 2022
…olumns (pandas-dev#45625)
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this issue Jul 13, 2022
…olumns (pandas-dev#45625)
4 participants