.groupby() .value_counts() incompatible with .reset_index() in 0.18.1 #14014

dcroote · 2016-08-16T19:46:37Z

Code Sample

import pandas as pd

df = pd.DataFrame([[0,1],[0,1],[0,2],[1,1]], columns=['a','b'])
df
   a  b
0  0  1
1  0  1
2  0  2
3  1  1

df.groupby('a').b.value_counts().reset_index()

ValueError: cannot insert b, already exists

Expected Output

In version 0.18.0, the output was:

   a  b  0
0  0  1  2
1  0  2  1
2  1  1  1

The difference is that groupby().value_counts() now returns a Series named after the column on which value_counts() was computed. That name matches the index level b, so reset_index() cannot insert the level as a column.

df.groupby('a').b.value_counts()

0.18.0

a  b
0  1    2
   2    1
1  1    1
dtype: int64

0.18.1 (including 0.18.1+367.g6b7857b)

a  b
0  1    2
   2    1
1  1    1
Name: b, dtype: int64

This change in behavior is not completely unexpected, given that outside of groupby(), value_counts() has historically returned a Series named after the column the operation was performed on:

df.a.value_counts()
0    3
1    1
Name: a, dtype: int64
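
The collision can be reproduced without groupby() at all, which may make the cause easier to see. The sketch below builds two Series by hand, one shaped like the 0.18.1 result (index levels a and b, Series name b) and one unnamed like the 0.18.0 result. The variable names are illustrative and not part of the original report, and it assumes reset_index() refuses to insert a column whose label already exists, as the traceback above indicates.

```python
import pandas as pd

# Index shared by both stand-ins: the same levels groupby('a').b.value_counts() produces.
index = pd.MultiIndex.from_tuples([(0, 1), (0, 2), (1, 1)], names=['a', 'b'])

# Stand-in for the 0.18.1 result: the Series name duplicates the 'b' level.
named = pd.Series([2, 1, 1], index=index, name='b')

# Stand-in for the 0.18.0 result: the Series has no name.
unnamed = pd.Series([2, 1, 1], index=index)

try:
    named.reset_index()
    collided = False
except ValueError:
    # The values column is labelled 'b', so the index level 'b'
    # cannot be inserted as a second column with the same label.
    collided = True

# Without a name, the values column falls back to the label 0 and nothing clashes.
flat = unnamed.reset_index()
print(collided)
print(list(flat.columns))
```
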

A manual workaround would be to rename the Series before reset_index() as follows:

g = df.groupby('a').b.value_counts()
g.name = 0
g.reset_index()
   a  b  0
0  0  1  2
1  0  2  1
2  1  1  1

However, the one-line form was much appreciated. Could being able to pass a new name to value_counts() solve this issue?
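
In the meantime, the manual workaround above can probably be collapsed back into one line. This is a minimal sketch, assuming Series.rename() accepts a scalar and uses it as the new Series name; the name 'n' is arbitrary and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 1], [0, 2], [1, 1]], columns=['a', 'b'])

# Rename the counts Series inline so its name no longer clashes with
# the 'b' index level, then flatten the MultiIndex into columns.
out = df.groupby('a').b.value_counts().rename('n').reset_index()
print(out)
```

Because the rename happens before reset_index(), this should not depend on what name value_counts() assigns to its result.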

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1  (also verified with 0.18.1+367.g6b7857b)
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24.1
numpy: 1.11.1
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

TomAugspurger · 2016-08-16T20:39:00Z

Probably a result of #12363 fixing groupby sometimes losing the name.

In this case I'd say that

In [37]: df.groupby('a').b.value_counts().reset_index(name='counts')
Out[37]:
   a  b  counts
0  0  1       2
1  0  2       1
2  1  1       1

is even clearer than your original. Thoughts?

dcroote · 2016-08-16T20:43:03Z

Even better, thanks!

TomAugspurger added the Usage Question label Aug 16, 2016

dcroote closed this as completed Aug 16, 2016
