BUG: When agg or reset_index to grouped DF, insert function causes ValueError. #20566

TakaakiFuruse · 2018-03-31T14:08:11Z

Code Sample, a copy-pastable example if possible

Case 1

import pandas as pd
df = pd.DataFrame({
               'a' : [1,1,1,2,2,2,3,3,3],
               'b' : [1,2,3,4,5,6,7,8,9],
})
df.groupby('a', as_index=False).agg({'a': 'count'})

Case 2

import pandas as pd
df = pd.DataFrame({
               'a' : [1,1,1,2,2,2,3,3,3],
               'b' : [1,2,3,4,5,6,7,8,9],
})
df.groupby(['a']).agg({'a': 'count'}).reset_index(level=0)

It returns "ValueError: cannot insert a, already exists" for both caes.

Problem description

So far, I could found 2 functions which calls "insert" function without "allow_duplicates=True" option.
One is in "def _insert_inaxis_grouper_inplace" from "pandas/core/groupby.py"

result.insert(0, name, lev)

and another is in "def reset_index" from "pandas/core/frame.py"

 new_obj.insert(0, name, level_values)

For both cases, it result in ValueError as I wrote in Code section.

Expected Output

I think for both cases it should at least return some kind of dataframe without returning ValueError.

It should add a new column even it was duplicated.
It should add a new column with renaming it (say a_1 or a_x as new column).

My concern is

Is this a bug?

Should we supposed to aggrecate for "as_index=False" dataframe or do "reset_index" to aggregated dataframe?
(i.e. Are there any other way to rename column before doing reset_index or agg?)

If this were a bug, how could I fix it?

I could add "allow_duplicate=True" option for both lines.
Or, I could rename column by adding "_x" or "_dup" or any kind of suffix.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.8.0-53-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: ja_JP.UTF-8
LOCALE: ja_JP.UTF-8

pandas: 0.22.0
pytest: 3.5.0
pip: 9.0.3
setuptools: 39.0.1
Cython: 0.28.1
numpy: 1.14.2
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.2
patsy: 0.5.0
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

The text was updated successfully, but these errors were encountered:

WillAyd · 2018-04-01T21:19:50Z

What is the expected result of your code samples?

TakaakiFuruse · 2018-04-01T22:44:42Z

@WillAyd

For Case 1,

or

For Case2,

or

WillAyd · 2018-04-02T00:46:50Z

Your problem is that you are trying to group and aggregate the same column, which doesn't make sense. What you are looking for is below:

In [13]: df.groupby('a', as_index=False).agg({'b': 'count'})
Out[13]: 
   a  b
0  1  3
1  2  3
2  3  3

TakaakiFuruse · 2018-04-02T00:49:13Z

Thanks. So this isn't a bug. My mistake.
Let me close the issue.

TakaakiFuruse mentioned this issue Mar 31, 2018

BUG: STD modifies groupby target column when as_index=False #10355

Closed

TakaakiFuruse closed this as completed Apr 2, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: When agg or reset_index to grouped DF, insert function causes ValueError. #20566

BUG: When agg or reset_index to grouped DF, insert function causes ValueError. #20566

TakaakiFuruse commented Mar 31, 2018

INSTALLED VERSIONS

WillAyd commented Apr 1, 2018

TakaakiFuruse commented Apr 1, 2018 •

edited

Loading

WillAyd commented Apr 2, 2018

TakaakiFuruse commented Apr 2, 2018

BUG: When agg or reset_index to grouped DF, insert function causes ValueError. #20566

BUG: When agg or reset_index to grouped DF, insert function causes ValueError. #20566

Comments

TakaakiFuruse commented Mar 31, 2018

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

WillAyd commented Apr 1, 2018

TakaakiFuruse commented Apr 1, 2018 • edited Loading

WillAyd commented Apr 2, 2018

TakaakiFuruse commented Apr 2, 2018

Output of `pd.show_versions()`

TakaakiFuruse commented Apr 1, 2018 •

edited

Loading