
Using agg with groupby, as_index=False still returning group variable as index #25011


Closed
cthamilton opened this issue Jan 29, 2019 · 8 comments
Labels: Duplicate Report

cthamilton commented Jan 29, 2019

Code Sample, a copy-pastable example if possible


# Import packages
import pandas as pd
import numpy as np
# Set up test DataFrame
test_array = np.arange(50) + 100
test_matrix = test_array.reshape((10,5))
test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
test_df.loc[0:5,'shouldnt be index'] = 3
test_df.loc[5:8,'shouldnt be index'] = 4
# groupby and agg
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
print(end_result)

execution:

>>> # Import packages
... import pandas as pd
>>> import numpy as np
>>> # Set up test DataFrame
... test_array = np.arange(50) + 100
>>> test_matrix = test_array.reshape((10,5))
>>> test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
>>> # Make groupby data more grouped
... test_df.loc[0:5,'shouldnt be index'] = 3
>>> test_df.loc[5:8,'shouldnt be index'] = 4
>>> # groupby and agg
... end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
>>> print(end_result)
                     1               2               3               4
                   min  max count  min  max count  min  max count  min  max count
shouldnt be index
3                  101  121     5  102  122     5  103  123     5  104  124     5
4                  126  141     4  127  142     4  128  143     4  129  144     4
145                146  146     1  147  147     1  148  148     1  149  149     1

Problem description

I'm trying to use groupby with as_index=False followed by an agg call. In the example above, the groupby variable ends up in the index rather than staying as a column. My understanding is that as_index=False should keep the groupby variable as a column rather than moving it to the index (as in the Expected Output below), but perhaps I am mistaken.

This is my first time creating an issue, so my apologies if this is operator error or I didn't include important information. Please let me know if this is the case.

Maybe this is related to #22546?

Expected Output

You can see what the result should look like by calling reset_index on the output:

>>> end_result.reset_index()
  shouldnt be index    1               2               3               4
                     min  max count  min  max count  min  max count  min  max count
0                 3  101  121     5  102  122     5  103  123     5  104  124     5
1                 4  126  141     4  127  142     4  128  143     4  129  144     4
2               145  146  146     1  147  147     1  148  148     1  149  149     1
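For reference, the same workaround written as a single chain on the example frame from the code sample above (nothing new, just reset_index appended to the aggregation):

end_result = (
    test_df.groupby('shouldnt be index')
           .agg(["min", "max", "count"])
           .reset_index()
)
print(end_result)  # 'shouldnt be index' is now an ordinary column again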

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: 3.8.0
pip: 19.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

WillAyd (Member) commented Jan 31, 2019

Thanks for the report. I think the problem here is a conflict between the as_index keyword and how we are piecing together the result of multiple agg function applications.

Specifically, this is fine:

end_result = test_df.groupby('shouldnt be index', as_index=False).agg(min)

but this would reproduce the error you are seeing:

end_result = test_df.groupby('shouldnt be index', as_index=False).agg([min])

Investigation and PRs would certainly be welcome.
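A minimal sketch of that contrast on a tiny made-up frame (the names df, key, and val are just illustrative); the list case behaves as described in this report on pandas 0.24 and may differ in later releases:

import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2], 'val': [10, 20, 30]})

# Single function: as_index=False is honoured and 'key' stays a column.
print(df.groupby('key', as_index=False).agg('min'))

# List of functions: the columns become a MultiIndex and, per this report,
# 'key' ends up in the index instead of remaining a column.
print(df.groupby('key', as_index=False).agg(['min']))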

@nathan-zymergen

Curious if there is any update on this. I just ran into this issue. I appreciate all the work being done on this great project!

@simonjayhawkins (Member)

Closing as duplicate of #13217. Ping me if I'm missing something.

@simonjayhawkins added the Duplicate Report label and removed the Apply, Bug, and Groupby labels on Apr 24, 2020
@simonjayhawkins removed this from the Contributions Welcome milestone on Apr 24, 2020
@AnkithO-0

Hi guys, I ran into the same issue and just found out that calling .reset_index() instead of passing as_index=False solves it for me. Thanks :)
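Another option on pandas 0.25 or newer (untested here, so treat it as a sketch; the result column names are made up): named aggregation keeps the output columns flat, and in that case as_index=False does keep the group key as a column:

# Named aggregation (pandas >= 0.25); column 1 refers to the integer-named
# column of the example frame from the original report.
end_result = test_df.groupby('shouldnt be index', as_index=False).agg(
    col1_min=(1, 'min'),
    col1_max=(1, 'max'),
    col1_count=(1, 'count'),
)
print(end_result)  # 'shouldnt be index' appears as a regular column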

@realGhostFoxx

Hi guys, I ran into the same issue and just found out that calling .reset_index() instead of passing as_index=False solves it for me. Thanks :)

hero! hero! hero!

@IsmamHuda

It would still be good for this to get resolved all the same.

@vroomzel

Encountering the same issue with pandas==1.3.4.

SieSiongWong commented Dec 3, 2022

Hello, today I ran the groupby() code below and got this error: ValueError: Cannot set a DataFrame with multiple columns to the single column max_date. This is super strange. I run exactly the same code on another PC and get the expected result without any error, but on this PC I get the error. I ran this code a month ago on both PCs with no issue at all, and I have used the same code for about a year now without any error. Is this a bug introduced into pandas recently? The pandas version that produces the error is 1.5.1; version 1.4 does not produce it. On the PC with pandas 1.5.1 I need to set as_index=True to avoid the error, which is still very strange because I use the same code every week. If anyone can tell me what is happening, I'd really appreciate it.

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
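One thing worth trying (a guess, not a confirmed fix): transform already returns a result aligned to the original rows, so as_index should not be needed there at all, and dropping the keyword may sidestep the ValueError on 1.5.x:

df['max_date'] = df.groupby(
    ['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat']
)['End Date'].transform('max')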
