
Using agg with groupby, as_index=False still returning group variable as index #25011


Closed
cthamilton opened this issue Jan 29, 2019 · 8 comments
Labels: Duplicate Report

cthamilton commented Jan 29, 2019

Code Sample, a copy-pastable example if possible


# Import packages
import pandas as pd
import numpy as np
# Set up test DataFrame
test_array = np.arange(50) + 100
test_matrix = test_array.reshape((10,5))
test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
test_df.loc[0:5,'shouldnt be index'] = 3
test_df.loc[5:8,'shouldnt be index'] = 4
# groupby and agg
end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
print(end_result)

execution:

>>> # Import packages
... import pandas as pd
>>> import numpy as np
>>> # Set up test DataFrame
... test_array = np.arange(50) + 100
>>> test_matrix = test_array.reshape((10,5))
>>> test_df = pd.DataFrame(test_matrix).rename(columns={0:'shouldnt be index'})
>>> # Make groupby data more grouped
... test_df.loc[0:5,'shouldnt be index'] = 3
>>> test_df.loc[5:8,'shouldnt be index'] = 4
>>> # groupby and agg
... end_result = test_df.groupby('shouldnt be index',as_index=False).agg(["min", "max", "count"])
>>> print(end_result)
                     1               2               3               4
                   min  max count  min  max count  min  max count  min  max count
shouldnt be index
3                  101  121     5  102  122     5  103  123     5  104  124     5
4                  126  141     4  127  142     4  128  143     4  129  144     4
145                146  146     1  147  147     1  148  148     1  149  149     1

Problem description

I'm trying to use groupby with as_index=False followed by an agg call. In the example above, the groupby variable ends up in the index rather than staying as a column. My understanding is that as_index=False should keep the groupby variable as a column rather than moving it to the index (as in the Expected Output below), but perhaps I am mistaken.

This is my first time creating an issue, so my apologies if this is operator error or I didn't include important information. Please let me know if this is the case.

Maybe this is related to #22546?

Expected Output

You can see what the result should look like by calling reset_index on the output:

>>> end_result.reset_index()
  shouldnt be index    1               2               3               4
                     min  max count  min  max count  min  max count  min  max count
0                 3  101  121     5  102  122     5  103  123     5  104  124     5
1                 4  126  141     4  127  142     4  128  143     4  129  144     4
2               145  146  146     1  147  147     1  148  148     1  149  149     1
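For reference, the same workaround written as a single chain on the example frame from the code sample above (nothing new, just reset_index appended to the aggregation):

end_result = (
    test_df.groupby('shouldnt be index')
           .agg(["min", "max", "count"])
           .reset_index()
)
print(end_result)  # 'shouldnt be index' is now an ordinary column again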

Output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0
pytest: 3.8.0
pip: 19.0.1
setuptools: 40.2.0
Cython: 0.28.5
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml.etree: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

WillAyd (Member) commented Jan 31, 2019

Thanks for the report. I think the problem here is a conflict between the as_index keyword and how we are piecing together the result of multiple agg function applications.

Specifically, this is fine:

end_result = test_df.groupby('shouldnt be index', as_index=False).agg(min)

but this would reproduce the error you are seeing:

end_result = test_df.groupby('shouldnt be index', as_index=False).agg([min])

Investigation and PRs would certainly be welcome.
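A minimal sketch of that contrast on a tiny made-up frame (the names df, key, and val are just illustrative); the list case behaves as described in this report on pandas 0.24 and may differ in later releases:

import pandas as pd

df = pd.DataFrame({'key': [1, 1, 2], 'val': [10, 20, 30]})

# Single function: as_index=False is honoured and 'key' stays a column.
print(df.groupby('key', as_index=False).agg('min'))

# List of functions: the columns become a MultiIndex and, per this report,
# 'key' ends up in the index instead of remaining a column.
print(df.groupby('key', as_index=False).agg(['min']))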

@nathan-zymergen

Curious if there is any update on this. I just ran into this issue. I appreciate all the work being done on this great project!

@simonjayhawkins (Member)

Closing as duplicate of #13217. Ping me if I'm missing something.

@simonjayhawkins added the Duplicate Report label and removed the Apply, Bug, and Groupby labels on Apr 24, 2020
@simonjayhawkins removed this from the Contributions Welcome milestone on Apr 24, 2020
@AnkithO-0

Hi guys, I ran into the same issue and just found out that calling .reset_index() instead of passing as_index=False solves it for me. Thanks :)
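Another option on pandas 0.25 or newer (untested here, so treat it as a sketch; the result column names are made up): named aggregation keeps the output columns flat, and in that case as_index=False does keep the group key as a column:

# Named aggregation (pandas >= 0.25); column 1 refers to the integer-named
# column of the example frame from the original report.
end_result = test_df.groupby('shouldnt be index', as_index=False).agg(
    col1_min=(1, 'min'),
    col1_max=(1, 'max'),
    col1_count=(1, 'count'),
)
print(end_result)  # 'shouldnt be index' appears as a regular column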

@realGhostFoxx

Hi guys, I ran into the same issue and just found out that calling .reset_index() instead of passing as_index=False solves it for me. Thanks :)

hero! hero! hero!

@IsmamHuda

It would still be good for this to get resolved all the same.

@vroomzel

Encountering the same issue with pandas==1.3.4.

SieSiongWong commented Dec 3, 2022

Hello, today I ran the groupby() code below and got this error: ValueError: Cannot set a DataFrame with multiple columns to the single column max_date. This is super strange. I run exactly the same code on another PC and get the expected result without any error, but on this PC I get the error. I ran this code a month ago on both PCs with no issue at all, and I have used the same code for about a year now without any error. Is this a bug introduced into pandas recently? The pandas version that produces the error is 1.5.1; version 1.4 does not produce it. On the PC with pandas 1.5.1 I need to set as_index=True to avoid the error, which is still very strange because I use the same code every week. If anyone can tell me what is happening, I'd really appreciate it.

df['max_date'] = df.groupby(['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat'], as_index=False)['End Date'].transform(max)
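One thing worth trying (a guess, not a confirmed fix): transform already returns a result aligned to the original rows, so as_index should not be needed there at all, and dropping the keyword may sidestep the ValueError on 1.5.x:

df['max_date'] = df.groupby(
    ['Provider', 'Location', 'Address', 'Open Hours', 'Test to Treat']
)['End Date'].transform('max')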
