BUG: .groupby().min() .max() .agg('min') .agg('max') ERROR #38401

ghost9023 · 2020-12-10T08:03:48Z

[O] I have checked that this issue has not already been reported.
I checked, and a similar bug was reported and marked as solved, but it still occurs with my pandas installation.
(AssertionError when grouping with max/min as aggregation functions (pandas-1.0.0) #31522)
[O] I have confirmed this bug exists on the latest version of pandas.
I checked the latest version available from Anaconda.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.

Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# data given from https://www.kaggle.com/hesh97/titanicdataset-traincsv/data
import pandas as pd

df = pd.read_csv('./train.csv')
df.head()

class_group = df.groupby('Pclass')
print(class_group.groups)  # pass
print(class_group.mean()['Survived'])  # pass
print(class_group.median())  # pass
print(class_group.agg('min'))  # raises AssertionError; so do .agg('max'), .min(), .max()

Problem description

I found that the same error occurred in older versions, 1.0.0 and 1.0.1 (#31522).

According to that ticket, the bug was solved after 1.0.3.

But in my tests the same error occurred on Windows 10 64-bit with Python 3.6 and pandas 1.0.1, 1.0.3, 1.0.5, and 1.1.3, and on CentOS 7.5 with Python 3.8 and pandas 1.0.5.

The tracebacks below come from the same code and the same data on Windows 10 and CentOS 7.

Windows
""""""""""""""""""""""""""""""""""""""""""""""
Traceback (most recent call last):
File "E:/PythonProjects/Deepphi/local-dev/pandaserror.py", line 11, in <module>
print(class_group.agg('min')) # error occur. also .agg('max'), .min(), .max()
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 951, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 307, in _aggregate
return self._try_aggregate_string_function(arg, *args, **kwargs), None
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 263, in _try_aggregate_string_function
return f(*args, **kwargs)
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1552, in min
numeric_only=numeric_only, min_count=min_count, alias="min", npfunc=np.min
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1000, in _agg_general
how=alias, alt=npfunc, numeric_only=numeric_only, min_count=min_count,
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1022, in _cython_agg_general
how, alt=alt, numeric_only=numeric_only, min_count=min_count
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1135, in _cython_agg_blocks
assert len(locs) == result.shape[1]
AssertionError

Process finished with exit code 1
""""""""""""""""""""""""""""""""""""""""""""""

CentOS
""""""""""""""""""""""""""""""""""""""""""""""
Traceback (most recent call last):
File "pandaserror.py", line 24, in <module>
print(class_group.agg('min'))
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 928, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 311, in _aggregate
return self._try_aggregate_string_function(arg, *args, **kwargs), None
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 267, in _try_aggregate_string_function
return f(*args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1372, in f
return self._cython_agg_general(alias, alt=npfunc, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 993, in _cython_agg_general
agg_blocks, agg_items = self._cython_agg_blocks(
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 1100, in _cython_agg_blocks
assert len(locs) == result.shape[1]
AssertionError
""""""""""""""""""""""""""""""""""""""""""""""

Expected Output

Output of `pd.show_versions()`

Windows
INSTALLED VERSIONS

commit : db08276
python : 3.6.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1
None

CentOS
INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.12.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ko_KR.UTF-8
LOCALE : ko_KR.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.50.1
None


phofl · 2020-12-10T17:36:31Z

Hi thanks for your report. Could you provide an example where the df is constructed in code and not from an external file?
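
A self-contained reproduction of the kind requested here might look like the sketch below. The rows are hypothetical: they are invented to resemble the shape of the Titanic data (numeric columns plus text columns with missing values) and are not copied from train.csv. Whether the call fails, and with which exception, is an assumption that depends on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical rows loosely shaped like the Titanic training data;
# the values are invented for illustration, not taken from train.csv.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3],
        "Survived": [1, 0, 1, 0, 0, 1],
        "Age": [38.0, 54.0, 27.0, np.nan, 22.0, 4.0],
        "Name": ["A", "B", "C", "D", "E", "F"],
        "Cabin": ["C85", np.nan, np.nan, "F33", np.nan, "G6"],
    }
)

class_group = df.groupby("Pclass")

try:
    # The call from the report, applied to every non-grouping column.
    print(class_group.agg("min"))
except Exception as exc:
    # Report whatever the installed pandas version raises, if anything.
    print(f"{type(exc).__name__}: {exc}")
```

Because the frame is built in code, anyone can paste the snippet and compare the behaviour across pandas versions without downloading an external file.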

simonjayhawkins · 2020-12-15T20:42:48Z

The data fails to parse, possibly because of the name 'O'Dwyer, Miss. Ellen "Nellie"'.

Please read this guide detailing how to provide the necessary information for us to reproduce your bug and how to create minimal bug reports

ghost9023 · 2020-12-16T07:14:28Z

I think this error is not a bug after all.

When I call .group.agg('min')['Age'], the error occurs.

But when I call .group['Age'].agg('min'), the error does not occur.

It seems the group aggregation function works through the columns step by step, for example in a loop.

.group['Age'].agg('min') first groups, then selects only 'Age', then calculates the minimum of 'Age'.
.group.agg('min')['Age'] first groups, then calculates the minimum of every column, then selects only the minimum of 'Age'.

So in this case, the error occurs if there is any column whose min or max value cannot be calculated.

I think it would be good to add exception handling and a clearer error message for this case. It's not a bug, but from a user's point of view the problem is very hard to diagnose when no error message or error code is shown and only a bare Python assertion error is raised.
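
The difference in ordering described above can be sketched as follows. The frame is hypothetical (invented values, with a 'Cabin' column standing in for a column that mixes strings and missing values); the point is only that selecting columns before aggregating keeps the problematic column out of the calculation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Cabin' mixes strings and NaN, the numeric columns do not.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3],
        "Age": [38.0, 54.0, 27.0, np.nan, 22.0, 4.0],
        "Fare": [71.3, 51.9, 13.0, 26.0, 7.25, 16.7],
        "Cabin": ["C85", np.nan, np.nan, "F33", np.nan, "G6"],
    }
)

# Select first, then aggregate: only 'Age' is ever passed to min.
age_min = df.groupby("Pclass")["Age"].agg("min")

# The same idea for several columns at once.
numeric_min = df.groupby("Pclass")[["Age", "Fare"]].agg("min")

print(age_min)
print(numeric_min)
```

Aggregating first and selecting afterwards, as in df.groupby("Pclass").agg("min")["Age"], asks pandas to compute the minimum of 'Cabin' as well, which is where the behaviour reported in this thread comes from.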

ryansrobin · 2020-12-17T22:45:27Z

To add for any others who trip across this like I did: in previous versions it felt like you could just use .max() on any dataframe. After upgrading to pandas 1.1.5, it seems that you should clean your dataframe further to ensure there are no null values.

Good luck!

mcarans · 2021-01-20T19:25:50Z

Similar to @ryansrobin, the problem I encountered with min() in Pandas 1.2.1 which didn't occur with 1.0.5 was solved by removing a column that had NaNs in it.
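
One way to locate such a column before aggregating is sketched below. It assumes, without confirmation from this thread, that the troublesome columns are non-numeric columns containing NaN; the frame and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Age' is numeric with a NaN, 'Cabin' is text with NaNs.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3],
        "Age": [38.0, 54.0, 27.0, np.nan, 22.0, 4.0],
        "Cabin": ["C85", np.nan, np.nan, "F33", np.nan, "G6"],
    }
)

# Columns that contain at least one missing value.
nan_columns = df.columns[df.isna().any().to_numpy()].tolist()

# Columns whose dtype is not numeric.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()

# Assumption: non-numeric columns with NaN are the ones worth removing.
suspects = [col for col in nan_columns if col in non_numeric]

cleaned = df.drop(columns=suspects)
result = cleaned.groupby("Pclass").min()
print(suspects)
print(result)
```

Numeric columns with NaN, such as 'Age' here, are left in place because min skips missing values in them.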

simonjayhawkins · 2021-01-21T16:40:04Z

@ryansrobin @mcarans If you have a reproducible example, feel free to open a new issue.

mcarans · 2021-01-21T23:40:08Z

@simonjayhawkins I have done so here: #39329

simonjayhawkins · 2021-01-22T10:17:41Z

Thanks @mcarans

ghost9023 added the Bug and Needs Triage labels Dec 10, 2020

rhshadrach added the Groupby and Needs Info labels and removed the Needs Triage label Dec 11, 2020

ghost9023 closed this as completed Dec 16, 2020
