
BUG: .groupby().min() .max() .agg('min') .agg('max') ERROR #38401


Closed
1 task
ghost9023 opened this issue Dec 10, 2020 · 8 comments
Labels
Bug, Groupby, Needs Info (Clarification about behavior needed to assess issue)

Comments

@ghost9023

ghost9023 commented Dec 10, 2020


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# data given from https://www.kaggle.com/hesh97/titanicdataset-traincsv/data
import pandas as pd

df = pd.read_csv('./train.csv')
df.head()

class_group = df.groupby('Pclass')
print(class_group.groups)  # pass
print(class_group.mean()['Survived'])  # pass
print(class_group.median())  # pass
print(class_group.agg('min'))  # error occur. also .agg('max'), .min(), .max()

Problem description

I found that the same error was reported for older versions, 1.0.0 and 1.0.1 (#31522).

According to that ticket, the bug was fixed after 1.0.3.

But I tested pandas 1.0.1, 1.0.3, 1.0.5, and 1.1.3 on Windows 10 64-bit with Python 3.6, and pandas 1.0.5 on CentOS 7.5 with Python 3.8, and the same error occurred in every case.

The tracebacks below come from the same code and the same data on Windows 10 and CentOS 7.

  • Windows
    """"""""""""""""""""""""""""""""""""""""""""""
    Traceback (most recent call last):
    File "E:/PythonProjects/Deepphi/local-dev/pandaserror.py", line 11, in
    print(class_group.agg('min')) # error occur. also .agg('max'), .min(), .max()
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 951, in aggregate
    result, how = self._aggregate(func, *args, **kwargs)
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 307, in _aggregate
    return self._try_aggregate_string_function(arg, *args, **kwargs), None
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 263, in _try_aggregate_string_function
    return f(*args, **kwargs)
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1552, in min
    numeric_only=numeric_only, min_count=min_count, alias="min", npfunc=np.min
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1000, in _agg_general
    how=alias, alt=npfunc, numeric_only=numeric_only, min_count=min_count,
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1022, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count
    File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1135, in _cython_agg_blocks
    assert len(locs) == result.shape[1]
    AssertionError

Process finished with exit code 1
""""""""""""""""""""""""""""""""""""""""""""""

  • CentOS
""""""""""""""""""""""""""""""""""""""""""""""
Traceback (most recent call last):
File "pandaserror.py", line 24, in
print(class_group.agg('min'))
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 928, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 311, in _aggregate
return self._try_aggregate_string_function(arg, *args, **kwargs), None
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 267, in _try_aggregate_string_function
return f(*args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1372, in f
return self._cython_agg_general(alias, alt=npfunc, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 993, in _cython_agg_general
agg_blocks, agg_items = self._cython_agg_blocks(
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 1100, in _cython_agg_blocks
assert len(locs) == result.shape[1]
AssertionError
""""""""""""""""""""""""""""""""""""""""""""""

Expected Output

Output of pd.show_versions()

  • Windows
    INSTALLED VERSIONS

commit : db08276
python : 3.6.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1
None

  • CentOS
    INSTALLED VERSIONS

commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.12.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ko_KR.UTF-8
LOCALE : ko_KR.UTF-8

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.50.1
None

@ghost9023 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Dec 10, 2020
@phofl
Member

phofl commented Dec 10, 2020

Hi, thanks for your report. Could you provide an example where the df is constructed in code rather than read from an external file?

@rhshadrach added the Groupby and Needs Info (Clarification about behavior needed to assess issue) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Dec 11, 2020
@simonjayhawkins
Member

The data fails to parse, maybe because of the entry 'O'Dwyer, Miss. Ellen "Nellie"'.

Please read this guide detailing how to provide the necessary information for us to reproduce your bug and how to create minimal bug reports.

@ghost9023
Author

I think this error is not actually a bug.

When I call .group.agg('min')['Age'], the error occurs.

But when I call .group['Age'].agg('min'), the error does not occur.

It seems the group aggregation function works through the columns step by step, for example in a loop.

.group['Age'].agg('min') means: first group, then select only 'Age', then calculate the minimum of 'Age'.
.group.agg('min')['Age'] means: first group, then calculate the minimum of every column, then select only the minimum of 'Age'.

So in this case, the error occurs if any column exists whose min or max value cannot be calculated.

I think it would be good to add exception handling and a clearer error message for this case. It's not a bug, but from a user's point of view the problem is very hard to diagnose when no error message or error code is displayed, or when only a bare Python AssertionError is raised.
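
A minimal sketch of the two call orders described above, using a small frame built in code instead of train.csv. The values are made up for illustration, and whether the whole-frame reduction fails depends on the pandas version, so the sketch only shows two forms that sidestep the non-numeric column: selecting 'Age' before aggregating, and passing numeric_only=True.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data: a numeric column with a NaN
# and a string column with missing values.
df = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 2, 3, 3],
        "Age": [38.0, np.nan, 27.0, 14.0, 22.0, 35.0],
        "Cabin": ["C85", None, None, None, "G6", None],
    }
)

grouped = df.groupby("Pclass")

# Select the column first, then aggregate: only "Age" is ever reduced.
age_min = grouped["Age"].agg("min")

# Alternatively, restrict the whole-frame reduction to numeric columns,
# so "Cabin" is never passed to min().
numeric_min = grouped.min(numeric_only=True)

print(age_min)
print(numeric_min)
```

Both forms avoid reducing 'Cabin' at all, which is the column type the comment above suspects of triggering the error.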

@ryansrobin

To add, for any others who trip across this like I did: in previous versions it felt like you could just use .max() on any dataframe. After upgrading to pandas 1.1.5, it seems that you should clean your dataframe further to ensure there are no null values.

Good luck!

@mcarans

mcarans commented Jan 20, 2021

Similar to @ryansrobin, I encountered a problem with min() in pandas 1.2.1 that didn't occur with 1.0.5; it was solved by removing a column that had NaNs in it.
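
A rough sketch of that workaround, assuming the offending column can simply be discarded. The frame and its column names (key, value, note) are hypothetical and not taken from the data in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "note" is a string column containing a NaN.
df = pd.DataFrame(
    {
        "key": ["a", "a", "b"],
        "value": [3, 1, 2],
        "note": ["x", np.nan, "y"],
    }
)

# Keep only the columns that have no missing values, then reduce per group.
complete = df.dropna(axis="columns")
result = complete.groupby("key").min()

print(result)
```

Dropping columns discards data, so selecting the needed columns before aggregating may be preferable when the NaN-bearing column is still needed elsewhere.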

@simonjayhawkins
Member

@ryansrobin @mcarans If you have a reproducible example, feel free to open a new issue.

@mcarans

mcarans commented Jan 21, 2021

@simonjayhawkins I have done so here: #39329

@simonjayhawkins
Member

Thanks @mcarans

6 participants