BUG: .groupby().min() .max() .agg('min') .agg('max') ERROR #38401
Comments
Hi, thanks for your report. Could you provide an example where the df is constructed in code and not read from an external file?
The data fails to parse. Please read this guide detailing how to provide the necessary information for us to reproduce your bug and how to create minimal bug reports.
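As the comments above request, here is a minimal sketch of a self-contained reproducer with the DataFrame built in code. The original data came from a file that is not included in the issue, so this frame is hypothetical: the column names Class, Age and Note, and the mixed column with a missing value, are assumptions based on the later comments. The script runs the four calls from the issue title and reports, for the installed pandas version, whether each one succeeds or which exception it raises. The outcome is expected to vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the original report read its data from a file
# that is not part of the issue, so these columns are illustrative only.
df = pd.DataFrame({
    "Class": ["A", "A", "B", "B"],
    "Age": [30, 25, 40, 35],
    "Note": ["x", np.nan, "y", "z"],  # object column containing a missing value
})
class_group = df.groupby("Class")

# The four calls named in the issue title.
calls = {
    ".min()": lambda g: g.min(),
    ".max()": lambda g: g.max(),
    ".agg('min')": lambda g: g.agg("min"),
    ".agg('max')": lambda g: g.agg("max"),
}

print("pandas", pd.__version__)
outcomes = {}
for name, call in calls.items():
    try:
        call(class_group)
        outcomes[name] = "ok"
    except Exception as exc:  # record any failure, e.g. AssertionError or TypeError
        outcomes[name] = type(exc).__name__
    print(f"{name}: {outcomes[name]}")
```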
I think this error is not a bug. When I call .group.agg('min')['Age'], the error occurs, but when I call .group['Age'].agg('min'), it does not. The group aggregation function seems to process every column in some step, such as a loop. .group['Age'].agg('min') means: first group, then select only 'Age', then calculate the minimum of 'Age'. So if there is any column for which a min or max value cannot be calculated, this error occurs. I think it would be good to add exception handling and a clearer error message for this case. It's not a bug, but from a user's point of view the problem is very hard to find when no meaningful error message or error code is shown and only a bare Python AssertionError is raised.
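A minimal sketch of the workaround described in that comment: select the column first, then aggregate. The DataFrame and the column names Class and Note are hypothetical; only Age appears in the comment. Selecting Age before calling agg means only that column is reduced, so a column without a meaningful min or max never enters the computation.

```python
import pandas as pd

# Hypothetical data: only the 'Age' column name comes from the comment.
df = pd.DataFrame({
    "Class": ["A", "A", "B", "B"],
    "Age": [30, 25, 40, 35],
    "Note": ["x", None, 3.5, "y"],  # mixed types: no meaningful min/max
})
class_group = df.groupby("Class")

# Whole-frame form (reduces every column, including 'Note', then picks 'Age'):
#     class_group.agg("min")["Age"]
# Column-first form: only 'Age' is reduced.
age_min = class_group["Age"].agg("min")
age_max = class_group["Age"].agg("max")
print(age_min)
print(age_max)
```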
To add for anyone else who trips across this like I did: in previous versions it felt like you could just call .max() on any DataFrame. After upgrading to pandas 1.1.5, it seems that you need to clean your DataFrame further to ensure there are no null values. Good luck!
Similar to @ryansrobin, the problem I encountered with min() in pandas 1.2.1, which didn't occur with 1.0.5, was solved by removing a column that had NaNs in it.
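The workaround in the two comments above (removing a column that contains NaNs before calling min() or max()) can be sketched as follows. The data and column names are hypothetical. Dropping whole columns discards their data, so this is only appropriate when those columns are not needed in the aggregated result.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one column containing a missing value.
df = pd.DataFrame({
    "Class": ["A", "A", "B", "B"],
    "Age": [30, 25, 40, 35],
    "Note": ["x", np.nan, "y", "z"],
})

# Find non-key columns that contain at least one NaN and drop them.
nan_cols = [c for c in df.columns if c != "Class" and df[c].isna().any()]
cleaned = df.drop(columns=nan_cols)

# Aggregate only the remaining columns.
result = cleaned.groupby("Class").min()
print(result)
```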
@ryansrobin @mcarans If you have a reproducible example, feel free to open a new issue.
@simonjayhawkins I have done so here: #39329
Thanks @mcarans
[O] I have checked that this issue has not already been reported.
I checked: this bug was reported before and marked as solved, but it still occurs in my pandas installation.
(AssertionError when grouping with max/min as aggregation functions (pandas-1.0.0) #31522)
[O] I have confirmed this bug exists on the latest version of pandas.
I checked the latest version available on Anaconda.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
I found that the same error occurred in older versions, 1.0.0 and 1.0.1 (#31522).
According to that ticket, the bug was fixed as of 1.0.3.
But I tested Windows 10 64-bit with Python 3.6 and pandas 1.0.1, 1.0.3, 1.0.5 and 1.1.3, and CentOS 7.5 with Python 3.8 and pandas 1.0.5, and the same error occurred.
The same code and the same data fail on both Windows 10 and CentOS 7:
Windows 10
""""""""""""""""""""""""""""""""""""""""""""""
Traceback (most recent call last):
File "E:/PythonProjects/Deepphi/local-dev/pandaserror.py", line 11, in
print(class_group.agg('min')) # error occur. also .agg('max'), .min(), .max()
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 951, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 307, in _aggregate
return self._try_aggregate_string_function(arg, *args, **kwargs), None
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\base.py", line 263, in _try_aggregate_string_function
return f(*args, **kwargs)
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1552, in min
numeric_only=numeric_only, min_count=min_count, alias="min", npfunc=np.min
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\groupby.py", line 1000, in _agg_general
how=alias, alt=npfunc, numeric_only=numeric_only, min_count=min_count,
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1022, in _cython_agg_general
how, alt=alt, numeric_only=numeric_only, min_count=min_count
File "C:\Users\deepnoid-workstation\Anaconda3_64bit\envs\dev\lib\site-packages\pandas\core\groupby\generic.py", line 1135, in _cython_agg_blocks
assert len(locs) == result.shape[1]
AssertionError
Process finished with exit code 1
""""""""""""""""""""""""""""""""""""""""""""""
CentOS
""""""""""""""""""""""""""""""""""""""""""""""
Traceback (most recent call last):
File "pandaserror.py", line 24, in
print(class_group.agg('min'))
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 928, in aggregate
result, how = self._aggregate(func, *args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 311, in _aggregate
return self._try_aggregate_string_function(arg, *args, **kwargs), None
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/base.py", line 267, in _try_aggregate_string_function
return f(*args, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 1372, in f
return self._cython_agg_general(alias, alt=npfunc, **kwargs)
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 993, in _cython_agg_general
agg_blocks, agg_items = self._cython_agg_blocks(
File "/conda/anaconda3/lib/python3.8/site-packages/pandas/core/groupby/generic.py", line 1100, in _cython_agg_blocks
assert len(locs) == result.shape[1]
AssertionError
""""""""""""""""""""""""""""""""""""""""""""""
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit : db08276
python : 3.6.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.49.1
None
INSTALLED VERSIONS
commit : None
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1062.12.1.el7.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : ko_KR.UTF-8
LOCALE : ko_KR.UTF-8
pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : None
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.50.1
None
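Each version report above ends with a stray None line, which is what print(pd.show_versions()) produces: show_versions() prints the report itself and returns None. A minimal sketch that captures the report as text for pasting into an issue, without the trailing None (the variable names are illustrative):

```python
import contextlib
import io

import pandas as pd

# show_versions() prints the report and returns None, so redirect stdout
# to capture the printed text instead of printing its return value.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    pd.show_versions()
report = buffer.getvalue()
print(report)
```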