REGR: ExtensionArray aggregation on non-numeric types fails #38980
Labels: ExtensionArray, Groupby, Regression
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Expected:
Actual fails with error:
Problem description
Calling groupby on an ExtensionArray and aggregating a string column fails with the error NotImplementedError: string. An optimized Cython aggregation is attempted first; when it fails, the error message is checked improperly, so pandas does not fall back to a non-Cython aggregation.
This was found with the extension arrays from https://github.com/CODAIT/text-extensions-for-pandas/blob/master/text_extensions_for_pandas/io/bert.py#L217 and the full error log can be seen here: https://github.com/CODAIT/text-extensions-for-pandas/pull/157/checks?check_run_id=1646927381#step:5:1438
Expected Output
pandas should handle the NotImplementedError and fall back to a non-Cython aggregation implementation.
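To illustrate the behaviour being requested, here is a minimal pure-Python sketch of the two fallback styles. It does not use or reproduce pandas internals: `fast_first`, `slow_first`, `first_fragile`, `first_robust`, and `group_first` are hypothetical names, and the message text matched in `first_fragile` is an assumption chosen only to show how a message-based check can miss an error such as `NotImplementedError: string`.

```python
from collections import defaultdict


def fast_first(values):
    """Hypothetical stand-in for an optimized path that supports only numbers."""
    if not all(isinstance(v, (int, float)) for v in values):
        # Same shape as the reported error: the message is only a dtype name.
        raise NotImplementedError("string")
    return values[0]


def slow_first(values):
    """Hypothetical stand-in for the generic, non-optimized path."""
    return values[0]


def first_fragile(values):
    """Falls back only when the error message contains an expected phrase."""
    try:
        return fast_first(values)
    except NotImplementedError as err:
        # Assumed phrase, for illustration only; "string" does not match it,
        # so the error propagates instead of triggering the fallback.
        if "not implemented for this dtype" in str(err):
            return slow_first(values)
        raise


def first_robust(values):
    """Falls back on any NotImplementedError, regardless of its message."""
    try:
        return fast_first(values)
    except NotImplementedError:
        return slow_first(values)


def group_first(keys, values, agg):
    """Group values by key and apply agg to each group's list of values."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: agg(group) for key, group in groups.items()}
```

With string values, `group_first(keys, values, first_fragile)` re-raises the error, while `group_first(keys, values, first_robust)` completes through the slow path. Keying the fallback on the exception type rather than its message is one possible design; whether that matches the eventual fix in pandas is not stated in this report.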
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 3e89b4c
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-58-generic
Version : #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.2.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 49.6.0.post20201009
Cython : None
pytest : 6.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None