AssertionError when grouping with max/min as aggregation functions (pandas-1.0.0) #31522
Thanks for the report. That assert is from #29035 (cc @jbrockmendel). We do `min` on object dtype, which is NotImplemented in Cython, so we fall back to the python agg. Then, in this unwrap step:

    result = cast(DataFrame, result)
    # unwrap DataFrame to get array
    assert len(result._data.blocks) == 1
    result = result._data.blocks[0].values
    if isinstance(result, np.ndarray) and result.ndim == 1:
        result = result.reshape(1, -1)

the `assert len(result._data.blocks) == 1` fails:

    (Pdb) pp result
         key2 key3
    key1
    a     one  six
    b     one  six

and we fall through to the …

FYI @marcevrard, we publish release candidates and nightly builds if you want to catch these before the release. If nightly builds aren't an option, you can set your watch on pandas to "Releases only".
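The failing step can be modelled outside pandas. This is a simplified sketch, not pandas' actual code: the internal storage of a DataFrame is represented as a plain list of 2-D NumPy arrays (one per block, shaped columns × groups), and `unwrap_single_block` is a hypothetical stand-in for the unwrap snippet quoted above. A result made of one consolidated block unwraps cleanly; a result split into two object blocks trips the assertion.

```python
import numpy as np


def unwrap_single_block(blocks):
    """Hypothetical stand-in for the unwrap step quoted in the thread.

    `blocks` models `result._data.blocks` as a list of 2-D object arrays,
    each shaped (n_columns_in_block, n_groups).
    """
    # The step assumes the aggregation kept the input as one block.
    assert len(blocks) == 1
    values = blocks[0]
    if isinstance(values, np.ndarray) and values.ndim == 1:
        # Promote a 1-D result to a single-row 2-D array.
        values = values.reshape(1, -1)
    return values


# One consolidated object block (2 columns x 2 groups) unwraps fine.
consolidated = [np.array([["one", "one"], ["six", "six"]], dtype=object)]
print(unwrap_single_block(consolidated).shape)  # (2, 2)

# The python-agg fallback instead produced two 1 x 2 object blocks.
split = [
    np.array([["one", "one"]], dtype=object),
    np.array([["six", "six"]], dtype=object),
]
try:
    unwrap_single_block(split)
except AssertionError:
    print(f"AssertionError: result has {len(split)} blocks")
```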
I guess the faulty assumption is that a groupby aggregation on a single Block won't split it into multiple blocks. This apparently isn't true for object blocks:

    (Pdb) obj._data.blocks
    (ObjectBlock: slice(0, 2, 1), 2 x 5, dtype: object,)
    (Pdb) result._data.blocks
    (ObjectBlock: slice(0, 1, 1), 1 x 2, dtype: object, ObjectBlock: slice(1, 2, 1), 1 x 2, dtype: object)
I guess `_split_and_operate` would have to be called in there somehow. The easiest solution would be to raise `TypeError`, which should revert to the previous behavior. Longer term, we should probably handle that case without raising.
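A minimal sketch of the short-term option suggested above (raise `TypeError` instead of asserting), again using hypothetical names and a list-of-arrays model of blocks rather than pandas internals. It is not the fix that was merged. `python_agg_min` mimics a fallback that returns one block per column; `agg_block` raises `TypeError` when that splits the input, and the caller catches it and aggregates one column at a time.

```python
import numpy as np


def python_agg_min(block, codes, ngroups):
    """Hypothetical python-agg fallback: per-group min of each column.

    `block` is shaped (n_columns, n_rows). It builds one result block per
    column, so a multi-column input comes back as several 1 x ngroups blocks.
    """
    out = []
    for col in block:
        mins = [min(col[codes == g]) for g in range(ngroups)]
        out.append(np.array([mins], dtype=object))
    return out


def agg_block(block, codes, ngroups):
    """Aggregate one block; raise TypeError if the result was split."""
    result = python_agg_min(block, codes, ngroups)
    if len(result) != 1:
        # Instead of `assert len(...) == 1`, signal "not handled here".
        raise TypeError("aggregation split the block into multiple blocks")
    return result[0]


def agg_with_fallback(block, codes, ngroups):
    try:
        return agg_block(block, codes, ngroups)
    except TypeError:
        # Aggregate one column at a time; each call yields a single
        # 1 x ngroups block, and the pieces are stacked back together.
        rows = [agg_block(block[i : i + 1], codes, ngroups) for i in range(block.shape[0])]
        return np.vstack(rows)


# Hypothetical data matching the debugger output in this thread:
# two object columns (key2, key3), group codes for key1 (a -> 0, b -> 1).
codes = np.array([0, 0, 1, 1, 0])
block = np.array(
    [["one", "two", "one", "two", "one"],
     ["three", "six", "six", "three", "six"]],
    dtype=object,
)
print(agg_with_fallback(block, codes, ngroups=2).tolist())  # [['one', 'one'], ['six', 'six']]
```

The design trade-off matches the comment: raising keeps the common single-block path untouched at the cost of a slower per-column fallback for the rare split case.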
Looking into this today.
REGR: Fixed AssertionError in groupby. Closes #31522
Thank you for the quick fix; I confirm it works again in version 1.0.1.
I'm still seeing this error in … Any idea why this might still be happening? @TomAugspurger Thanks!
@zking1219 if you have a minimal example, I'd recommend opening a new issue.
I can see how that might help; I'll work on putting one together. Thanks!
I am still getting this error message running pandas 1.0.5. I switched back to 0.25.1 and it works fine. My dataset is a little complicated and I don't have time to put together a minimal example now, but I thought you would want to know that this still seems to be a problem.
I'm seeing the same error in pandas 1.1.0 as well.
I got the same error message. The 1465th row has 43 columns instead of 42, but deleting the 42nd column (i.e., the 43rd) did not help; I still get the same error message.
I no longer see the error in 1.1.0. I followed the procedure to recompile all the packages, and I also found that the error was triggered by some NaN values in my data, which I have since fixed. Previously, I did not get the assertion error even with NaN values present.
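If missing values are suspected, as in the report above, a quick check before grouping is to count them per column. A minimal sketch with a hypothetical frame (the column names and values are illustrative; substitute your own `df`):

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype frame with a few missing values.
df = pd.DataFrame(
    {
        "key1": ["a", "a", "b"],
        "key2": ["one", np.nan, "two"],
        "key3": ["six", "six", None],
    },
    dtype=object,
)

# Count missing values (NaN or None) in each column.
nan_counts = df.isna().sum()
print(nan_counts[nan_counts > 0])

# Rows that contain at least one missing value.
print(df[df.isna().any(axis=1)])
```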
Code Sample
Problem description
Since pandas-1.0.0, an AssertionError is thrown when grouping a DataFrame by a key and using max/min as aggregation functions. It works fine if only one column (other than the grouping key) in the DataFrame is of type object, but it fails when more than one column is of type object (as shown in the example). This configuration worked fine in previous versions of pandas (e.g., pandas-0.25.3).

Expected Output
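The original code sample and expected output were not preserved in this copy of the issue. The following is a hypothetical reconstruction, not the reporter's code, with data chosen to match the debugger output quoted in the thread: two object columns `key2` and `key3` besides the grouping key `key1`. The report describes pandas 1.0.0 raising AssertionError for frames of this shape, while 0.25.3 (and, per the thread, 1.0.1) return the per-group minimum and maximum:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b", "a"],
        "key2": ["one", "two", "one", "two", "one"],
        "key3": ["three", "six", "six", "three", "six"],
    }
)

# More than one object column besides the grouping key: the failing case.
result_min = df.groupby("key1").min()
result_max = df.groupby("key1").max()
print(result_min.to_dict())  # {'key2': {'a': 'one', 'b': 'one'}, 'key3': {'a': 'six', 'b': 'six'}}
```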
Output of pd.show_versions()
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : 0.29.14
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
tabulate : 0.8.3
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : 0.48.0