BUG: Series mode crashes when there is more than one mode #38534
Comments
Problem seems to arise here: pandas/pandas/core/groupby/generic.py Lines 480 to 485 in ca3e351
raises, but ensuring the single mode group comes first
gives the output
|
Is the validation on only the first group a time-saving optimization or intended behaviour for any reason? |
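The order dependence described above can be probed with a small sketch. The data and the helper name `try_mode_agg` are hypothetical, not taken from the original report, and whether either call raises depends on the installed pandas version, so the helper returns the error text instead of letting it propagate.

```python
import pandas as pd


def try_mode_agg(keys, values):
    """Aggregate each group with Series.mode and report the outcome.

    Hypothetical helper for illustration: returns the aggregated Series,
    or a string describing the exception if the aggregation fails.
    """
    df = pd.DataFrame({"key": keys, "val": values})
    try:
        return df.groupby("key")["val"].agg(pd.Series.mode)
    except Exception as exc:  # broad on purpose: behaviour varies by version
        return f"{type(exc).__name__}: {exc}"


# Groups are processed in sorted key order, so "a" is validated first.
# Case 1: the first group ("a") has two modes, 0 and 1.
multi_mode_first = try_mode_agg(["a", "a", "b"], [0, 1, 2])

# Case 2: the first group ("a") has a single mode; "b" has two.
single_mode_first = try_mode_agg(["a", "b", "b"], [2, 0, 1])

print(multi_mode_first)
print(single_mode_first)
```

Comparing the two printed results on a given pandas version shows whether the check is applied only to the first group.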
@mzeitlin11 I would like to take this up if you haven't already started work on this. |
I haven't, go ahead! |
take |
Hello, are there any updates on this? I recently came across the same issue. I first hit it with the mode function, but on further checking I found that the issue is not specific to mode: it affects any function that can return a multi-value result. For instance, just for a contrived example with |
I would like to help if I can but I don't have any experience in deep-diving into pandas. @skvrahul @mzeitlin11 Please let me know if there's anything I can do on this issue. |
I apologize for not having gotten to this earlier. After doing some digging, this is where the error is being thrown: pandas/pandas/_libs/reduction.pyx Lines 88 to 94 in dad3e7f
pandas/pandas/_libs/reduction.pyx Lines 29 to 37 in dad3e7f
Now the snippet below doesn't throw an error, since the aggregation function returns the same dtype as the Series being aggregated.
import pandas as pd
def sum_list(s):
    return [sum(x) for x in s]
df = pd.DataFrame(
{
"data": [1, 1, 1],
"value": [[1,2,3], [4,0,8], [11,7,9]]
}
)
print(df)
'''
data value
0 1 [1, 2, 3]
1 1 [4, 0, 8]
2 1 [11, 7, 9]
'''
print(df.value.dtype) # dtype = object
result = df.groupby("data").agg({"value": sum_list})
print(result)
'''
data value
1 [6, 12, 27]
'''
print(result.value.dtype) # dtype = object |
I guess it is up to the others to weigh in here whether this is intentional and expected behaviour or needs any changes? |
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
This line of code crashes with
ValueError: Must produce aggregated value
Expected Output
[0,1]
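One possible way to get all modes per group is to have the aggregation function return a single Python list, so each group still yields exactly one object. This is only a sketch of a workaround, not the fix adopted by pandas; the DataFrame and the helper name `modes_as_list` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: group "a" has two modes (0 and 1), group "b" has one (2).
df = pd.DataFrame(
    {
        "key": ["a", "a", "b", "b", "b"],
        "val": [0, 1, 2, 2, 3],
    }
)


def modes_as_list(s):
    """Return every mode of the group packed into one list."""
    return s.mode().tolist()


# Each group produces one list, so the result is an object-dtype Series.
result = df.groupby("key")["val"].agg(modes_as_list)
print(result)
```

The trade-off is that the result column holds lists, which then need to be unpacked (for example with `Series.explode`) if one row per mode is wanted.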
Output of
pd.show_versions()
INSTALLED VERSIONS
commit : b5958ee
python : 3.7.9.final.0
python-bits : 64
OS : Darwin
OS-release : 20.1.0
Version : Darwin Kernel Version 20.1.0: Sat Oct 31 00:07:11 PDT 2020; root:xnu-7195.50.7~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.4
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.0.0.post20201207
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None