groupby nunique makes inplace replacement of NaN values to -9.223372036854776e18 #32632
Comments
Thanks for the report. I think this is the same issue as #16674
Apologies for not being clearer. I think the issues are related, but not quite the same. There are two issues here:
The second issue here is not a duplicate.
Thanks for clarifying. @nicholasyli Could you clarify the issue title and description highlighting the 2nd issue.
Thanks @mroeschke, hopefully that's clearer.
Perfect, thank you.
I think this was fixed on master by #32175
Thanks for the catch @dsaxton. Does look fixed on master
Thanks all, sorry for missing the previous issue.
Code Sample, a copy-pastable example if possible
Problem description
Calling DataFrameGroupBy.nunique on a DataFrame replaces its NaN values in place. Simply calling the method, without assigning the result to anything, overwrites the existing NaNs with -9223372036854775808.
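A minimal sketch of a check for this behaviour, rebuilt from the frames printed under Expected Output. The column names and values are taken from those frames; grouping by "x" and the variable names are assumptions, and this is not necessarily the snippet originally posted.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the printed output in this report.
df = pd.DataFrame(
    {
        "x": [1, 1, 1, 2, 2, 2],
        "y": [1.0, 2.0, np.nan, np.nan, np.nan, 6.0],
    }
)

# Keep an independent copy so any in-place change can be detected.
before = df.copy()

# Call nunique and discard the result (grouping by "x" is an assumption).
df.groupby("x").nunique()

# DataFrame.equals treats NaNs in the same positions as equal, so this
# is True only if the call above altered df.
mutated = not df.equals(before)
print("df was mutated:", mutated)
print(df)
```

On an affected version the NaNs in y would be expected to show up as -9.223372e+18 in the final print; on a version with the fix, df should be unchanged.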
There are two distinct issues:
Expected Output
x y
0 1 1.0
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 6.0
x y
0 1 1.0
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 6.0
x y
0 1 1.0
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 6.0
My Output
x y
0 1 1.0
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 6.0
x y
0 1 1.0
1 1 2.0
2 1 NaN
3 2 NaN
4 2 NaN
5 2 6.0
x y
0 1 1.000000e+00
1 1 2.000000e+00
2 1 -9.223372e+18
3 2 -9.223372e+18
4 2 -9.223372e+18
5 2 6.000000e+00
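As a side note on the value itself: -9223372036854775808 is the smallest signed 64-bit integer, and the -9.223372e+18 shown above is the same number displayed as a float. A short sketch confirming that the three spellings used in this report agree (variable names are illustrative):

```python
import numpy as np

# Smallest value representable as a signed 64-bit integer.
int64_min = np.iinfo(np.int64).min

# The same value stored as a float64, as it appears in the float column y.
as_float = float(int64_min)

print(int64_min)
print(as_float)
```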
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Linux
OS-release : 2.6.32-754.27.1.el6.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : None
pytest : None
hypothesis : None
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0