error that converts nan to non-nan float value via groupby and nunique #32054

Closed
mnky9800n opened this issue Feb 17, 2020 · 1 comment

mnky9800n commented Feb 17, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

print(pd.__version__)
# >> '1.0.1'

df = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, np.nan, np.nan], 'c': ['d', 'e', 'f']}, index=[0, 1, 2])
print(df)
print(df.loc[1].b)
# >> nan

# The result is discarded on purpose; only the side effect on df matters here.
df.groupby('a').nunique()
print(df)
#    a             b  c
# 0  1 -9.223372e+18  d
# 1  2 -9.223372e+18  e
# 2  3 -9.223372e+18  f
print(df.c.unique())
# print(df.groupby('a').nunique().c.unique())
# >> array([1])
print(df.loc[1].b)
# >> -9.223372036854776e+18

Problem description

When a groupby operation is followed by an nunique operation, the NaNs in the original dataframe are converted to non-NaN float values. According to the documentation, nunique should return the counts as a new object and should not modify the original dataframe. The documentation also says that NaNs can be ignored (the dropna option), but the NaN values in the original dataframe are still overwritten with a non-NaN number.

This does not happen if nunique is applied without the groupby function.
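
To make the contrast concrete, here is a minimal sketch that snapshots the frame, runs an operation, and checks whether the frame still matches the snapshot. The helper name `is_unchanged_by` is hypothetical and only for illustration. On 1.0.1 the grouped call is the one that fails the check according to this report; on a version without the bug both checks should come back True.

```python
import numpy as np
import pandas as pd


def is_unchanged_by(df, op):
    """Return True if calling op(df) leaves df equal to a prior deep copy.

    Hypothetical helper for illustration; DataFrame.equals treats NaNs in
    the same position as equal, so an all-NaN column compares cleanly.
    """
    before = df.copy(deep=True)
    op(df)  # result is discarded, only the side effect is of interest
    return df.equals(before)


df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [np.nan, np.nan, np.nan],
                   'c': ['d', 'e', 'f']})

# nunique without groupby: not affected according to this report
plain_ok = is_unchanged_by(df, lambda d: d.nunique())

# nunique after groupby: the call that overwrote the NaNs on 1.0.1
grouped_ok = is_unchanged_by(df, lambda d: d.groupby('a').nunique())

print(plain_ok, grouped_ok)
```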

Expected Output

The expected output is that NaN values are not converted to any other value by an aggregate operation on the dataframe.
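
Until a fix is available, one possible stopgap is to run the grouped nunique on a deep copy, so that whatever the operation does to its input cannot reach the original frame. This is only a sketch: the function name `group_nunique_on_copy` is made up for illustration, and I have not checked whether the counts themselves are correct on 1.0.1, only that a deep copy keeps the original data separate.

```python
import numpy as np
import pandas as pd


def group_nunique_on_copy(df, key):
    """Run groupby(key).nunique() on a deep copy of df.

    Hypothetical workaround helper: the copy owns its own data, so any
    in-place modification made during the operation cannot reach df.
    """
    return df.copy(deep=True).groupby(key).nunique()


df = pd.DataFrame({'a': [1, 2, 3],
                   'b': [np.nan, np.nan, np.nan],
                   'c': ['d', 'e', 'f']})

counts = group_nunique_on_copy(df, 'a')
print(counts)

# The original frame should still hold NaN in column b.
print(df['b'].isna().all())
```

The cost is one extra copy of the frame per call, which may matter for large data.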

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-45-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200210
Cython : 0.29.15
pytest : 5.3.5
hypothesis : 5.4.1
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.5
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.48.0

@MarcoGorelli
Member

Thanks @mnky9800n - seems like a duplicate of #31950 (which is in the 1.0.2 milestone), so I'll close this for now
