BUG: Index.get_indexer_non_unique misbehaves when index contains multiple nan #35392
Comments
Thanks @mhabets for the report. I can confirm that this issue persists on master and also on 0.25.3, so it is not a recent regression.
One additional point to note: if we pass any other value along with np.nan, the output is returned as expected.
This returns:
Not quite. If we pass another value that is also numeric, the problem remains. When the input is all-numeric, it gets coerced to a numeric-dtype Index. Then, when passed to get_indexer_for, it does an astype(object), which results in a new NaN object, leading to set-semantics problems. @realead, do you think we should just avoid using Python sets altogether here? See the discussion in #35498.
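For anyone unfamiliar with the set-semantics problem mentioned above, here is a minimal pure-Python sketch. It does not show the pandas internals; the names nan_a and nan_b are made up for illustration. It only demonstrates why a freshly created NaN object is not found in a set that holds a different NaN object.

```python
# Two distinct float objects that both hold NaN (illustrative names).
nan_a = float("nan")
nan_b = float("nan")

# NaN never compares equal, not even to itself.
print(nan_a == nan_a)

# Set membership checks identity first and equality second,
# so the very same object is found in the set...
same_object_found = nan_a in {nan_a}

# ...but a different NaN object is neither identical nor equal, so it is not.
other_object_found = nan_b in {nan_a}

print(same_object_found, other_object_found)
```

If a conversion step hands back a new NaN object instead of the original one, a set-based lookup would miss it in the same way.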
Code Sample, a copy-pastable example
Current Output
array([-1], dtype=int64)
Meaning that np.nan is not in axis, which is incorrect and makes df.drop(columns=[np.nan]) fail when columns contains multiple nan.
Expected Output
array([0, 2], dtype=int64)
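The original code sample is not preserved in this copy of the issue. The following is only a hypothetical reconstruction, assuming an index named axis with nan at positions 0 and 2 (consistent with the expected output above); the label "var1" is an invented placeholder. The result of get_indexer_for depends on the pandas version, so no output is claimed for it here.

```python
import numpy as np
import pandas as pd

# Assumed index: nan at positions 0 and 2, with a placeholder label between them.
axis = pd.Index([np.nan, "var1", np.nan])

# Reported as array([-1]) on pandas 1.0.5; the issue expects array([0, 2]).
result = axis.get_indexer_for([np.nan])
print(result)

# For comparison, a mask-based lookup that does not rely on set membership.
nan_positions = np.flatnonzero(axis.isna())
print(nan_positions)
```

The isna() mask is shown only as a cross-check for where the nan labels sit; it is not proposed as the fix.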
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.20.3
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.50.1