BUG: get_indexer_non_unique() does not handle targets of dtype='object' with NaNs correctly #44482

Closed
3 tasks done
johannes-mueller opened this issue Nov 16, 2021 · 0 comments · Fixed by #44483
Labels
Bug Index Related to the Index class or subclasses

@johannes-mueller
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

idx = pd.Index([1.0, 2.0])
target = pd.Index([1, np.nan], dtype='object')

print(idx.get_indexer(target))
print(idx.get_indexer_non_unique(target))

Issue Description

The script outputs

[ 0 -1]
(array([0, 0, 1]), array([], dtype=int64))

It boils down to the fact that object arrays containing NaNs are not sorted as expected, as discussed in numpy/numpy#15499.
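
To make the ordering problem concrete, here is a minimal sketch that uses Python's standard `bisect` module as a stand-in for a binary search over object values. It is an analogy only: it assumes that object-dtype comparisons fall back to Python's `<` operator, and it does not reproduce the numpy or pandas internals.

```python
import bisect

nan = float("nan")
values = [1.0, 2.0]  # sorted and NaN-free

# Every ordering or equality comparison involving NaN is False.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# A binary search relies on those comparisons, so the left and right
# insertion points for NaN end up at opposite ends of the list.
left = bisect.bisect_left(values, nan)
right = bisect.bisect_right(values, nan)
print(left, right)  # 0 2
```

Read as a half-open range, `[left, right)` covers every element of `values`, so a lookup built on these two calls would report NaN as matching all positions even though NaN is absent.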

This is one aspect of #44465, which needs to be split because it has two different root causes.

Expected Behavior

Expected output

[ 0 -1]
(array([ 0, -1]), array([1]))
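
The expected tuple can be read as (positions in the index for each target, positions in the target that were not found). The brute-force sketch below spells out those semantics in plain Python. `reference_indexer_non_unique` and `_same_key` are hypothetical helpers written for illustration, not pandas functions, and treating NaN as matching only NaN is an assumption of this sketch.

```python
import math


def _same_key(a, b):
    """Treat NaN as matching only NaN; otherwise use ordinary equality."""
    a_nan = isinstance(a, float) and math.isnan(a)
    b_nan = isinstance(b, float) and math.isnan(b)
    if a_nan or b_nan:
        return a_nan and b_nan
    return a == b


def reference_indexer_non_unique(index_values, target_values):
    """Hypothetical brute-force reference; returns (indexer, missing)."""
    indexer, missing = [], []
    for pos, target in enumerate(target_values):
        matches = [i for i, v in enumerate(index_values) if _same_key(v, target)]
        if matches:
            indexer.extend(matches)  # one entry per matching index position
        else:
            indexer.append(-1)       # -1 marks "not found"
            missing.append(pos)      # position of the target that is missing
    return indexer, missing


result = reference_indexer_non_unique([1.0, 2.0], [1, float("nan")])
print(result)  # ([0, -1], [1])
```

For the inputs from the reproducible example, the target `1` matches index position 0 and the NaN target is missing at target position 1, which agrees with the expected output.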

Installed Versions

INSTALLED VERSIONS
------------------
commit : 700be61
python : 3.8.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-90-lowlatency
Version : #101-Ubuntu SMP PREEMPT Fri Oct 15 20:57:56 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 1.4.0.dev0+1132.g700be617eb
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 58.5.3
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.24.2
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.1
gcsfs : 2021.11.0
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.0
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.2
sqlalchemy : 1.4.26
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

@johannes-mueller added the Bug and Needs Triage labels Nov 16, 2021
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Nov 16, 2021
…ndas-dev#44482)

numpy.searchsorted() does not handle NaNs in 'object' arrays as
expected (numpy/numpy#15499).  Therefore we cannot search NaNs using binary
search.  So we use binary search only for targets without NaNs.
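
The strategy described in the commit message can be sketched in plain Python as follows. This is not the code from the referenced commit: `find_positions` and `is_nan` are hypothetical names, `sorted_values` is assumed to be ascending and NaN-free, and the linear scan merely stands in for whatever non-binary lookup path handles NaN targets.

```python
import bisect
import math


def is_nan(x):
    """True only for float NaN; other objects are never NaN here."""
    return isinstance(x, float) and math.isnan(x)


def find_positions(sorted_values, targets):
    """Map each target position to the list of matching value positions.

    Assumes `sorted_values` is ascending and contains no NaN.
    """
    found = {}
    for pos, target in enumerate(targets):
        if is_nan(target):
            # Comparisons with NaN are always False, so skip binary search
            # and fall back to a linear scan that matches NaN explicitly.
            hits = [i for i, v in enumerate(sorted_values) if is_nan(v)]
        else:
            left = bisect.bisect_left(sorted_values, target)
            right = bisect.bisect_right(sorted_values, target)
            hits = list(range(left, right))
        found[pos] = hits
    return found


print(find_positions([1.0, 2.0, 2.0, 3.0], [2, float("nan"), 5]))
# {0: [1, 2], 1: [], 2: []}
```

Keeping NaN out of the binary-search branch means the left and right insertion points are only ever computed for values with a well-defined ordering.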
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Nov 17, 2021
…ndas-dev#44482)

numpy.searchsorted() does not handle NaNs in 'object' arrays as
expected (numpy/numpy#15499).  Therefore we cannot search NaNs using binary
search.  So we use binary search only for targets without NaNs.
@mroeschke added the Index label and removed the Needs Triage label Nov 21, 2021
@mroeschke added this to the 1.4 milestone Nov 21, 2021
johannes-mueller added a commit to boschresearch/pandas that referenced this issue Nov 26, 2021
…ndas-dev#44482)

numpy.searchsorted() does not handle NaNs in 'object' arrays as
expected (numpy/numpy#15499).  Therefore we cannot search NaNs using binary
search.  So we use binary search only for targets without NaNs.