BUG: Index.get_indexer_non_unique misbehaves when index contains multiple nan #35392

mhabets · 2020-07-23T13:53:57Z

Code Sample, a copy-pastable example

import numpy as np
import pandas as pd
axis = pd.Index([np.nan, 'var1', np.nan])
axis.get_indexer_for([np.nan])

Current Output

array([-1], dtype=int64)
This means that np.nan is reported as absent from axis, which is incorrect and causes df.drop(columns=[np.nan]) to fail when columns contains multiple nan.

Expected Output

array([0, 2], dtype=int64)
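Until the lookup behaves as expected, the NaN positions can be computed without going through get_indexer_for. The following is only a sketch of a possible workaround, not something proposed in this thread: it relies on Index.isna / Index.notna, and the names nan_positions, df and cleaned are made up for illustration.

```python
import numpy as np
import pandas as pd

axis = pd.Index([np.nan, 'var1', np.nan])

# Positions of missing labels, found with a boolean mask instead of a
# hash-based lookup (illustrative workaround, not the fix in pandas).
nan_positions = np.flatnonzero(axis.isna())

# Hypothetical frame whose columns contain several NaN labels.
df = pd.DataFrame([[1, 2, 3]], columns=axis)

# Keep only the columns whose label is not missing; this sidesteps
# df.drop(columns=[np.nan]) entirely.
cleaned = df.loc[:, df.columns.notna()]
```

Here nan_positions holds the positions 0 and 2, matching the expected output above, and cleaned keeps only the 'var1' column.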

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 1.0.5
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.20.3
sphinx : 3.1.2
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : None
gcsfs : None
lxml.etree : 4.5.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.3
pyxlsb : None
s3fs : None
scipy : 1.5.0
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.9
numba : 0.50.1


simonjayhawkins · 2020-07-23T14:42:20Z

Thanks @mhabets for the report. I can confirm that this issue persists on master and also on 0.25.3, so not a recent regression.

SanthoshBala18 · 2020-07-30T11:31:25Z

One additional point to note: if we pass any other value along with np.nan, the output is returned as expected.
For example:

axis = pd.Index([np.nan, 'var1', np.nan])
axis.get_indexer_for([np.nan, 'var1'])

This returns: array([0, 2, 1])


jbrockmendel · 2021-06-28T19:56:10Z

> One additional point to be noted, if we pass any other value along with np.nan, output is returned as expected

Not quite. If we pass another value that is also numeric, the problem remains. When the input is all-numeric, it gets coerced to a numeric-dtype Index. Then when passed to get_indexer_for it does an astype(object), which results in a new NaN object, leading to set-semantics problems.

@realead think we should just avoid using python sets altogether here? see discussion in #35498
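The set-semantics problem mentioned above can be reproduced in plain Python, independent of pandas. This is only an illustration of how NaN behaves in Python sets, not the code path pandas takes internally; contains_nan_aware is a hypothetical helper written for this example.

```python
import math

nan_a = float("nan")
nan_b = float("nan")  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
nan_equals_itself = (nan_a == nan_a)

# Set membership checks identity first and equality second, so only the
# very same NaN object is found.
seen = {nan_a}
same_object_found = nan_a in seen
new_object_found = nan_b in seen


def contains_nan_aware(values, target):
    """Membership test that treats every float NaN as the same value."""
    target_is_nan = isinstance(target, float) and math.isnan(target)
    for value in values:
        if target_is_nan and isinstance(value, float) and math.isnan(value):
            return True
        if value == target:
            return True
    return False


nan_aware_found = contains_nan_aware(seen, nan_b)
```

With a freshly created NaN, the plain set lookup misses (new_object_found is False) while the NaN-aware check succeeds, which is why a lookup that creates a new NaN object along the way can report -1.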


mhabets added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 23, 2020

mhabets changed the title ~~BUG: Index.get_indexer_non_unique misbehaves when passed with duplicated nan~~ BUG: Index.get_indexer_non_unique misbehaves when index contains multiple nans Jul 23, 2020

mhabets changed the title ~~BUG: Index.get_indexer_non_unique misbehaves when index contains multiple nans~~ BUG: Index.get_indexer_non_unique misbehaves when index contains multiple nan Jul 23, 2020

simonjayhawkins added Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 23, 2020

simonjayhawkins added this to the Contributions Welcome milestone Jul 23, 2020

alexhlim added a commit to alexhlim/pandas that referenced this issue Jul 31, 2020

BUG: Index.get_indexer_non_unique misbehaves when index contains mult…

d041117

…iple nan (pandas-dev#35392)

alexhlim added a commit to alexhlim/pandas that referenced this issue Jul 31, 2020

BUG: Index.get_indexer_non_unique misbehaves when index contains mult…

f7bcc10

…iple nan (pandas-dev#35392)

alexhlim mentioned this issue Jul 31, 2020

BUG: Index.get_indexer_non_unique misbehaves with multiple nan #35498

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.3 Dec 28, 2020

simonjayhawkins removed this from the 1.3 milestone Jun 11, 2021

jbrockmendel mentioned this issue Jun 28, 2021

BUG: get_indexer_non_unique with np.nan #42289

Closed

4 tasks

alexhlim added a commit to alexhlim/pandas that referenced this issue Jul 2, 2021

BUG: Index.get_indexer_non_unique misbehaves when index contains mult…

85dd25c

…iple nan (pandas-dev#35392)

jreback added this to the 1.4 milestone Aug 4, 2021

jreback closed this as completed in #35498 Aug 5, 2021

jreback pushed a commit that referenced this issue Aug 5, 2021

BUG: Index.get_indexer_non_unique misbehaves when index contains mult…

bf267b4

…iple nan (#35392) (#35498)

feefladder pushed a commit to feefladder/pandas that referenced this issue Sep 7, 2021

BUG: Index.get_indexer_non_unique misbehaves when index contains mult…

88c999c

…iple nan (pandas-dev#35392) (pandas-dev#35498)
