API: idx.get_indexer(keys) fails if idx is non-unique, even if keys in idx are unique #38773

Open
@ivirshup

Code Sample, a copy-pastable example

import pandas as pd

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Came across this here: #38745 (comment)

Problem description

It makes sense to me why this would error:

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["a", "c"]))

There isn't a unique solution, because "a" appears twice in the index. In the case presented at the top, every requested key appears exactly once in the index, so there is a unique solution.
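
To make the distinction concrete, here is a rough sketch of a helper that resolves positions only when the requested keys are unambiguous in the index. `get_indexer_if_unambiguous` is a hypothetical name of mine, not a pandas API, and this is not meant to reflect how pandas implements `get_indexer` internally.

```python
import numpy as np
import pandas as pd


def get_indexer_if_unambiguous(idx, keys):
    """Hypothetical helper (not part of pandas): positions of `keys` in `idx`.

    Raises ValueError if any requested key occurs more than once in `idx`.
    Keys that are absent from `idx` map to -1, mirroring get_indexer.
    """
    idx = pd.Index(idx)
    keys = pd.Index(keys)

    # Restrict idx to the labels that were actually requested.
    mask = idx.isin(keys)
    matched = idx[mask]
    if not matched.is_unique:
        raise ValueError("requested keys are duplicated in the index")

    # Positions of the matched labels in the original index.
    positions = np.flatnonzero(mask)

    # `matched` is unique, so get_indexer is valid on it.
    local = matched.get_indexer(keys)

    # Map positions within `matched` back to positions in `idx`.
    result = np.full(len(keys), -1, dtype=np.intp)
    found = local != -1
    result[found] = positions[local[found]]
    return result


idx = pd.Index(["a", "a", "b", "c"])
unambiguous = get_indexer_if_unambiguous(idx, ["b", "c"])  # positions 2 and 3

try:
    get_indexer_if_unambiguous(idx, ["a", "c"])
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True  # "a" occurs twice, so no unique answer
```

The extra `isin` pass costs a scan of the whole index, so this is only an illustration of the semantics being asked for, not a performance proposal.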

Expected Output

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# array([2, 3])

Note: this is the current behaviour of the private method pd.Index._get_indexer on master.
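
For comparison, the existing public method `Index.get_indexer_non_unique` accepts a non-unique index and returns all matching positions together with the positions of any targets that were not found. A minimal illustration (the variable names are mine):

```python
import pandas as pd

idx = pd.Index(["a", "a", "b", "c"])

# Keys that are unique within idx: one position per key.
indexer, missing = idx.get_indexer_non_unique(pd.Index(["b", "c"]))
# indexer holds positions 2 and 3; missing is empty

# A key that is duplicated in idx: every matching position is returned.
indexer_dup, missing_dup = idx.get_indexer_non_unique(pd.Index(["a", "c"]))
# indexer_dup holds positions 0, 1 and 3
```

The difference is that the result length is not guaranteed to equal the number of keys, which is why it is not a drop-in replacement for `get_indexer` in the unique-keys case.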

Output of pd.show_versions()

master
INSTALLED VERSIONS
------------------
commit           : 7f912a4009da963b5eacdcebb638c38eec06e7a7
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.0.dev0+228.g7f912a4009
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.2.0.post20200712
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.20.3
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : 0.4.1
gcsfs            : 0.6.2
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : 0.17.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1

1.2.0

pd.show_versions() is failing with: ImportError: Can't determine version for numba, so here's conda list:

$ conda list
# packages in environment at /Users/isaac/miniconda3/envs/pandas-1.2:
#
# Name                    Version                   Build  Channel
appnope                   0.1.2           py38hecd8cb5_1001  
backcall                  0.2.0                      py_0  
blas                      1.0                         mkl  
ca-certificates           2020.12.8            hecd8cb5_0  
certifi                   2020.12.5        py38hecd8cb5_0  
decorator                 4.4.2                      py_0  
intel-openmp              2019.4                      233  
ipython                   7.19.0           py38h01d92e1_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
jedi                      0.17.0                   py38_0  
libcxx                    11.0.0               h4c3b8ed_1    conda-forge
libedit                   3.1.20191231         h1de35cc_1  
libffi                    3.3                  hb1e8313_2  
mkl                       2019.4                      233  
mkl-service               2.3.0            py38h9ed2024_0  
mkl_fft                   1.2.0            py38hc64f4ea_0  
mkl_random                1.1.1            py38h959d312_0  
ncurses                   6.2                  h0a44026_1  
numpy                     1.19.2           py38h456fd55_0  
numpy-base                1.19.2           py38hcfb5961_0  
openssl                   1.1.1i               h9ed2024_0  
pandas                    1.2.0            py38he9f00de_0    conda-forge
parso                     0.8.1              pyhd3eb1b0_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pip                       20.3.3           py38hecd8cb5_0  
prompt-toolkit            3.0.8                      py_0  
ptyprocess                0.6.0              pyhd3eb1b0_2  
pygments                  2.7.3              pyhd3eb1b0_0  
python                    3.8.5                h26836e1_1  
python-dateutil           2.8.1                      py_0  
python_abi                3.8                      1_cp38    conda-forge
pytz                      2020.4             pyhd3eb1b0_0  
readline                  8.0                  h1de35cc_0  
setuptools                51.0.0           py38hecd8cb5_2  
six                       1.15.0           py38hecd8cb5_0  
sqlite                    3.33.0               hffcf06c_0  
tk                        8.6.10               hb0a8c7a_0  
traitlets                 5.0.5                      py_0  
wcwidth                   0.2.5                      py_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h1de35cc_0  
zlib                      1.2.11               h1de35cc_3 

Labels

API Design, Bug, Index (related to the Index class or subclasses), Needs Discussion (requires discussion from core team before further action)
