
API: idx.get_indexer(keys) fails if idx is non-unique, even if keys in idx are unique #38773


Open
ivirshup opened this issue Dec 29, 2020 · 0 comments
Labels
API Design Bug Index Related to the Index class or subclasses Needs Discussion Requires discussion from core team before further action


ivirshup commented Dec 29, 2020

Code Sample, a copy-pastable example

import pandas as pd

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Came across this here: #38745 (comment)

Problem description

It makes sense to me why this would error:

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["a", "c"]))

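Here there is no unique solution, because "a" occurs twice in the index. In the case at the top, however, each of the keys "b" and "c" occurs exactly once in the index, so there is a unique solution.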
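The ambiguity can be made visible with the existing `Index.get_indexer_non_unique` method, which returns an `(indexer, missing)` pair and lists every matching position for each key. A key that occurs more than once in the index contributes several positions, so the indexer ends up longer than the keys. The sketch below only illustrates this difference and is not a proposed fix:

```python
import pandas as pd

idx = pd.Index(["a", "a", "b", "c"])

# "a" occurs twice in idx, so it contributes two positions and the result
# has three entries for only two keys: there is no one-to-one answer.
ambiguous, _ = idx.get_indexer_non_unique(pd.Index(["a", "c"]))
print(ambiguous)

# "b" and "c" each occur once, so there is exactly one position per key
# and nothing is reported as missing.
unambiguous, missing = idx.get_indexer_non_unique(pd.Index(["b", "c"]))
print(unambiguous, missing)
```

This suggests a simple check: when no key matches a duplicated label, the non-unique indexer already has one entry per key.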

Expected Output

pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# array([2, 3])

Note: this is already what the private `Index._get_indexer` method returns on master.
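Until the public method behaves this way, a user-level workaround is possible. Below is a minimal sketch of the expected behaviour. It is not how pandas implements `get_indexer`. The helper `get_indexer_if_unique_keys` is hypothetical and is not a pandas API. It relies only on `Index.duplicated`, `Index.isin` and `Index.get_indexer`. The idea is to keep the labels that occur exactly once, which form a unique index, look the keys up there, and map the results back to positions in the original index. Raising `ValueError` for ambiguous keys is an assumption made for the sketch:

```python
import numpy as np
import pandas as pd


def get_indexer_if_unique_keys(idx: pd.Index, keys) -> np.ndarray:
    """Hypothetical helper (not a pandas API): like ``idx.get_indexer(keys)``,
    but tolerates a non-unique ``idx`` as long as no key hits a duplicated label.
    Keys absent from ``idx`` map to -1, as with ``get_indexer``."""
    keys = pd.Index(keys)
    if idx.is_unique:
        return idx.get_indexer(keys)

    # True for every occurrence of a label that appears more than once in idx.
    dup_mask = idx.duplicated(keep=False)

    # A key that matches a duplicated label has no unique position, so refuse it.
    if keys.isin(idx[dup_mask]).any():
        raise ValueError("some keys match duplicated labels in the index")

    # Labels occurring once form a unique Index; remember their original positions.
    positions = np.flatnonzero(~dup_mask)
    sub = idx[~dup_mask].get_indexer(keys)  # -1 where a key is absent

    # Translate positions in the unique subset back to positions in idx.
    result = np.full(len(keys), -1, dtype=np.intp)
    found = sub != -1
    result[found] = positions[sub[found]]
    return result


print(get_indexer_if_unique_keys(pd.Index(["a", "a", "b", "c"]), ["b", "c"]))
```

Keys missing from the index still come back as -1, so the only new error case is the genuinely ambiguous one, such as `["a", "c"]`.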

Output of pd.show_versions()

master
INSTALLED VERSIONS
------------------
commit           : 7f912a4009da963b5eacdcebb638c38eec06e7a7
python           : 3.8.5.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 19.6.0
Version          : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.0.dev0+228.g7f912a4009
numpy            : 1.18.5
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.1.1
setuptools       : 49.2.0.post20200712
Cython           : 0.29.21
pytest           : 5.4.3
hypothesis       : 5.20.3
sphinx           : 3.1.1
blosc            : None
feather          : None
xlsxwriter       : 1.2.9
lxml.etree       : 4.5.2
html5lib         : 1.1
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.16.1
pandas_datareader: None
bs4              : 4.9.1
bottleneck       : 1.3.2
fsspec           : 0.7.4
fastparquet      : 0.4.1
gcsfs            : 0.6.2
matplotlib       : 3.2.2
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.4
pandas_gbq       : None
pyarrow          : 0.17.1
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.1
sqlalchemy       : 1.3.18
tables           : 3.6.1
tabulate         : 0.8.7
xarray           : 0.16.0
xlrd             : 1.2.0
xlwt             : 1.3.0
numba            : 0.50.1
1.2.0

pd.show_versions() fails with "ImportError: Can't determine version for numba", so here is the output of conda list instead:

$ conda list
# packages in environment at /Users/isaac/miniconda3/envs/pandas-1.2:
#
# Name                    Version                   Build  Channel
appnope                   0.1.2           py38hecd8cb5_1001  
backcall                  0.2.0                      py_0  
blas                      1.0                         mkl  
ca-certificates           2020.12.8            hecd8cb5_0  
certifi                   2020.12.5        py38hecd8cb5_0  
decorator                 4.4.2                      py_0  
intel-openmp              2019.4                      233  
ipython                   7.19.0           py38h01d92e1_0  
ipython_genutils          0.2.0              pyhd3eb1b0_1  
jedi                      0.17.0                   py38_0  
libcxx                    11.0.0               h4c3b8ed_1    conda-forge
libedit                   3.1.20191231         h1de35cc_1  
libffi                    3.3                  hb1e8313_2  
mkl                       2019.4                      233  
mkl-service               2.3.0            py38h9ed2024_0  
mkl_fft                   1.2.0            py38hc64f4ea_0  
mkl_random                1.1.1            py38h959d312_0  
ncurses                   6.2                  h0a44026_1  
numpy                     1.19.2           py38h456fd55_0  
numpy-base                1.19.2           py38hcfb5961_0  
openssl                   1.1.1i               h9ed2024_0  
pandas                    1.2.0            py38he9f00de_0    conda-forge
parso                     0.8.1              pyhd3eb1b0_0  
pexpect                   4.8.0              pyhd3eb1b0_3  
pickleshare               0.7.5           pyhd3eb1b0_1003  
pip                       20.3.3           py38hecd8cb5_0  
prompt-toolkit            3.0.8                      py_0  
ptyprocess                0.6.0              pyhd3eb1b0_2  
pygments                  2.7.3              pyhd3eb1b0_0  
python                    3.8.5                h26836e1_1  
python-dateutil           2.8.1                      py_0  
python_abi                3.8                      1_cp38    conda-forge
pytz                      2020.4             pyhd3eb1b0_0  
readline                  8.0                  h1de35cc_0  
setuptools                51.0.0           py38hecd8cb5_2  
six                       1.15.0           py38hecd8cb5_0  
sqlite                    3.33.0               hffcf06c_0  
tk                        8.6.10               hb0a8c7a_0  
traitlets                 5.0.5                      py_0  
wcwidth                   0.2.5                      py_0  
wheel                     0.36.2             pyhd3eb1b0_0  
xz                        5.2.5                h1de35cc_0  
zlib                      1.2.11               h1de35cc_3 
@ivirshup ivirshup added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 29, 2020
@jorisvandenbossche jorisvandenbossche changed the title BUG: idx.get_indexer(keys) fails if idx is non-unique, even if keys in idx are unique API: idx.get_indexer(keys) fails if idx is non-unique, even if keys in idx are unique Dec 29, 2020
@jorisvandenbossche jorisvandenbossche added API Design Index Related to the Index class or subclasses and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 29, 2020
@mroeschke mroeschke added Bug Needs Discussion Requires discussion from core team before further action labels Aug 14, 2021