Description
Code Sample, a copy-pastable example
import pandas as pd
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Came across this here: #38745 (comment)
Problem description
It makes sense to me why this would error:
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["a", "c"]))
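There isn't a unique solution, because "a" appears twice in the index. For the case presented at the top, every requested label appears exactly once in the index, so there is a unique solution.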
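To make the difference between the two cases concrete, here is a small sketch using the public `get_indexer_non_unique` method, which accepts an index with duplicates. The variable names are only illustrative, and this is not a claim about how `get_indexer` is implemented internally.

```python
import pandas as pd

index = pd.Index(["a", "a", "b", "c"])

# "a" occurs at positions 0 and 1, so two requested labels map to three
# positions and a one-to-one indexer cannot represent the result.
indexer, missing = index.get_indexer_non_unique(pd.Index(["a", "c"]))
print(indexer)

# "b" and "c" each occur exactly once, so there is one position per label.
indexer_unique, missing_unique = index.get_indexer_non_unique(pd.Index(["b", "c"]))
print(indexer_unique)
```

In the second call the indexer has the same length as the target, which is the situation where a unique answer exists.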
Expected Output
pd.Index(["a", "a", "b", "c"]).get_indexer(pd.Index(["b", "c"]))
# array([2, 3])
Note: this is the behaviour of pd.Index._get_indexer on master.
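As a possible workaround until this is resolved, the expected behaviour can be approximated with public API only. The helper below is a hypothetical sketch (the name `get_indexer_if_unambiguous` is mine, not a pandas function): it raises only when the target actually touches a duplicated label, and otherwise delegates to `get_indexer_non_unique`.

```python
import pandas as pd


def get_indexer_if_unambiguous(index: pd.Index, target: pd.Index):
    """Hypothetical helper: positions of `target` in `index`, allowing
    duplicates in `index` as long as `target` does not request them."""
    # Labels that occur more than once in the index.
    duplicated_labels = index[index.duplicated(keep=False)]
    if target.isin(duplicated_labels).any():
        raise ValueError("target contains labels that are duplicated in the index")
    # Every requested label matches at most one position, so the result has
    # one entry per target label (-1 for labels not found).
    indexer, _ = index.get_indexer_non_unique(target)
    return indexer


index = pd.Index(["a", "a", "b", "c"])
result = get_indexer_if_unambiguous(index, pd.Index(["b", "c"]))
print(result)
```

Calling the helper with `pd.Index(["a", "c"])` raises `ValueError`, which matches the ambiguous case described above.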
Output of pd.show_versions()
master
INSTALLED VERSIONS
------------------
commit : 7f912a4009da963b5eacdcebb638c38eec06e7a7
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0.dev0+228.g7f912a4009
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200712
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.20.3
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.16.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.1
gcsfs : 0.6.2
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.4
pandas_gbq : None
pyarrow : 0.17.1
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.18
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.0
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.50.1
1.2.0
pd.show_versions() is failing with: ImportError: Can't determine version for numba, so here's conda list:
$ conda list
# packages in environment at /Users/isaac/miniconda3/envs/pandas-1.2:
#
# Name Version Build Channel
appnope 0.1.2 py38hecd8cb5_1001
backcall 0.2.0 py_0
blas 1.0 mkl
ca-certificates 2020.12.8 hecd8cb5_0
certifi 2020.12.5 py38hecd8cb5_0
decorator 4.4.2 py_0
intel-openmp 2019.4 233
ipython 7.19.0 py38h01d92e1_0
ipython_genutils 0.2.0 pyhd3eb1b0_1
jedi 0.17.0 py38_0
libcxx 11.0.0 h4c3b8ed_1 conda-forge
libedit 3.1.20191231 h1de35cc_1
libffi 3.3 hb1e8313_2
mkl 2019.4 233
mkl-service 2.3.0 py38h9ed2024_0
mkl_fft 1.2.0 py38hc64f4ea_0
mkl_random 1.1.1 py38h959d312_0
ncurses 6.2 h0a44026_1
numpy 1.19.2 py38h456fd55_0
numpy-base 1.19.2 py38hcfb5961_0
openssl 1.1.1i h9ed2024_0
pandas 1.2.0 py38he9f00de_0 conda-forge
parso 0.8.1 pyhd3eb1b0_0
pexpect 4.8.0 pyhd3eb1b0_3
pickleshare 0.7.5 pyhd3eb1b0_1003
pip 20.3.3 py38hecd8cb5_0
prompt-toolkit 3.0.8 py_0
ptyprocess 0.6.0 pyhd3eb1b0_2
pygments 2.7.3 pyhd3eb1b0_0
python 3.8.5 h26836e1_1
python-dateutil 2.8.1 py_0
python_abi 3.8 1_cp38 conda-forge
pytz 2020.4 pyhd3eb1b0_0
readline 8.0 h1de35cc_0
setuptools 51.0.0 py38hecd8cb5_2
six 1.15.0 py38hecd8cb5_0
sqlite 3.33.0 hffcf06c_0
tk 8.6.10 hb0a8c7a_0
traitlets 5.0.5 py_0
wcwidth 0.2.5 py_0
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h1de35cc_0
zlib 1.2.11 h1de35cc_3