Code for list of partial indexers should be in MultiIndex.get_indexer #27181

Open
toobaz opened this issue Jul 2, 2019 · 3 comments
Labels

  • Indexing: Related to indexing on series/frames, not to indexes themselves
  • MultiIndex
  • Refactor: Internal refactoring of code

Comments

@toobaz
Member

toobaz commented Jul 2, 2019

Code Sample, a copy-pastable example if possible

From #27180

In [5]: midx = pd.MultiIndex.from_product([[0, 1], [2, 3]])

In [6]: midx.get_loc(0)
Out[6]: slice(0, 2, None)

In [7]: midx.get_indexer([0])
Out[7]: array([-1])

Problem description

Indexing with lists of partial keys works (except for #18345), but, as the code sample above shows, it is not implemented in get_indexer(), where it should be.
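A minimal sketch of the behaviour described here, assuming a reasonably recent pandas. The Series `s` is introduced only for illustration and is not part of the original report. It shows that `.loc` accepts a list of partial keys, which rows a partial key covers, and that get_indexer copes with complete keys:

```python
import numpy as np
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [2, 3]])
s = pd.Series(range(4), index=midx)  # illustrative Series, not from the report

# A list of partial keys is accepted by .loc: every row whose first
# level equals 0 is selected.
selected = s.loc[[0]]
print(selected.tolist())  # [0, 1]

# The same rows, expressed as integer positions via the slice from get_loc.
positions = np.arange(len(midx))[midx.get_loc(0)]
print(positions.tolist())  # [0, 1]

# get_indexer handles complete keys (one tuple per target row).
full = midx.get_indexer([(0, 2), (0, 3)])
print(full.tolist())  # [0, 1]
```

The partial-key call `midx.get_indexer([0])` is left out of the sketch on purpose; its result at the time of the report is the one shown in the code sample.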

Expected Output

slice(0, 2, None)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7ceefb3
python : 3.7.3.candidate.1
python-bits : 64
OS : Linux
OS-release : 4.9.0-9-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : it_IT.UTF-8
LOCALE : it_IT.UTF-8

pandas : 0.25.0.dev0+861.g7ceefb3f2.dirty
numpy : 1.16.4
pytz : 2016.7
dateutil : 2.8.0
pip : 9.0.1
setuptools : 41.0.1
Cython : 0.29.2
pytest : 4.6.3
hypothesis : 3.71.11
sphinx : 1.4.9
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 4.3.2
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: 0.2.1
bs4 : 4.5.3
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.2
matplotlib : 3.0.2
numexpr : 2.6.9
openpyxl : 2.3.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.0.15
tables : 3.4.4
xarray : None
xlrd : 1.0.0
xlwt : 1.3.0
xlsxwriter : 0.9.6

@toobaz toobaz added MultiIndex Index Related to the Index class or subclasses labels Jul 2, 2019
@toobaz
Member Author

toobaz commented Jul 2, 2019

In fact, the code seems to be in MultiIndex.get_locs, which appears to do a subset of what MultiIndex.get_indexer does, the only difference being that it raises on missing keys. Any objection to deprecating get_locs?
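The difference in how missing keys are handled can be illustrated roughly as follows. This is a sketch, not taken from the thread; the key `9` is an arbitrary value chosen because it is absent from the index:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [2, 3]])

# get_indexer marks a missing complete key with -1 ...
indexer = midx.get_indexer([(0, 2), (9, 9)])
print(indexer.tolist())  # [0, -1]

# ... while get_locs raises KeyError when a requested label is absent.
try:
    midx.get_locs([9])
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```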

@toobaz
Member Author

toobaz commented Jul 2, 2019

According to the docstring for get_locs, it accepts

  • label (duplicates get_loc)
  • slice (not supported by other indexes, implemented directly in indexing code)
  • list (duplicates get_indexer)
  • mask - doesn't work, but list of masks does (hardly useful - equivalent to calling np.where on each item and combining)
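As a rough illustration of the first three input kinds in the list above (the mask case is omitted), assuming the same small index as in the report. Note that get_locs takes a sequence with one entry per level, so a single label is passed as a one-element list:

```python
import pandas as pd

midx = pd.MultiIndex.from_product([[0, 1], [2, 3]])

# Label on the first level: the same rows as get_loc(0), but returned
# as an array of positions instead of a slice.
by_label = midx.get_locs([0])
print(by_label.tolist())  # [0, 1]

# Slice on the first level (label-based, both ends inclusive).
by_slice = midx.get_locs([slice(0, 1)])
print(by_slice.tolist())  # [0, 1, 2, 3]

# List of labels on the second level, with any first-level value allowed.
by_list = midx.get_locs([slice(None), [3]])
print(by_list.tolist())  # [1, 3]
```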

@jbrockmendel jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Feb 22, 2020
@mroeschke mroeschke added the Bug label May 5, 2020
@jbrockmendel
Member

> Which seems to do a subset of what MultiIndex.get_indexer does

A lot of the action also goes on in MultiIndex._convert_listlike_indexer, which handles (some) cases where we are indexing only along the first level of the MultiIndex. In those cases, get_indexer would raise; i.e., for a MultiIndex, get_indexer does not behave like a vectorized get_loc:

>>> import pandas as pd
>>> mi = pd.MultiIndex.from_product([range(3), ["A", "B"]])
>>> key = mi.get_level_values(0)  # first-level labels only, i.e. partial keys
>>> mi.get_indexer(key)
array([-1, -1, -1, -1, -1, -1])

If we edit MultiIndex.get_indexer to behave like a vectorized get_loc, that breaks MultiIndex.reindex and set operations, for which we don't want to allow partial indexing.
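One way to see the tension is the sketch below. The helper `partial_positions` is hypothetical, written only for this illustration on top of the public get_locs method, and is not pandas API. A "vectorized get_loc" maps each partial key to several rows, whereas get_indexer returns exactly one position per target, which is what reindexing relies on:

```python
import numpy as np
import pandas as pd


def partial_positions(mi, keys):
    """Hypothetical helper: positions of all rows whose first level
    matches each key, concatenated in the order of `keys`."""
    return np.concatenate([mi.get_locs([key]) for key in keys])


mi = pd.MultiIndex.from_product([range(3), ["A", "B"]])

# Two partial keys expand to four positions (one-to-many).
partial = partial_positions(mi, [0, 2])
print(partial.tolist())  # [0, 1, 4, 5]

# Two complete keys map to exactly two positions (one-to-one).
full = mi.get_indexer([(0, "A"), (2, "B")])
print(full.tolist())  # [0, 5]
```

Because the partial result has a different length from its input, it cannot serve as the one-to-one indexer that reindex and set operations expect.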

@jbrockmendel jbrockmendel added Refactor Internal refactoring of code and removed Bug Index Related to the Index class or subclasses labels Jun 10, 2021