
Indexing a MultiIndex with a Series fails #14730


Closed
owlas opened this issue Nov 24, 2016 · 9 comments


@owlas

owlas commented Nov 24, 2016

Code Sample

import pandas as pd
index = pd.Index([1,2,3])
x = pd.Series(index=index, data=range(3))
y = pd.Series([1,3])
x.loc[[1,3]] # can index series with list
x.loc[y] # can index series with another series

index = pd.MultiIndex.from_product([[1, 2, 3], ['A', 'B', 'C']])
x = pd.Series(index=index, data=range(9))
y = pd.Series([1,3])
x.loc[[1,3]] # can index series with list
x.loc[y] # cannot index series with another series

Problem description

In pandas you can index a DataFrame or Series with a list, ndarray, Series, etc., but indexing with a Series fails when the object has a MultiIndex.

Expected Output

No error: x.loc[y] should select the same rows as x.loc[[1,3]], as it does for a flat index.
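For reference, a minimal sketch of the flat-index behaviour the report relies on: a list, an ndarray and a Series holding the same labels select the same rows via .loc. The names by_list, by_array and by_series are only illustrative.

```python
import numpy as np
import pandas as pd

# Flat index, as in the first half of the code sample
x = pd.Series(index=pd.Index([1, 2, 3]), data=range(3))
y = pd.Series([1, 3])

# Three list-likes carrying the same labels, 1 and 3
by_list = x.loc[[1, 3]]
by_array = x.loc[np.array([1, 3])]
by_series = x.loc[y]  # the Series' values are used as labels

print(by_list.equals(by_array) and by_list.equals(by_series))
```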

Error message

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
     94             try:
---> 95                 return self.obj._xs(label, axis=axis)
     96             except:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   1776             loc, new_index = self.index.get_loc_level(key,
-> 1777                                                       drop_level=drop_level)
   1778         else:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_loc_level(self, key, level, drop_level)
   1795                 else:
-> 1796                     return partial_selection(key)
   1797             else:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in partial_selection(key, indexer)
   1761                     if indexer is None:
-> 1762                         indexer = self.get_loc(key)
   1763                     ilevels = [i for i in range(len(key))

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_loc(self, key, method)
   1671         start, stop = (self.slice_locs(lead_key, lead_key)
-> 1672                        if lead_key else (0, len(self)))
   1673 

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in slice_locs(self, start, end, step, kind)
   1577         # happens in get_slice_bound method), but it adds meaningful doc.
-> 1578         return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
   1579 

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   3175         if start is not None:
-> 3176             start_slice = self.get_slice_bound(start, 'left', kind)
   3177         if start_slice is None:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_slice_bound(self, label, side, kind)
   1548             label = label,
-> 1549         return self._partial_tup_index(label, side=side)
   1550 

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in _partial_tup_index(self, tup, side)
   1591 
-> 1592             if lab not in lev:
   1593                 if not lev.is_type_compatible(lib.infer_dtype([lab])):

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/base.py in __contains__(self, key)
   1392     def __contains__(self, key):
-> 1393         hash(key)
   1394         # work around some kind of odd cython bug

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/generic.py in __hash__(self)
    830         raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 831                         ' hashed'.format(self.__class__.__name__))
    832 

TypeError: 'Series' objects are mutable, thus they cannot be hashed

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-7-5927dd14b2bf> in <module>()
      4 y = pd.Series([1,3])
      5 x.loc[[1,3]] # can index series with list
----> 6 x.loc[y] # cannot index series with another series

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1309             return self._getitem_tuple(key)
   1310         else:
-> 1311             return self._getitem_axis(key, axis=0)
   1312 
   1313     def _getitem_axis(self, key, axis=0):

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1480         # fall thru to straight lookup
   1481         self._has_valid_type(key, axis)
-> 1482         return self._get_label(key, axis=axis)
   1483 
   1484 

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
     95                 return self.obj._xs(label, axis=axis)
     96             except:
---> 97                 return self.obj[label]
     98         elif isinstance(label, tuple) and isinstance(label[axis], slice):
     99             raise IndexingError('no slices here, handle elsewhere')

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/series.py in __getitem__(self, key)
    640             key = check_bool_indexer(self.index, key)
    641 
--> 642         return self._get_with(key)
    643 
    644     def _get_with(self, key):

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/series.py in _get_with(self, key)
    653             if isinstance(key, tuple):
    654                 try:
--> 655                     return self._get_values_tuple(key)
    656                 except:
    657                     if len(key) == 1:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/series.py in _get_values_tuple(self, key)
    701 
    702         # If key is contained, would have returned by now
--> 703         indexer, new_index = self.index.get_loc_level(key)
    704         return self._constructor(self._values[indexer],
    705                                  index=new_index).__finalize__(self)

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_loc_level(self, key, level, drop_level)
   1794                         return partial_selection(key)
   1795                 else:
-> 1796                     return partial_selection(key)
   1797             else:
   1798                 indexer = None

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in partial_selection(key, indexer)
   1760                 def partial_selection(key, indexer=None):
   1761                     if indexer is None:
-> 1762                         indexer = self.get_loc(key)
   1763                     ilevels = [i for i in range(len(key))
   1764                                if key[i] != slice(None, None)]

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_loc(self, key, method)
   1670         lead_key, follow_key = key[:i], key[i:]
   1671         start, stop = (self.slice_locs(lead_key, lead_key)
-> 1672                        if lead_key else (0, len(self)))
   1673 
   1674         if start == stop:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in slice_locs(self, start, end, step, kind)
   1576         # This function adds nothing to its parent implementation (the magic
   1577         # happens in get_slice_bound method), but it adds meaningful doc.
-> 1578         return super(MultiIndex, self).slice_locs(start, end, step, kind=kind)
   1579 
   1580     def _partial_tup_index(self, tup, side='left'):

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   3174         start_slice = None
   3175         if start is not None:
-> 3176             start_slice = self.get_slice_bound(start, 'left', kind)
   3177         if start_slice is None:
   3178             start_slice = 0

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in get_slice_bound(self, label, side, kind)
   1547         if not isinstance(label, tuple):
   1548             label = label,
-> 1549         return self._partial_tup_index(label, side=side)
   1550 
   1551     def slice_locs(self, start=None, end=None, step=None, kind=None):

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/multi.py in _partial_tup_index(self, tup, side)
   1590             section = labs[start:end]
   1591 
-> 1592             if lab not in lev:
   1593                 if not lev.is_type_compatible(lib.infer_dtype([lab])):
   1594                     raise TypeError('Level type mismatch: %s' % lab)

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/indexes/base.py in __contains__(self, key)
   1391 
   1392     def __contains__(self, key):
-> 1393         hash(key)
   1394         # work around some kind of odd cython bug
   1395         try:

/home/owl/miniconda3/envs/p3/lib/python3.5/site-packages/pandas/core/generic.py in __hash__(self)
    829     def __hash__(self):
    830         raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 831                         ' hashed'.format(self.__class__.__name__))
    832 
    833     def __iter__(self):

TypeError: 'Series' objects are mutable, thus they cannot be hashed
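Both tracebacks bottom out in the same place: Index.__contains__ calls hash(key), and pandas objects raise TypeError from __hash__ because they are mutable. On the MultiIndex path the whole Series is treated as a single label and reaches that membership test. A small sketch of the failure mode in isolation (variable names are illustrative):

```python
import pandas as pd

level = pd.Index([1, 2, 3])
y = pd.Series([1, 3])

# Scalar labels are hashable, so membership tests work
print(1 in level)

# A Series is mutable and refuses to be hashed...
try:
    hash(y)
    hashed = True
except TypeError:
    hashed = False

# ...so using the whole Series as one label in a membership test fails too
try:
    _ = y in level
    contains_ok = True
except TypeError:
    contains_ok = False
```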

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.19.1
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: 5.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Nov 24, 2016

So which should be the answer?

In [7]: x.loc[y.index.tolist()]
Out[7]: 
1  A    0
   B    1
   C    2
dtype: int64

In [8]: x.loc[y.tolist()]
Out[8]: 
1  A    0
   B    1
   C    2
3  A    6
   B    7
   C    8
dtype: int64

This is the correct method; there is not enough information to index directly with the Series as above.

In [11]: x.reindex(y,level=0)
Out[11]: 
1  A    0
   B    1
   C    2
3  A    6
   B    7
   C    8
dtype: int64

This is not a bug, rather a nice doc example.
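Following the comment above, a sketch of the workaround on the original MultiIndex example: pass the Series' values (as a list or an ndarray) instead of the Series itself, so .loc selects on the first level. Series.to_numpy assumes pandas 0.24 or later; on older versions y.values plays the same role.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2, 3], ["A", "B", "C"]])
x = pd.Series(index=index, data=range(9))
y = pd.Series([1, 3])

# First-level labels 1 and 3, taken from the Series' values
by_list = x.loc[y.tolist()]
by_array = x.loc[y.to_numpy()]

# The alternative from the comment: reindex against level 0
by_reindex = x.reindex(y, level=0)
```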

jreback added this to the Next Major Release milestone Nov 24, 2016
@owlas
Author

owlas commented Nov 24, 2016

That's clear, thank you. I think my underlying assumption was that passing any iterable as an index to .loc would have the same behaviour.

Your second example is exactly what I was looking for. Many thanks for suggesting adding this to the docs too; I hope it will help somebody else in the future.

@jreback
Contributor

jreback commented Nov 24, 2016

If you would like to do a pull request for the docs, that would be great.

@owlas
Author

owlas commented Nov 24, 2016

Will do. Would this belong in the MultiIndex / Advanced Indexing docs?

@jreback
Contributor

jreback commented Nov 24, 2016

Yes, I would add a note that illustrates the above example.

@jorisvandenbossche
Member

Quoting @jreback: "So which should be the answer?"

@jreback, why do you find this ambiguous in the case of a MultiIndex, but not a problem in the case of a single index?
There, too, you would get a different answer depending on which one you use:

In [13]: index = pd.Index([1,2,3])
    ...: x1 = pd.Series(index=index, data=range(3))
    ...: y = pd.Series([1,3])
    ...: 

In [14]: x1.loc[y.tolist()]
Out[14]: 
1    0
3    2
dtype: int64

In [15]: x1.loc[y.index.tolist()]
Out[15]: 
0    NaN
1    0.0
dtype: float64
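The two readings contrasted above can be reproduced on a flat index. One caveat that concerns later pandas, not this thread: since pandas 1.0, .loc with a list containing missing labels raises KeyError instead of returning NaN, so this sketch uses reindex for the index-label reading to obtain the Out[15] result.

```python
import pandas as pd

x1 = pd.Series(index=pd.Index([1, 2, 3]), data=range(3))
y = pd.Series([1, 3])

# Reading 1: use the Series' values (labels 1 and 3), as in Out[14]
by_values = x1.loc[y.tolist()]

# Reading 2: use the Series' own index (labels 0 and 1), as in Out[15];
# label 0 is absent from x1, so reindex fills it with NaN
by_index_labels = x1.reindex(y.index.tolist())
```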

And you could ask the same question for reindex.

Our docs say that the argument to .loc can be "A list or array of labels". Although a Series is not strictly that, it is allowed for non-hierarchical indexes, and when it is interpreted as "array-like", I think using the values is the obvious choice. So IMO we could also see this inconsistency as a bug.

@owlas
Author

owlas commented Nov 25, 2016

As far as I'm aware, lists, ndarrays and Series are interchangeable everywhere else for indexing. That was what confused me in this example.

I found pd.IndexSlice useful for this problem in the end. Is that the recommended approach?
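The comment does not show how pd.IndexSlice was used; one plausible form, given here only as an assumption, is a tuple of per-level selectors with the Series converted to a list for the first level. Slicing with IndexSlice generally expects a lexsorted MultiIndex, which from_product produces here.

```python
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2, 3], ["A", "B", "C"]])
x = pd.Series(index=index, data=range(9))
y = pd.Series([1, 3])

idx = pd.IndexSlice

# All second-level labels for first-level labels 1 and 3
rows_all = x.loc[idx[y.tolist(), :]]

# Only second-level label "A" for the same first-level labels
rows_a = x.loc[idx[y.tolist(), ["A"]]]
```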

@jreback
Contributor

jreback commented Nov 25, 2016

Yeah, I guess this could be a bug; it should work the same for Index and MultiIndex. We always use the values of the indexer (not its index).
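A sketch of the rule stated here: .loc uses the values of a Series indexer and ignores its index. It assumes a pandas version that includes the fix for this issue; the thread does not name the release. The non-default index on y (10 and 20) is arbitrary, chosen only so that values and index clearly differ.

```python
import pandas as pd

flat = pd.Series(index=pd.Index([1, 2, 3]), data=range(3))
multi = pd.Series(
    index=pd.MultiIndex.from_product([[1, 2, 3], ["A", "B", "C"]]),
    data=range(9),
)

# Indexer whose index (10, 20) differs from its values (1, 3)
y = pd.Series([1, 3], index=[10, 20])

# With the fix, both index types should select on y's values only
flat_matches = flat.loc[y].equals(flat.loc[y.tolist()])
multi_matches = multi.loc[y].equals(multi.loc[y.tolist()])
```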

@owlas
Author

owlas commented May 19, 2017

A little late, just wanted to say thanks for getting this addressed so quickly!
