
KeyError when retrieving columns with lists of incomplete tuples from MultiIndex #12369

Closed
toobaz opened this issue Feb 17, 2016 · 4 comments

@toobaz
Member

toobaz commented Feb 17, 2016

Sorry in advance if this was already reported; I searched a bit but didn't know exactly what to search for.

In [2]: df = pd.DataFrame(0, index=range(1), columns=pd.MultiIndex.from_product([['a', 'b'], ['c']]))

In [3]: cols = [('a',), ('b',)]

In [4]: df[cols[0]]
Out[4]: 
   c
0  0

In [5]: df[cols[1]]
Out[5]: 
   c
0  0

In [6]: df[cols]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-957cf8a677a9> in <module>()
----> 1 df[cols]

/home/pietro/nobackup/repo/pandas/pandas/core/frame.pyc in __getitem__(self, key)
   1975         if isinstance(key, (Series, np.ndarray, Index, list)):
   1976             # either boolean or fancy integer index
-> 1977             return self._getitem_array(key)
   1978         elif isinstance(key, DataFrame):
   1979             return self._getitem_frame(key)

/home/pietro/nobackup/repo/pandas/pandas/core/frame.pyc in _getitem_array(self, key)
   2019             return self.take(indexer, axis=0, convert=False)
   2020         else:
-> 2021             indexer = self.ix._convert_to_indexer(key, axis=1)
   2022             return self.take(indexer, axis=1, convert=True)
   2023 

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1203                 mask = check == -1
   1204                 if mask.any():
-> 1205                     raise KeyError('%s not in index' % objarr[mask])
   1206 
   1207                 return _values_from_object(indexer)

KeyError: "[('a',) ('b',)] not in index"

Notice that df[[c[0] for c in cols]] works just fine, but since each element of cols is a valid key on its own, we probably want a list of such keys to be accepted as well.
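A minimal sketch of the workaround mentioned above, assuming a reasonably recent pandas. The helper select_first_level is hypothetical (not a pandas API): it unwraps 1-tuples such as ('a',) into first-level labels and selects the matching columns with a boolean mask over level 0. Unlike df[...], it silently ignores labels that are absent, and the result keeps the frame's own column order rather than the order of the keys.

```python
import pandas as pd


def select_first_level(df, keys):
    """Hypothetical helper (not part of pandas): select MultiIndex columns
    by first-level label, accepting scalars or 1-tuples like ('a',)."""
    # Unwrap 1-tuples into the scalar label they contain.
    labels = [k[0] if isinstance(k, tuple) and len(k) == 1 else k for k in keys]
    # True for every column whose level-0 label was requested.
    mask = df.columns.get_level_values(0).isin(labels)
    # Boolean selection along the columns; unknown labels simply match nothing.
    return df.loc[:, mask]


df = pd.DataFrame(0, index=range(1),
                  columns=pd.MultiIndex.from_product([['a', 'b'], ['c']]))
cols = [('a',), ('b',)]
print(select_first_level(df, cols).columns.tolist())  # [('a', 'c'), ('b', 'c')]
```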

@jreback
Contributor

jreback commented Feb 17, 2016

You have to use MultiIndex slicers if you want partial selections:

In [12]: df.loc[:,['a','b']]
Out[12]: 
   a  b
   c  c
0  0  0

[] is too ambiguous for things like this.
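For reference, the "MultiIndex slicers" mentioned above are written with pd.IndexSlice inside .loc. A minimal sketch on the frame from the original report; spelling out `:` for the second level makes it explicit that the list refers to first-level labels:

```python
import pandas as pd

df = pd.DataFrame(0, index=range(1),
                  columns=pd.MultiIndex.from_product([['a', 'b'], ['c']]))
idx = pd.IndexSlice

# Columns: a list of first-level labels, every second-level label.
subset = df.loc[:, idx[['a', 'b'], :]]
print(subset.columns.tolist())  # [('a', 'c'), ('b', 'c')]
```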

@toobaz
Member Author

toobaz commented Feb 17, 2016

(I just expected the above to work, but df[['a', 'b']] actually does the job too.)

@toobaz
Member Author

toobaz commented Feb 22, 2016

So this isn't a bug either, right?

In [2]: df = pd.DataFrame(index=range(2), columns=pd.MultiIndex.from_product([[10,20], ['a', 'b']]))

In [3]: df[[20]]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-3-8422ebb3f356> in <module>()
----> 1 df[[20]]

/home/pietro/nobackup/repo/pandas/pandas/core/frame.pyc in __getitem__(self, key)
   1975         if isinstance(key, (Series, np.ndarray, Index, list)):
   1976             # either boolean or fancy integer index
-> 1977             return self._getitem_array(key)
   1978         elif isinstance(key, DataFrame):
   1979             return self._getitem_frame(key)

/home/pietro/nobackup/repo/pandas/pandas/core/frame.pyc in _getitem_array(self, key)
   2020         else:
   2021             indexer = self.ix._convert_to_indexer(key, axis=1)
-> 2022             return self.take(indexer, axis=1, convert=True)
   2023 
   2024     def _getitem_multilevel(self, key):

/home/pietro/nobackup/repo/pandas/pandas/core/generic.pyc in take(self, indices, axis, convert, is_copy)
   1590         new_data = self._data.take(indices,
   1591                                    axis=self._get_block_manager_axis(axis),
-> 1592                                    convert=True, verify=True)
   1593         result = self._constructor(new_data).__finalize__(self)
   1594 

/home/pietro/nobackup/repo/pandas/pandas/core/internals.pyc in take(self, indexer, axis, verify, convert)
   3617         n = self.shape[axis]
   3618         if convert:
-> 3619             indexer = maybe_convert_indices(indexer, n)
   3620 
   3621         if verify:

/home/pietro/nobackup/repo/pandas/pandas/core/indexing.pyc in maybe_convert_indices(indices, n)
   1803     mask = (indices >= n) | (indices < 0)
   1804     if mask.any():
-> 1805         raise IndexError("indices are out-of-bounds")
   1806     return indices
   1807 

IndexError: indices are out-of-bounds

(It is interpreting 20 as a positional index rather than as a label.)
For instance, the following "works" instead:

In [4]: df[[1]]
Out[4]: 
    10
     b
0  NaN
1  NaN
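The label/position ambiguity can be sidestepped by stating the intent explicitly: .loc is always label-based and .iloc is always positional. A minimal sketch against the frame above; it does not depend on how plain df[[...]] resolves a list key in any particular pandas version:

```python
import pandas as pd

df = pd.DataFrame(index=range(2),
                  columns=pd.MultiIndex.from_product([[10, 20], ['a', 'b']]))

# Label-based: 20 is a first-level label, so (20, 'a') and (20, 'b') are kept.
by_label = df.loc[:, pd.IndexSlice[[20], :]]
# Position-based: 1 is the second column, i.e. (10, 'b').
by_position = df.iloc[:, [1]]

print(by_label.columns.tolist())
print(by_position.columns.tolist())
```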

@jreback
Contributor

jreback commented Feb 22, 2016

They should probably both raise KeyError.

It might be non-trivial to fix, though.
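A rough sketch of a user-side, label-only selection that raises KeyError instead of falling back to column positions. The helper strict_level0_getitem is hypothetical, not a pandas API and not a proposed patch; it only accepts first-level labels and reports any that are missing, so an integer like 1 is never read as "the second column":

```python
import pandas as pd


def strict_level0_getitem(df, labels):
    """Hypothetical helper (not a pandas API): select MultiIndex columns by
    first-level label, raising KeyError if any requested label is missing."""
    level0 = df.columns.get_level_values(0)
    present = set(level0)
    missing = [lab for lab in labels if lab not in present]
    if missing:
        # Never reinterpret an unknown label as a column position.
        raise KeyError(f"{missing} not in columns")
    return df.loc[:, level0.isin(labels)]


df = pd.DataFrame(index=range(2),
                  columns=pd.MultiIndex.from_product([[10, 20], ['a', 'b']]))
print(strict_level0_getitem(df, [20]).shape)  # (2, 2)
try:
    strict_level0_getitem(df, [1])
except KeyError as exc:
    print("KeyError:", exc)
```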
