BUG: loc should not fallback for integer indexing for multi-index #5420

Closed
jreback opened this issue Nov 3, 2013 · 16 comments
Labels
Bug Docs Indexing Related to indexing on series/frames, not to indexes themselves

@jreback
Contributor

jreback commented Nov 3, 2013

https://groups.google.com/forum/m/#!topic/pydata/W0e3l0UvNwI

@jtratner
Contributor

jtratner commented Nov 3, 2013

Related from that discussion: iloc fails if you give it something out of range. loc should probably fail too if there are indices not included.

@jreback
Contributor Author

jreback commented Nov 3, 2013

oh it does, but right now iirc it delegates MultiIndex handling to the ix routines (which do fall back)
should be straightforward to fix

@jtratner
Contributor

jtratner commented Nov 3, 2013

No it doesn't (that was part of what was confusing on ML):

In [3]: df = DataFrame({"A": [1, 2, 3]})

In [4]: df
Out[4]:
   A
0  1
1  2
2  3

In [6]: df.loc[['a', 'b']]
Out[6]:
    A
a NaN
b NaN

In [7]: df.loc['a']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-5dbae926782f> in <module>()
----> 1 df.loc['a']

../pandas/pandas/core/indexing.pyc in __getitem__(self, key)
    958             return self._getitem_tuple(key)
    959         else:
--> 960             return self._getitem_axis(key, axis=0)
    961
    962     def _getitem_axis(self, key, axis=0):

../pandas/pandas/core/indexing.pyc in _getitem_axis(self, key, axis)
   1067             return self._getitem_iterable(key, axis=axis)
   1068         else:
-> 1069             self._has_valid_type(key,axis)
   1070             return self._get_label(key, axis=axis)
   1071

../pandas/pandas/core/indexing.pyc in _has_valid_type(self, key, axis)
   1047                 raise
   1048             except:
-> 1049                 error()
   1050
   1051         return True

../pandas/pandas/core/indexing.pyc in error()
   1034                 if isnull(key):
   1035                     raise ValueError("cannot use label indexing with a null key")
-> 1036                 raise KeyError("the label [%s] is not in the [%s]" % (key,self.obj._get_axis_name(axis)))
   1037
   1038             try:

KeyError: 'the label [a] is not in the [index]'

@jreback
Contributor Author

jreback commented Nov 3, 2013

your example is as expected

a single loc raises while a list does not

@jtratner
Contributor

jtratner commented Nov 3, 2013

Okay, then we should update the docs to reflect that. It's weird that df.iloc[[15, 20, 1000]] on a 16 element dataframe doesn't fail.

@jreback
Contributor Author

jreback commented Nov 4, 2013

iloc will fail on an out-of-range position, while loc won't. The reason loc does not fail is that it would otherwise have to scan the entire index to look for each element; it's essentially a reindex, which does exactly that. You can make an argument both ways though.

In [1]: df = DataFrame(np.arange(20).reshape(20,1))

In [2]: df
Out[2]: 
     0
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
15  15
16  16
17  17
18  18
19  19

In [4]: df.iloc[[1,3,4,60]]
IndexError: index 60 is out of bounds for size 20

In [5]: df.loc[[1,3,4,60]]
Out[5]: 
     0
1    1
3    3
4    4
60 NaN
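
For readers who want to reproduce the contrast above, here is a minimal sketch. It assumes a current pandas and NumPy install, and it uses `reindex` for the label-based half because the comment describes the list form of `loc` as "essentially a reindex"; how `.loc` itself treats missing labels in a list may differ between pandas versions, so that call is deliberately left out. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(20, 1))

# Positional indexing: position 60 does not exist in a 20-row frame,
# so iloc raises IndexError.
try:
    df.iloc[[1, 3, 4, 60]]
    iloc_raised = False
except IndexError:
    iloc_raised = True

# Label-based reindex: the missing label 60 is kept and filled with NaN.
reindexed = df.reindex([1, 3, 4, 60])
print(iloc_raised)
print(reindexed)
```

The NaN row forces the column to a float dtype, which is why the reindexed values print as floats.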

@jtratner
Contributor

jtratner commented Nov 4, 2013

It clearly has to scan the entire index for each element anyways, right?
Otherwise how could it know to produce nan values?

@jreback
Contributor Author

jreback commented Nov 4, 2013

no, it's doing a hash lookup via get_indexer

@jtratner
Contributor

jtratner commented Nov 4, 2013

why would a label not be in the hash if it's in the index?

@jreback
Contributor Author

jreback commented Nov 4, 2013

not sure I understand the question?

it would be in the hash if it's in the index

but remember that's only calculated once

it doesn't scan when doing lookups

just hits the index for the locs
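
The point about the hash being calculated once can be illustrated with a toy model in plain Python. This is only a sketch of the general idea, not pandas' actual implementation: `ToyIndex` and its attributes are hypothetical names, and the only thing borrowed from the discussion is the convention that a missing label maps to -1.

```python
class ToyIndex:
    """Toy model of a label index; not pandas' actual implementation."""

    def __init__(self, labels):
        self.labels = list(labels)
        self._positions = None  # label -> position map, built lazily

    def _mapping(self):
        # Build the hash map once; later lookups reuse it.
        if self._positions is None:
            self._positions = {label: i for i, label in enumerate(self.labels)}
        return self._positions

    def get_indexer(self, targets):
        # One dict lookup per target; -1 marks a label that is absent.
        mapping = self._mapping()
        return [mapping.get(target, -1) for target in targets]


idx = ToyIndex(range(20))
positions = idx.get_indexer([1, 3, 4, 60])
print(positions)
```

Each lookup is a dictionary access rather than a scan of all labels, and an absent label is detected by the same lookup that would have found it.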

@jtratner
Contributor

jtratner commented Nov 4, 2013

If I say, df.loc[["A", "B", "C"]], it has to do a hash lookup for each of "A", "B", "C". If it's not in the hash, then it's pretty obvious it's not in the index - right? So this would just be something like:

if (ind.get_indexer(<whatever>) == -1).any():
    raise KeyError(...)
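
The pseudocode above can be turned into a small runnable sketch. `strict_positions` is a hypothetical helper written for illustration, not something pandas provides; it relies only on `Index.get_indexer`, which returns -1 for labels that are not found, and it assumes the index has unique labels.

```python
import pandas as pd


def strict_positions(index, labels):
    """Hypothetical helper: positions of labels, or KeyError if any is absent."""
    positions = index.get_indexer(labels)
    missing = positions == -1
    if missing.any():
        absent = [label for label, is_missing in zip(labels, missing) if is_missing]
        raise KeyError(f"labels not in index: {absent}")
    return positions


index = pd.Index(["A", "B", "D"])
found = strict_positions(index, ["A", "D"])

try:
    strict_positions(index, ["A", "B", "C"])
    raised = False
except KeyError:
    raised = True

print(found, raised)
```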

@jreback
Contributor Author

jreback commented Nov 4, 2013

not sure I understand the question

that is true but what's the point?

@jtratner
Contributor

jtratner commented Nov 4, 2013

from the docs:

.loc is strictly label based, will raise KeyError when the items are not found

@jreback
Contributor Author

jreback commented Nov 4, 2013

not in a list though (guess the docs need to be updated)

@jtratner
Contributor

jtratner commented Nov 4, 2013

yeah, now we're on the same page

@jreback
Contributor Author

jreback commented Jun 19, 2014

jreback closed this as completed Jun 19, 2014

2 participants