
Unexpected difference between loc and ix #11840


Closed
ben519 opened this issue Dec 14, 2015 · 7 comments
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Usage Question

Comments

@ben519

ben519 commented Dec 14, 2015

See my Stack Overflow post for details. I've come to the conclusion that this is a bug:

import pandas as pd

# Create a dataframe
df = pd.DataFrame({'id':[10,9,5,6,8], 'x1':[10.0,12.3,13.4,11.9,7.6], 'x2':['a','a','b','c','c']})
df.set_index('id', inplace=True)
df.loc[[10, 9, 7]] # 7 does not exist in the index so a NaN row is returned
df.loc[[7]] # KeyError: 'None of [[7]] are in the [index]'
df.ix[[7]] # 7 does not exist in the index so a NaN row is returned

I think df.loc[[7]] should not raise an error but should return a DataFrame with one row for id=7, filled with NaN values.
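If the goal is a NaN-filled row for an id that may not exist, `DataFrame.reindex` requests that explicitly instead of relying on how `.loc` treats missing labels. Below is a minimal sketch that rebuilds the same frame as above. The variable names `one_missing` and `mixed` are only illustrative.

```python
import pandas as pd

# Same frame as in the report, indexed by 'id'
df = pd.DataFrame({'id': [10, 9, 5, 6, 8],
                   'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']}).set_index('id')

# reindex does not raise for missing labels: id=7 becomes an all-NaN row
one_missing = df.reindex([7])

# Existing labels keep their data, the missing one is NaN-filled
mixed = df.reindex([10, 9, 7])

print(one_missing)
print(mixed)
```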

@jreback
Contributor

jreback commented Dec 14, 2015

This is by definition: .loc is strict for this very reason.

.ix returns based on the type of the index.

@jreback jreback closed this as completed Dec 14, 2015
@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Usage Question labels Dec 14, 2015
@ben519
Author

ben519 commented Dec 14, 2015

@jreback it's confusing to me that df.loc[[7]] raises an error, yet df.loc[[10, 9, 7]] returns three rows (i.e. including one for id=7 when no such id exists). Is there a reason for this, or is it mentioned in the docs?

@jorisvandenbossche
Member

@ben519 It is stated in the docs about loc here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label ("At least 1 of the labels for which you ask, must be in the index or a KeyError will be raised").
I don't think the different behaviour of ix is documented anywhere. And you're certainly right that some of the indexing rules are quite confusing...
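The rule quoted from the docs can be written out as a small function. The sketch below uses a hypothetical helper, `loc_list_2015`, which is not a pandas API. It assumes the behaviour described in this thread: a list selection raises `KeyError` only when none of the labels are in the index, and otherwise acts like `reindex`.

```python
import pandas as pd


def loc_list_2015(df, labels):
    """Hypothetical emulation of the documented .loc list rule quoted above.

    Raise KeyError only if *none* of the labels are in the index;
    otherwise behave like reindex (missing labels become NaN rows).
    """
    if not any(label in df.index for label in labels):
        raise KeyError(f"None of {labels} are in the [index]")
    return df.reindex(labels)


df = pd.DataFrame({'x1': [10.0, 12.3, 13.4, 11.9, 7.6]},
                  index=pd.Index([10, 9, 5, 6, 8], name='id'))

# At least one label exists, so this returns three rows (id=7 is NaN)
print(loc_list_2015(df, [10, 9, 7]))

# No label exists, so this raises
try:
    loc_list_2015(df, [7])
except KeyError as exc:
    print("KeyError:", exc)
```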

@jorisvandenbossche
Member

@jreback What exactly do you mean by:

.ix returns based on the type of the index.

That it is label- vs. location-based depending on the type of the index? But in this case (an integer index) it is label-based. Do you know whether this difference in behaviour between ix and loc for this case (reindex vs. raise) is discussed or documented anywhere?
It seems that ix always has 'reindex' behaviour and does not check whether the labels are in the index (apart from the cases where it falls back... :-))

@jreback
Contributor

jreback commented Dec 15, 2015

List-based reindexing (whether the list is interpreted as labels or as positions) is the original purpose of .ix/.loc, so these are de facto equivalent to .reindex. The case where a single label raises was added for user convenience.
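The "de facto equivalent of .reindex" point can be checked directly for the simplest case, where every requested label exists and both calls return the same frame. The sketch deliberately covers only that case, since missing labels are exactly where `.loc` and `.reindex` diverge in this thread. The label list `[10, 9, 8]` is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']},
                  index=pd.Index([10, 9, 5, 6, 8], name='id'))

labels = [10, 9, 8]  # every label exists in the index

# With all labels present, list-based .loc and .reindex select the same rows
via_loc = df.loc[labels]
via_reindex = df.reindex(labels)

print(via_loc.equals(via_reindex))
```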

My point about label/positional indexing is simply: don't use .ix except in the specific case where you need simultaneous multi-axis indexing, with some axes positional and some label-based.
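For that mixed case, one way to avoid .ix is to convert one axis so that a single indexer applies to both. The sketch below does this with `Index` slicing and `Index.get_loc`. The particular selection (first two rows by position, column `x1` by label) is just an example.

```python
import pandas as pd

df = pd.DataFrame({'x1': [10.0, 12.3, 13.4, 11.9, 7.6],
                   'x2': ['a', 'a', 'b', 'c', 'c']},
                  index=pd.Index([10, 9, 5, 6, 8], name='id'))

# Goal: rows by position (first two), column by label ('x1')

# Option 1: turn the row positions into labels, then use .loc
a = df.loc[df.index[:2], 'x1']

# Option 2: turn the column label into a position, then use .iloc
b = df.iloc[:2, df.columns.get_loc('x1')]

print(a)
print(a.equals(b))
```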

@ben519
Author

ben519 commented Dec 15, 2015

Copied from my SO comment...

@jreback
What's so confusing is that df.loc[[7,8,9]] actually returns a row for id=7, but df.loc[[7]] does not. I would have expected them either both to raise an error, or df.loc[[7,8,9]] not to return a row for id=7. Nonetheless, I'm glad to know this isn't a bug. Thanks for your help.
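Either of the two consistent behaviours expected here can be made explicit by checking the labels against the index before selecting. Both `select_strict` and `select_present` are hypothetical helpers written for this sketch; they are not part of pandas.

```python
import pandas as pd


def select_strict(df, labels):
    """Hypothetical helper: raise KeyError if *any* label is missing."""
    missing = [label for label in labels if label not in df.index]
    if missing:
        raise KeyError(f"Labels not in index: {missing}")
    return df.loc[labels]


def select_present(df, labels):
    """Hypothetical helper: silently drop labels that are not in the index."""
    present = [label for label in labels if label in df.index]
    return df.loc[present]


df = pd.DataFrame({'x1': [10.0, 12.3, 13.4, 11.9, 7.6]},
                  index=pd.Index([10, 9, 5, 6, 8], name='id'))

# Lenient: only the rows for id=8 and id=9 come back
print(select_present(df, [7, 8, 9]))

# Strict: any missing label raises, whether alone or in a longer list
try:
    select_strict(df, [7, 8, 9])
except KeyError as exc:
    print("KeyError:", exc)
```

The strict variant treats `[7]` and `[7, 8, 9]` the same way, while the lenient one never produces a NaN row for a missing id.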

@max-sixty
Contributor

xref: #10695

Agreed, it's confusing! Although there are multiple issues at play here.


4 participants