BUG: Bug in multi-index slicing with missing indexers (GH7866) #7867

Merged: 1 commit merged on Jul 30, 2014

Conversation

jreback
Contributor

jreback commented Jul 29, 2014

closes #7866

jreback added this to the 0.15.0 milestone Jul 29, 2014
@jorisvandenbossche
Member

With a single index loc, you get a NaN for the missing indexer:

In [1]: s = pd.Series(range(3), index=['A','B','C'])

In [3]: s.loc[['A', 'D']]
Out[3]:
A     0
D   NaN
dtype: float64

To be consistent, shouldn't this also be the case here?
But of course, what do you do with the second level ...

In [7]: s.loc[['A','D']]
Out[7]:
one  two
A    bar    1
     baz    2
     foo    0
D    ?      NaN
dtype: int32

@jreback
Contributor Author

jreback commented Jul 29, 2014

@jorisvandenbossche you raise a good point, but I don't think this is desirable. It IS possible, but I think you would then have to add the 2nd level with ALL of its possible values, and I am not sure that is desirable. (The alternative is explicit tuple reindexing, which DOES specify the labels.)
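
For illustration, explicit tuple reindexing could look like the sketch below. The Series and its labels are made up here to resemble the output quoted above; they are not taken from this PR's tests. Because each tuple names both levels, the missing entry gets a concrete second-level label and a NaN value.

```python
import pandas as pd

# Illustrative two-level Series (labels are assumptions, not from the PR).
index = pd.MultiIndex.from_tuples(
    [("A", "foo"), ("A", "bar"), ("A", "baz"), ("B", "foo")],
    names=["one", "two"],
)
s = pd.Series([0, 1, 2, 3], index=index)

# Each target entry is a full (one, two) tuple, so there is no question
# about which second-level label the missing "D" row should carry.
target = pd.MultiIndex.from_tuples(
    [("A", "bar"), ("D", "foo")],
    names=["one", "two"],
)
result = s.reindex(target)
print(result)
```

The row for ("D", "foo") does not exist in `s`, so reindexing fills it with NaN and the values become floats.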

jreback added a commit that referenced this pull request Jul 30, 2014
BUG: Bug in multi-index slicing with missing indexers (GH7866)
jreback merged commit 0bd7abc into pandas-dev:master Jul 30, 2014
@jorisvandenbossche
Member

late answer, you are right that the other possibility (include all labels of second level) is not really desirable, so no 'best' solution here. So OK for the merge.
Should this be documented somewhere?

@immerrr
Contributor

immerrr commented Jul 31, 2014

With a single index loc, you get a NaN for the missing indexer

Oh, I must have missed the moment when it got introduced, when was it? Is it too late to voice an objection? :)

@jreback
Contributor Author

jreback commented Jul 31, 2014

Since at least 0.12, IIRC (note that this applies only to a list-like indexer)

@immerrr
Contributor

immerrr commented Jul 31, 2014

Indeed:

In [1]: s = pd.Series([1,2,3])

In [2]: s
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: s.loc[[1,2,3]]
Out[3]: 
1     2
2     3
3   NaN
dtype: float64

In [4]: s.loc[[1,2,3]] = 4
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-6b0a59de774f> in <module>()
----> 1 s.loc[[1,2,3]] = 4

/home/immerrr/sources/pandas/pandas/core/indexing.pyc in __setitem__(self, key, value)
    116                 indexer = self._convert_tuple(key, is_setter=True)
    117             else:
--> 118                 indexer = self._convert_to_indexer(key, is_setter=True)
    119 
    120         self._setitem_with_indexer(indexer, value)

/home/immerrr/sources/pandas/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1085                     if isinstance(obj, tuple) and is_setter:
   1086                         return {'key': obj}
-> 1087                     raise KeyError('%s not in index' % objarr[mask])
   1088 
   1089                 return indexer

KeyError: '[3] not in index'

I'd expect it to behave the opposite: raise if trying to get a non-existing label (one has reindex for that), but allow adding it via setting a new value. That's a bummer.
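
A minimal sketch of the two operations mentioned here, using only calls that do not depend on how list-like `.loc` lookups treat missing labels. It assumes a pandas version that supports setting-with-enlargement for a single scalar label (0.13 or later, to my knowledge); the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Reading: reindex is the explicit way to ask for labels that may be
# missing; label 3 is absent, so it comes back as NaN.
looked_up = s.reindex([1, 2, 3])
print(looked_up)

# Writing: assigning to a single new scalar label enlarges the Series.
# A copy is used so the original `s` stays unchanged.
enlarged = s.copy()
enlarged.loc[3] = 4
print(enlarged)
```

This only covers a scalar label; the list-like setter shown in the traceback above is the case under discussion.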

@jreback
Contributor Author

jreback commented Jul 31, 2014

well it's effectively a reindex with a list-like so I find this kind of nice

the setting restriction prevents accidental enlargement (though prob no objection to changing for consistency)

@immerrr
Contributor

immerrr commented Jul 31, 2014

but reindex also supports list-likes, right? And it is also a tad faster, since it is not burdened with a variety of indexing options.

accidental enlargement is a valid concern, however, perhaps loc could have a parameter, e.g. loc(resize=True)[[2,3,4]] to allow both cases and still maintain consistency...

@jreback
Contributor Author

jreback commented Jul 31, 2014

I think the original motivation is that .loc/.ix by list is exactly reindex (I guess reindex is faster because .loc has to interpret more kinds of input, but the difference should not be dramatic).

The setting restriction exists for the same reason: a reindex call is not an lvalue, so you cannot assign to it:

s.reindex([1,2,3]) = 5

but I think allowing this would possibly be OK:

s.loc[[1,2,3]] = 5
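
As a rough sketch of what "reindex, then assign" could mean for the setter, assuming the intended semantics are to enlarge the index to cover the new labels first: `set_with_enlargement` below is a hypothetical helper written for this illustration, not a pandas function and not what this PR implements.

```python
import pandas as pd


def set_with_enlargement(series, labels, value):
    """Hypothetical helper: return a new Series whose index covers
    `labels`, with `value` assigned at those labels."""
    # Union keeps the existing labels and adds any that are missing.
    new_index = series.index.union(pd.Index(labels))
    # reindex returns a new object; missing labels start out as NaN.
    enlarged = series.reindex(new_index)
    # Every label now exists, so a list-like .loc assignment is safe.
    enlarged.loc[labels] = value
    return enlarged


s = pd.Series([1, 2, 3])
out = set_with_enlargement(s, [1, 2, 3], 5)
print(out)
```

Returning a new object instead of mutating `s` sidesteps the accidental-enlargement concern raised earlier, at the cost of an extra copy.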

Labels
Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
Development

Successfully merging this pull request may close these issues.

BUG/API: bug in multi-index slicing with missing indexers
3 participants