ERR: reindexed non-included labels on a multiindex are dropped #7886

jreback · 2014-07-31T11:32:38Z

related #4088
related #7867

I think this should raise, since it is not clear how this should work (e.g. should all the other levels be set to NaN for the new label?)

or see my comment below, maybe just document?

In [11]: s = pd.Series(np.arange(9),index=pd.MultiIndex.from_product([['A','B','C'],['foo','bar','baz']],names=['one','two'])).sortlevel()

In [12]: s
Out[12]: 
one  two
A    bar    1
     baz    2
     foo    0
B    bar    4
     baz    5
     foo    3
C    bar    7
     baz    8
     foo    6
dtype: int64

In [13]: s.reindex(['A','B','D'],level=0)
Out[13]: 
one  two
A    bar    1
     baz    2
     foo    0
B    bar    4
     baz    5
     foo    3
dtype: int64
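For comparison, here is a minimal sketch of the unambiguous case, where every label of the target index is given in full. It assumes a recent pandas, so it uses `sort_index()` in place of the `sortlevel()` call from the session above, and the `target` index is built purely for illustration. With complete keys there is nothing to guess: the rows for the new label `D` are present and filled with NaN.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["foo", "bar", "baz"]], names=["one", "two"]
)
# sort_index() is assumed here as the current spelling of sortlevel()
s = pd.Series(np.arange(9), index=index).sort_index()

# Spell out every (one, two) pair wanted in the result, including the "D" rows.
target = pd.MultiIndex.from_product(
    [["A", "B", "D"], ["bar", "baz", "foo"]], names=["one", "two"]
)

# Rows for "A" and "B" keep their values; rows for "D" have no data and become NaN.
full = s.reindex(target)
print(full)
```

The question raised in this issue is what should happen when only the first level (`D`) is supplied and the second-level labels are left unspecified.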


jreback · 2014-07-31T11:32:44Z

cc @immerrr

immerrr · 2014-07-31T13:06:55Z

FWIW, I agree that it's better to refuse the temptation to guess and raise in such cases.

cpcloud · 2014-07-31T15:43:31Z

me three, this is a weird operation

jreback · 2014-07-31T15:49:45Z

though by virtue of fixing #7866

s.loc[['A','B','D']] gives the same result

even though these are basically the same type of operation, conceptually

.loc is selection, while reindex is making an index the same

so maybe just doc this?

immerrr · 2014-07-31T16:31:16Z

I think incomplete missing keys must raise in any case: there's not enough information to insert new labels and there's no data to be retrieved with those (well, save for searchsorted-like lookups)

For loc there's a chance to provide missing keys in full: s.loc[['A', 'B', ('D', 0)]]. That also looks weird, but not unthinkable, IMO. For reindex there's just no such possibility.

jreback · 2014-07-31T16:37:26Z

hmm, so @immerrr do you disagree with #7866 then?

immerrr · 2014-07-31T16:38:27Z

As of now, yes, I don't see anything broken there to be fixed.

jreback · 2014-07-31T16:40:49Z

Here was the original motivation: http://stackoverflow.com/questions/25006197/multiple-key-cross-sections-in-pandas-dealing-with-misses-and-duplicate-indices/25014301#25014301

do you see any other way to do this?

immerrr · 2014-07-31T16:51:13Z

I'd go for boolean mask, like you proposed there. To me that sounds closer to the problem definition: find all rows where A is one of the following.

In fact, it would probably be nice to have a level= kwarg in Index.isin method in unified API for both saving some keystrokes and optimization potential (as in look up levels once and match labels afterwards).
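A minimal sketch of the boolean-mask approach, reusing the series from the top of this issue (built with `sort_index()`, assuming a recent pandas). The first form works through `get_level_values`; the second uses the `level=` keyword suggested above, which is assumed to be available in the pandas version being run.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["foo", "bar", "baz"]], names=["one", "two"]
)
s = pd.Series(np.arange(9), index=index).sort_index()

wanted = ["A", "B", "D"]

# "Find all rows where level 'one' is one of the following labels."
mask = s.index.get_level_values("one").isin(wanted)
selected = s[mask]

# Same mask via the level= keyword (assumes a pandas version that supports it).
mask_level = s.index.isin(wanted, level="one")

print(selected)
```

Because this is a membership test rather than a lookup, the absent label `D` simply matches nothing, so there is no question of what to insert for it.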

jreback · 2014-07-31T16:53:17Z

hmm, I like that idea for isin. want to create an issue? pull-request welcome for that as well.

immerrr · 2014-07-31T16:53:41Z

Or maybe even Index.isin({'A': set(['foo', 'bar']), 'B': set(['baz', 'qux'])})

jreback · 2014-07-31T16:55:14Z

Index.isin([set(['foo', 'bar']), set(['baz', 'qux'])], level=['A','B'])

is more consistent (with how we use set_name/levels/labels). I mean could accept the dict, but should do that separately
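Neither multi-level spelling exists as an `isin` signature in this discussion; both are proposals. As a rough sketch of the intended semantics, the hypothetical helper `isin_levels` below combines one membership test per level. Requiring a match on every listed level (logical AND) is an assumption, as are the helper name and the example data.

```python
import numpy as np
import pandas as pd


def isin_levels(index, values, level):
    """Hypothetical helper: True where every listed level matches its value set."""
    mask = np.ones(len(index), dtype=bool)
    for lev, vals in zip(level, values):
        # One membership test per level, combined with AND (an assumption).
        mask &= index.get_level_values(lev).isin(list(vals))
    return mask


index = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["foo", "bar", "baz"]], names=["one", "two"]
)
s = pd.Series(np.arange(9), index=index).sort_index()

# List-of-sets form with a parallel list of level names.
mask = isin_levels(s.index, [{"A", "B"}, {"foo", "bar"}], level=["one", "two"])
print(s[mask])

# The dict form would map onto the same helper.
spec = {"one": {"A", "B"}, "two": {"foo", "bar"}}
mask_from_dict = isin_levels(s.index, list(spec.values()), level=list(spec.keys()))
```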

immerrr · 2014-07-31T17:00:24Z

Yup, that last one is probably me getting too carried away with syntax sugar.

jreback · 2014-07-31T17:03:41Z

ok, if you'd open a new issue (for the isin enhancement), that would be great. I'll close this one.

have to think about reverting #7866 though (I agree it's a bit of a stretch, but it IS convenient)

immerrr · 2014-07-31T17:03:58Z

Ok, done

immerrr · 2014-07-31T17:30:47Z

Speaking of not enough information, I remembered that there's some kind of "variable length" multiindex emulation with empty string keys:

In [70]: df
Out[70]: 
    a   b
    1   1
0   0   3
1   6   9
2  12  15
3  18  21
4  24  27

In [71]: df.loc[:, ('c','')] = 100.

In [72]: df
Out[72]: 
    a   b    c
    1   1     
0   0   3  100
1   6   9  100
2  12  15  100
3  18  21  100
4  24  27  100

In [73]: df['c']
Out[73]: 
0    100
1    100
2    100
3    100
4    100
Name: c, dtype: float64

I'm not sure how it works across the library, though; it was so slow that we didn't even consider it. But I suppose it can be made to work nicely (e.g. change the empty string to NaN/NaT to include numeric and datetime indices, optimize here and there), and then incomplete missing keys would be ok.

jreback · 2014-07-31T17:36:59Z

I think this is actually a candidate for adding to a MultiIndex (maybe via an attribute or something).

Separate issue though.

jreback added this to the 0.15.0 milestone Jul 31, 2014

jreback added Indexing labels Jul 31, 2014

jreback added the Docs label Jul 31, 2014

jreback closed this as completed Jul 31, 2014

immerrr mentioned this issue Jul 31, 2014

API: add 'level' kwarg for Index.isin method #7890

Closed

jreback mentioned this issue Aug 1, 2014

MultiIndex reindex should behave like Index. #7895

Open
