Wildcard for indexing with MultiIndex #371

lodagro · 2011-11-16T10:54:19Z

based on mailing list discussion, adding here for better traceability.

Is there a way to use wild cards when using MultiIndex?

As an example, let`s work with the following DataFrame:

In [10]: df
Out[10]:
         0        1       2       3       4
x  y  z -0.02688  1.236  -1.174  -1.026   1.508
a  b  c  0.9156   1.742   0.4483  1.005   0.3641
p  q  r  0.2019  -0.3895 -0.8484  0.9992 -0.2274

Selecting all rows where level 0 of the MultiIndex == 'a' is easy:

In [11]: df.ix['a']
Out[11]:
      0       1      2       3      4
b  c  0.9156  1.742  0.4483  1.005  0.3641

But what if i want to select all rows where level 3 of the MultiIndex == 'c', and i don`t care what the other levels are?
This is what can be done:

In [12]: df.swaplevel(0, 2).swaplevel(1, 2).ix['c']
Out[12]:
      0       1      2       3      4
a  b  0.9156  1.742  0.4483  1.005  0.3641

Wes idea is that this merits a new API function or adding to an existing one.
e.g. to modify DataFrame.xs to work like:

df.xs('c', axis=0, level=2)

Other idea is to do something something similar as in numpy, and use : as wildcard.
For example in numpy:

In [7]: y = np.arange(27).reshape(3,3,3)

In [8]: y

Out[8]: array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [9]: y[:, :, 0]

Out[9]: array([[ 0,  3,  6],
       [ 9, 12, 15],
       [18, 21, 24]])

y[:, :, 0] indexes all elements with "level 2" == 0.

This would translate to df.ix[:, :, 'c'] on the first example.
Another example on a DataFrames with MultIndex with a lot of levels and selecting on more than one level: df.ix[:, :, 'c', :, 10]
This approach looks familiar, and no other api method is needed for indexing when using wildcard.

wesm · 2011-11-25T04:40:46Z

from @jseabold:

You could also do

df[df.index.get_level_values(2)=='c']

Though it's not exactly the same as yours. It feels more natural to
me. I keep wanting to just slice on a MultiIndex, but I get the
impression this would be non-trivial to implement ie.,

df[df.index[:,2]=='c']

>> Are there wildcards, something like df.ix[WILDCARD, WILDCARD, 'c'] maybe?
>>
>>
>>
>
> I think this merits a new API function or adding to an existing one.
> e.g. I could modify DataFrame.xs to work like:
>
> df.xs('c', axis=0, level=2)
>
> any thoughts?
>

xs is where I first looked for this functionality. I think this API is
good for my use case (MultiIndex panel data).

wesm · 2012-01-16T05:14:59Z

I added a level parameter to xs. I will think about what other syntactic conveniences might look like, but this should ease the pain for now

wesm · 2012-01-16T05:15:32Z

So this looks like


In [2]: df
Out[2]: 
       0        1       2       3       4     
x y z -0.02688  1.2360 -1.1740 -1.0260  1.5080
a b c  0.91560  1.7420  0.4483  1.0050  0.3641
p q r  0.20190 -0.3895 -0.8484  0.9992 -0.2274

In [3]: df.xs('c', level=2)
Out[3]: 
     0       1      2       3      4     
a b  0.9156  1.742  0.4483  1.005  0.3641

lodagro · 2012-01-16T07:57:15Z

Thanks.

level=0 has an issue in the example below

Selecting on multiple levels can be done with chaining xs (but tricky on the level number of course, key/label referencing could help here).

In [7]: df
Out[7]:
                      A       B       C       D        E
one two three   four
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20      4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30      3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

In [8]: df.xs('a', level=0)
Out[8]:
Empty DataFrame
Columns: Index([A, B, C, D, E], dtype=object)
Index: MultiIndex([], dtype=object)

In [9]: df.xs('a')
Out[9]:
                  A       B      C       D        E
two three   four
b   10.0032 5    -0.5109 -2.336 -0.4645  0.05076  0.3640
q   20      4     0.4473  1.415  0.2834  1.00661  0.1744

In [10]: df.xs('q', level=1)
Out[10]:
                A       B       C       D       E
one three four
a   20    4     0.4473  1.4152  0.2834  1.0066  0.1744
x   30    3    -0.6662 -0.5243 -0.3580  0.8915  2.5838

In [11]: df.xs('q', level=1).xs(4, level=2)
Out[11]:
           A       B      C       D      E
one three
a   20     0.4473  1.415  0.2834  1.007  0.1744

In [12]:

wesm · 2012-01-16T21:39:25Z

OK I fixed the bug you described. I also modified the parsers so that this works:

In [1]: df = read_clipboard(sep='\s+')

In [2]: df
Out[2]: 
                      A       B       C       D        E     
one two three   four                                         
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20.0    4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30.0    3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

I'm gonna have a quick look to see if multi-level xs is low-hanging fruit, otherwise I'm going to put this one to bed for a while

…Index.droplevel, GH #371

wesm · 2012-01-16T22:00:56Z

Look at me go:

In [1]: df = read_clipboard(sep='\s+')

In [2]: df
Out[2]: 
                      A       B       C       D        E     
one two three   four                                         
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20.0    4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30.0    3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

In [3]: df.xs(('a', 4), level=['one', 'four'])
Out[3]: 
           A       B      C       D      E     
two three                                      
q   20.0   0.4473  1.415  0.2834  1.007  0.1744

lodagro · 2012-01-17T07:49:03Z

You do really go :-)

Excellent!

* master: (313 commits) TST: more Python 2.5 sadness TST: Python 2.5 float formatting changed TST: cast to i8 when checking margins BUG: DataFrame.join on keys produce wrong result, does not preserve order DOC: release notes ENH: xs level can take multiple levels, pass multiple levels to MultiIndex.droplevel, GH pandas-dev#371 BUG: fix bugs related to comments in pandas-dev#371 BUG: fix TextParser with list buglet, enable parsing of DataFrame output with index names BUG: convert tuples in concat to MultiIndex BUG: don't lose index names when adding row margin ENH: add margins to crosstab ENH: add crosstab function and test ENH: crosstab prototype function, API needs fleshing out, GH pandas-dev#170 BUG: fix buglet with xs with level, GH pandas-dev#371 TST: add test_sql.py module TST: testing, cleanup of io.sql module TST: indexing testing with minor Series.__getitem__ refactoring ENH: hack toward pandas-dev#629 BUG: check for non-contiguous memory in SeriesGrouper, causing segfault ENH: add ability to pass list of dicts to DataFrame.append (GH pandas-dev#464) ...

Expose more calls in the BSONStore

wesm added a commit that referenced this issue Jan 16, 2012

BUG: fix buglet with xs with level, GH #371

9bc20d6

wesm closed this as completed Jan 16, 2012

wesm reopened this Jan 16, 2012

wesm added a commit that referenced this issue Jan 16, 2012

BUG: fix bugs related to comments in #371

a4b0a2e

wesm added a commit that referenced this issue Jan 16, 2012

ENH: xs level can take multiple levels, pass multiple levels to Multi…

304433a

…Index.droplevel, GH #371

wesm closed this as completed Jan 16, 2012

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Merge pull request pandas-dev#371 from manahl/feature/extend_bsonstore

96e574c

Expose more calls in the BSONStore

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wildcard for indexing with MultiIndex #371

Wildcard for indexing with MultiIndex #371

lodagro commented Nov 16, 2011

wesm commented Nov 25, 2011

wesm commented Jan 16, 2012

wesm commented Jan 16, 2012

lodagro commented Jan 16, 2012

wesm commented Jan 16, 2012

wesm commented Jan 16, 2012

lodagro commented Jan 17, 2012

Wildcard for indexing with MultiIndex #371

Wildcard for indexing with MultiIndex #371

Comments

lodagro commented Nov 16, 2011

wesm commented Nov 25, 2011

wesm commented Jan 16, 2012

wesm commented Jan 16, 2012

lodagro commented Jan 16, 2012

wesm commented Jan 16, 2012

wesm commented Jan 16, 2012

lodagro commented Jan 17, 2012