Skip to content

Wildcard for indexing with MultiIndex #371

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lodagro opened this issue Nov 16, 2011 · 7 comments
Closed

Wildcard for indexing with MultiIndex #371

lodagro opened this issue Nov 16, 2011 · 7 comments
Milestone

Comments

@lodagro
Copy link
Contributor

lodagro commented Nov 16, 2011

based on mailing list discussion, adding here for better traceability.

Is there a way to use wild cards when using MultiIndex?

As an example, let`s work with the following DataFrame:

In [10]: df
Out[10]:
         0        1       2       3       4
x  y  z -0.02688  1.236  -1.174  -1.026   1.508
a  b  c  0.9156   1.742   0.4483  1.005   0.3641
p  q  r  0.2019  -0.3895 -0.8484  0.9992 -0.2274

Selecting all rows where level 0 of the MultiIndex == 'a' is easy:

In [11]: df.ix['a']
Out[11]:
      0       1      2       3      4
b  c  0.9156  1.742  0.4483  1.005  0.3641

But what if i want to select all rows where level 3 of the MultiIndex == 'c', and i don`t care what the other levels are?
This is what can be done:

In [12]: df.swaplevel(0, 2).swaplevel(1, 2).ix['c']
Out[12]:
      0       1      2       3      4
a  b  0.9156  1.742  0.4483  1.005  0.3641

Wes idea is that this merits a new API function or adding to an existing one.
e.g. to modify DataFrame.xs to work like:

df.xs('c', axis=0, level=2)

Other idea is to do something something similar as in numpy, and use : as wildcard.
For example in numpy:

In [7]: y = np.arange(27).reshape(3,3,3)

In [8]: y

Out[8]: array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [9]: y[:, :, 0]

Out[9]: array([[ 0,  3,  6],
       [ 9, 12, 15],
       [18, 21, 24]])

y[:, :, 0] indexes all elements with "level 2" == 0.

This would translate to df.ix[:, :, 'c'] on the first example.
Another example on a DataFrames with MultIndex with a lot of levels and selecting on more than one level: df.ix[:, :, 'c', :, 10]
This approach looks familiar, and no other api method is needed for indexing when using wildcard.

@wesm
Copy link
Member

wesm commented Nov 25, 2011

from @jseabold:

You could also do

df[df.index.get_level_values(2)=='c']

Though it's not exactly the same as yours. It feels more natural to
me. I keep wanting to just slice on a MultiIndex, but I get the
impression this would be non-trivial to implement ie.,

df[df.index[:,2]=='c']

>> Are there wildcards, something like df.ix[WILDCARD, WILDCARD, 'c'] maybe?
>>
>>
>>
>
> I think this merits a new API function or adding to an existing one.
> e.g. I could modify DataFrame.xs to work like:
>
> df.xs('c', axis=0, level=2)
>
> any thoughts?
>

xs is where I first looked for this functionality. I think this API is
good for my use case (MultiIndex panel data).

@wesm
Copy link
Member

wesm commented Jan 16, 2012

I added a level parameter to xs. I will think about what other syntactic conveniences might look like, but this should ease the pain for now

@wesm wesm closed this as completed Jan 16, 2012
@wesm
Copy link
Member

wesm commented Jan 16, 2012

So this looks like


In [2]: df
Out[2]: 
       0        1       2       3       4     
x y z -0.02688  1.2360 -1.1740 -1.0260  1.5080
a b c  0.91560  1.7420  0.4483  1.0050  0.3641
p q r  0.20190 -0.3895 -0.8484  0.9992 -0.2274

In [3]: df.xs('c', level=2)
Out[3]: 
     0       1      2       3      4     
a b  0.9156  1.742  0.4483  1.005  0.3641

@lodagro
Copy link
Contributor Author

lodagro commented Jan 16, 2012

Thanks.

level=0 has an issue in the example below

Selecting on multiple levels can be done with chaining xs (but tricky on the level number of course, key/label referencing could help here).

In [7]: df
Out[7]:
                      A       B       C       D        E
one two three   four
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20      4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30      3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

In [8]: df.xs('a', level=0)
Out[8]:
Empty DataFrame
Columns: Index([A, B, C, D, E], dtype=object)
Index: MultiIndex([], dtype=object)

In [9]: df.xs('a')
Out[9]:
                  A       B      C       D        E
two three   four
b   10.0032 5    -0.5109 -2.336 -0.4645  0.05076  0.3640
q   20      4     0.4473  1.415  0.2834  1.00661  0.1744

In [10]: df.xs('q', level=1)
Out[10]:
                A       B       C       D       E
one three four
a   20    4     0.4473  1.4152  0.2834  1.0066  0.1744
x   30    3    -0.6662 -0.5243 -0.3580  0.8915  2.5838

In [11]: df.xs('q', level=1).xs(4, level=2)
Out[11]:
           A       B      C       D      E
one three
a   20     0.4473  1.415  0.2834  1.007  0.1744

In [12]:                                                             

@wesm wesm reopened this Jan 16, 2012
wesm added a commit that referenced this issue Jan 16, 2012
@wesm
Copy link
Member

wesm commented Jan 16, 2012

OK I fixed the bug you described. I also modified the parsers so that this works:

In [1]: df = read_clipboard(sep='\s+')

In [2]: df
Out[2]: 
                      A       B       C       D        E     
one two three   four                                         
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20.0    4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30.0    3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

I'm gonna have a quick look to see if multi-level xs is low-hanging fruit, otherwise I'm going to put this one to bed for a while

wesm added a commit that referenced this issue Jan 16, 2012
@wesm
Copy link
Member

wesm commented Jan 16, 2012

Look at me go:

In [1]: df = read_clipboard(sep='\s+')

In [2]: df
Out[2]: 
                      A       B       C       D        E     
one two three   four                                         
a   b   10.0032 5    -0.5109 -2.3358 -0.4645  0.05076  0.3640
a   q   20.0    4     0.4473  1.4152  0.2834  1.00661  0.1744
x   q   30.0    3    -0.6662 -0.5243 -0.3580  0.89145  2.5838

In [3]: df.xs(('a', 4), level=['one', 'four'])
Out[3]: 
           A       B      C       D      E     
two three                                      
q   20.0   0.4473  1.415  0.2834  1.007  0.1744

@wesm wesm closed this as completed Jan 16, 2012
@lodagro
Copy link
Contributor Author

lodagro commented Jan 17, 2012

You do really go :-)

Excellent!

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jan 19, 2012
* master: (313 commits)
  TST: more Python 2.5 sadness
  TST: Python 2.5 float formatting changed
  TST: cast to i8 when checking margins
  BUG: DataFrame.join on keys produce wrong result, does not preserve order
  DOC: release notes
  ENH: xs level can take multiple levels, pass multiple levels to MultiIndex.droplevel, GH pandas-dev#371
  BUG: fix bugs related to comments in pandas-dev#371
  BUG: fix TextParser with list buglet, enable parsing of DataFrame output with index names
  BUG: convert tuples in concat to MultiIndex
  BUG: don't lose index names when adding row margin
  ENH: add margins to crosstab
  ENH: add crosstab function and test
  ENH: crosstab prototype function, API needs fleshing out, GH pandas-dev#170
  BUG: fix buglet with xs with level, GH pandas-dev#371
  TST: add test_sql.py module
  TST: testing, cleanup of io.sql module
  TST: indexing testing with minor Series.__getitem__ refactoring
  ENH: hack toward pandas-dev#629
  BUG: check for non-contiguous memory in SeriesGrouper, causing segfault
  ENH: add ability to pass list of dicts to DataFrame.append (GH pandas-dev#464)
  ...
dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants