BUG: Series.xs() inconsistent with DataFrame.xs() with MultiIndex #5684

Closed
TomAugspurger opened this issue Dec 12, 2013 · 4 comments · Fixed by #6259
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

@TomAugspurger
Contributor

Edited to clarify the bug

Series.xs slice fails with string index labels and MultiIndex:

In [1]: idx = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'one'), ('b', 'two')])

In [2]: df = pd.Series(np.random.randn(4), index=idx)

In [4]: df.index.set_names(['L1', 'L2'], inplace=True)

In [5]: df
Out[5]: 
L1  L2 
a   one   -0.136418
    two   -0.346941
b   one   -1.468534
    two    1.217693
dtype: float64

In [6]: df.xs('one', level='L2')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-52601adf5184> in <module>()
----> 1 df.xs('one', level='L2')

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
    437 
    438     def _xs(self, key, axis=0, level=None, copy=True):
--> 439         return self.__getitem__(key)
    440 
    441     xs = _xs

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             return self.index.get_value(self, key)
    485         except InvalidIndexError:
    486             pass

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in get_value(self, series, key)
   2294                     raise InvalidIndexError(key)
   2295                 else:
-> 2296                     raise e1
   2297             except Exception:  # pragma: no cover
   2298                 raise e1

KeyError: 'one'

The same slice works on a DataFrame.
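For reference, a minimal sketch of the behavior one would expect once `Series.xs` honors `level` the way `DataFrame.xs` does. It assumes a pandas release that includes the fix referenced above (#6259); the variable names and the `"value"` column name are illustrative, and fixed values replace the random data so the two results can be compared.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "one"), ("a", "two"), ("b", "one"), ("b", "two")],
    names=["L1", "L2"],
)
# Deterministic values instead of np.random.randn, so results are comparable.
s = pd.Series(np.arange(4.0), index=idx)

# Cross-section on the second level; L1 is expected to remain as the index.
from_series = s.xs("one", level="L2")

# The same selection routed through a one-column DataFrame, for comparison.
from_frame = s.to_frame("value").xs("one", level="L2")["value"]

print(from_series)
print(from_frame)
```

If the two printed results differ in index or values, the Series and DataFrame code paths are still inconsistent on that version.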

Previous post below:

In [12]: idx = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])

In [13]: df = pd.Series(np.random.randn(4), index=idx)

In [14]: df
Out[14]: 
a  0    0.876121
   1    0.638050
b  0    0.965934
   1    1.061716
dtype: float64

In [15]: df.xs(0, level=1)   # returns scalar
Out[15]: 0.87612104445620753

In [16]: df.index.names = ['L1', 'L2']

In [27]: df.xs(0, level='L2')   # returns scalar
Out[27]: -0.98585685847339011

In [28]: df.xs(0, level='L1')   # No key error
Out[28]: -0.98585685847339011

Works for DataFrames:

In [30]: df.xs(0, level='L2')
Out[30]: 
           0
L1          
a  -0.985857
b   0.648114

[2 rows x 1 columns]

Series.xs also seems to fail on string index labels?

In [50]: idx = pd.MultiIndex.from_tuples([('a', 'one'), ('a', 'two'), ('b', 'one'), ('b', 'two')])
In [53]: df = pd.Series(np.random.randn(4), index=idx)
In [56]: df.xs('one', level='L2')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-56-52601adf5184> in <module>()
----> 1 df.xs('one', level='L2')

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
    437 
    438     def _xs(self, key, axis=0, level=None, copy=True):
--> 439         return self.__getitem__(key)
    440 
    441     xs = _xs

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             return self.index.get_value(self, key)
    485         except InvalidIndexError:
    486             pass

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in get_value(self, series, key)
   2294                     raise InvalidIndexError(key)
   2295                 else:
-> 2296                     raise e1
   2297             except Exception:  # pragma: no cover
   2298                 raise e1

KeyError: 'one'

So I guess this is about 3 errors on Series.xs (possibly related?):

  1. Returning scalars when it should return a Series when the label is an integer
  2. Not raising a KeyError when the integer label does not exist in the given level
  3. Failing on slices for levels beyond the first when the label is a string.

EDIT: Oh, and I know that .loc / .ix will work for these. I was just surprised by the results.
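As a sketch of one workaround that avoids `xs` entirely: this uses a boolean mask built from `get_level_values` rather than the `.loc` / `.ix` route mentioned above, and it is an assumption that this is acceptable for the use case. Variable names are illustrative and the values are fixed instead of random.

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "one"), ("a", "two"), ("b", "one"), ("b", "two")],
    names=["L1", "L2"],
)
s = pd.Series(np.arange(4.0), index=idx)

# Boolean mask: True where the L2 label equals 'one'.
mask = s.index.get_level_values("L2") == "one"

# Select the matching rows, then drop the now-constant L2 level
# so the result is indexed by L1 only.
selected = s[mask]
selected.index = selected.index.droplevel("L2")

print(selected)
```

The mask compares labels only, so an integer key is never reinterpreted as a position.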

@jreback
Contributor

jreback commented Dec 12, 2013

xs has some of the ix-like behavior: it will interpret an integer as a positional argument if it doesn't find the label. So, walking through your examples:

In [1]: idx = pd.MultiIndex.from_tuples([('a', 0), ('a', 1), ('b', 0), ('b', 1)])

In [2]: s = pd.Series(np.random.randn(4), index=idx)

In [3]: s
Out[3]: 
a  0    0.789393
   1   -0.554426
b  0    1.986913
   1    0.501087
dtype: float64

This is correct (though probably not what you want)

In [4]: s.xs(0,level=1)
Out[4]: 0.78939286728150548

You should set names like this, FYI

In [7]: s.index.set_names(['L1','L2'],inplace=True)

In [8]: s
Out[8]: 
L1  L2
a   0     0.789393
    1    -0.554426
b   0     1.986913
    1     0.501087
dtype: float64

(9) is correct

In [9]: s.xs(0,level='L2')
Out[9]: 0.78939286728150548

ix-like behavior

In [10]: s.xs(0,level='L1')
Out[10]: 0.78939286728150548

correct

In [11]: s.xs('a',level='L1')
Out[11]: 
L2
0     0.789393
1    -0.554426
dtype: float64

Your last example, `df.xs('one',level='L2')`, doesn't have the level L2 defined?
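To make the ix-like fallback concrete, here is a simplified pure-Python model. It is not pandas' actual code: `ix_like_lookup` is a hypothetical helper, and it assumes (as the traceback's `_xs` suggests, since it just calls `__getitem__(key)`) that the `level` argument is ignored and only first-level labels are searched.

```python
def ix_like_lookup(labels, values, key):
    """Hypothetical model: try `key` as a label, then as a position if it is an int."""
    matches = [v for lab, v in zip(labels, values) if lab == key]
    if matches:
        # Label found: one match gives a scalar, several give a sub-selection.
        return matches[0] if len(matches) == 1 else matches
    if isinstance(key, int):
        # Label not found, but the key is an integer: positional fallback.
        return values[key]
    # Label not found and no fallback applies.
    raise KeyError(key)


# First-level labels and values mirroring the Series in this comment.
labels = ["a", "a", "b", "b"]
values = [0.789393, -0.554426, 1.986913, 0.501087]

print(ix_like_lookup(labels, values, "a"))  # label match on the first level
print(ix_like_lookup(labels, values, 0))    # no label 0, so position 0 is used
```

Under this model a string such as 'one' that only appears in the second level raises `KeyError`, while an integer silently returns the first element regardless of the requested level.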

@TomAugspurger
Contributor Author

For the last one, I missed a copy paste:

In [65]: df
Out[65]: 
L1  L2 
a   one    1.041627
    two   -0.553980
b   one   -1.491640
    two   -0.026511
dtype: float64

In [66]: df.xs('one', level='L2')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-66-52601adf5184> in <module>()
----> 1 df.xs('one', level='L2')

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in _xs(self, key, axis, level, copy)
    437 
    438     def _xs(self, key, axis=0, level=None, copy=True):
--> 439         return self.__getitem__(key)
    440 
    441     xs = _xs

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/series.pyc in __getitem__(self, key)
    482     def __getitem__(self, key):
    483         try:
--> 484             return self.index.get_value(self, key)
    485         except InvalidIndexError:
    486             pass

/Users/tom/Envs/pandas-dev/lib/python2.7/site-packages/pandas-0.13.0rc1_27_g4d5ca5c-py2.7-macosx-10.8-x86_64.egg/pandas/core/index.pyc in get_value(self, series, key)
   2294                     raise InvalidIndexError(key)
   2295                 else:
-> 2296                     raise e1
   2297             except Exception:  # pragma: no cover
   2298                 raise e1

KeyError: 'one'

And I guess the most surprising thing here is the inconsistency across DataFrames and Series:

In [69]: df = pd.DataFrame(df)

In [70]: df.xs('one', level='L2')
Out[70]: 
           0
L1          
a   1.041627
b  -1.491640

[2 rows x 1 columns]

Sorry this is such a mess of an issue. I think the last one is the only real bug then? I'll edit my post/title to reflect that, if it's actually a bug.

@jreback
Contributor

jreback commented Dec 12, 2013

Yep, the last one is a bug then. xs is implemented separately for Series and DataFrame, so I am not surprised (they really should be integrated).

@jreback
Contributor

jreback commented Dec 12, 2013

@TomAugspurger maybe just edit the top section to reflect the bug
