BUG: MultiIndex Slice returns incorrect slice (compared to .xs()) #27591

Closed
groutr opened this issue Jul 25, 2019 · 1 comment · Fixed by #42329
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex
groutr commented Jul 25, 2019

Code Sample, a copy-pastable example if possible

import pandas as pd
l1 = list('abc')
l2 = [(0, 1), (1, 0)]
l3 = [0, 1]
cols = pd.MultiIndex.from_product([l1, l2, l3], names=['x', 'y', 'z'])
df = pd.DataFrame(index=range(5), columns=cols)

# This returns the expected slice
expected = df.xs((l1[0], l2[0], l3[0]), level=[0, 1, 2], axis=1)
print(expected)

# This returns an empty Series (?!)
empty1 = df.loc[:, (l1[0], l2[0], l3[0])]
# This returns an empty Series also
empty2 = df.loc[:, pd.IndexSlice[l1[0], l2[0], l3[0]]]
print(empty1)
print(empty2)

# if we omit the l2 dim on slice, we get something back
try3 = df.loc[:, pd.IndexSlice[l1[0], :, l3[0]]]
print(try3)

# Oddly enough, we can still set values on the dataframe and it works as expected.
df.loc[:, (l1[0], l2[0], l3[0])] = list(range(10, 15))
print(df)

Problem description

As far as I can tell, all three indexing operations (.xs(), .loc with a tuple, and .loc with pd.IndexSlice) should return the same result. Slicing seems to be the approach favored in the documentation, yet it doesn't return the expected result in this case.

Wrapping the tuple in a Python object doesn't work either; it raises a KeyError.

class MyTuple(object):
    def __init__(self, data):
        self.data = data
    def __eq__(self, other):
        return self.data == other.data
    def __hash__(self):
        return hash(self.data)
    def __repr__(self):
        return repr(self.data)
    def __str__(self):
        return repr(self)

l22 = [MyTuple((0,1)), MyTuple((1, 0))]
df2 = pd.DataFrame(index=range(5), columns=pd.MultiIndex.from_product([l1, l22, l3]))
df2.loc[:, pd.IndexSlice[l1[0], l22[0], l3[0]]]

Expected Output

The expected result is a single series with 5 NaN values:

       a
  (0, 1)
       0
0    NaN
1    NaN
2    NaN
3    NaN
4    NaN

This is the value of expected in the example above.
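
For readers hitting the same problem, here is a minimal sketch of a label-free way to pick out that column. It is a hypothetical workaround, not something proposed in this thread: it compares each column tuple against the target in plain Python and then selects by position, so it never passes the tuple-valued label to .loc. The names `target`, `positions`, and `selected` are made up for illustration.

```python
import pandas as pd

# Same frame as in the code sample above.
l1 = list("abc")
l2 = [(0, 1), (1, 0)]
l3 = [0, 1]
cols = pd.MultiIndex.from_product([l1, l2, l3], names=["x", "y", "z"])
df = pd.DataFrame(index=range(5), columns=cols)

# Hypothetical workaround: iterating a MultiIndex yields one plain tuple
# per column, so ordinary tuple equality finds the matching position.
target = (l1[0], l2[0], l3[0])
positions = [i for i, col in enumerate(df.columns) if col == target]

# Positional selection avoids interpreting (0, 1) as a list of labels.
selected = df.iloc[:, positions]
print(selected)
```

This scans every column, so it is only meant for small frames or for checking what .loc ought to return.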

Output of pd.show_versions()

Note that pandas 0.24.2 is being used because compatibility with Python 2 is needed.

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.10.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.1
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: 1.7.0
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.1
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

jbrockmendel added the Indexing and MultiIndex labels on Jul 26, 2019

jbrockmendel (Member) commented

Best guess ATM is that the tuple in the second level is being treated as a sequence of labels, so the lookup goes through the MultiIndex.get_locs path.
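
To illustrate the ambiguity this guess points at, the sketch below uses a MultiIndex whose levels hold only scalar labels. There, a list inside the key is meant to select several labels from that level at once. If a tuple-valued label such as (0, 1) were handled the same way, it would be read as "labels 0 and 1" rather than as one label. The frame `scalar_df` is hypothetical and is not part of the original report; the sketch shows the documented list behavior only and does not confirm the root cause.

```python
import pandas as pd

# Hypothetical frame: same shape as the report, but the middle level
# holds the scalars 0 and 1 instead of the tuples (0, 1) and (1, 0).
scalar_cols = pd.MultiIndex.from_product(
    [list("abc"), [0, 1], [0, 1]], names=["x", "y", "z"]
)
scalar_df = pd.DataFrame(index=range(5), columns=scalar_cols)

# A list in the middle position means "any of these labels" for level y,
# so two columns come back: ('a', 0, 0) and ('a', 1, 0).
multi = scalar_df.loc[:, ("a", [0, 1], 0)]
print(list(multi.columns))
```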
