BUG: inconsistent multi-level indexing when levels are dropped #10521

Closed
feldman4 opened this issue Jul 6, 2015 · 5 comments · Fixed by #38150
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex


feldman4 commented Jul 6, 2015

DF.loc['A', :, 1] returns a DataFrame with the full MultiIndex if the second level has more than one entry, and a truncated index if the second level has only one entry (same as, e.g., DF.loc['A',0,1]). Is this the intended behavior?

Contributor

jreback commented Jul 6, 2015

Please show the frame you are indexing, as well as the output of pd.show_versions().

Author

feldman4 commented Jul 6, 2015

Here is a simplified example. In the real scenario I am using a selector of the form [idx1, :, idx2] where the resulting DataFrame may have one or several values for the second level.

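import pandas as pd

# Three-level index; every row shares the same value on level 'c'.
df = pd.DataFrame({'a': list('abcd'), 'b': ['a', 'b'] * 2, 'c': ['a'] * 4, 'd': 0})
# The original report used .sortlevel(), which later pandas versions removed.
df = df.set_index(['a', 'b', 'c']).sort_index()
print(df)
print(df.loc['a', :, 'a'])

# Same selection on a one-row slice of the frame.
df_ = df[:1]
print(df_)
print(df_.loc['a', :, 'a'])

INSTALLED VERSIONS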
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-504.23.4.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C

pandas: 0.16.2
nose: 1.3.7
Cython: 0.21
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.5.0
IPython: 3.2.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: 0.5.7
lxml: 3.4.0
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.7
pymysql: 0.6.6.None
psycopg2: None

Contributor

jreback commented Jul 7, 2015

This is hitting an 'older' piece of code that I haven't really touched (well, 'afraid to touch' is more like it :). That code should actually be eliminated, because it produces results that might drop levels, when in fact that decision should be taken at a higher level.

The solution is to basically use slicers, which always give you back the original dimension (the full dims). These are like .xs but with consistent semantics.

Note that you MUST fully specify all dimensions (e.g. both index and columns).
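For comparison with the .xs behaviour mentioned above, here is a minimal sketch (not part of the original report) that uses DataFrame.xs with drop_level=False to keep every index level. The frame, the level names 'first', 'second', 'third', and the variable names are assumptions made up for illustration; only xs, its level and drop_level parameters, and MultiIndex.from_tuples come from pandas itself.

```python
import pandas as pd

# Hypothetical frame: key 'a' on the first level has two entries on the second level.
index = pd.MultiIndex.from_tuples(
    [('a', 'x', 'a'), ('a', 'y', 'a'), ('b', 'x', 'a')],
    names=['first', 'second', 'third'],
)
wide = pd.DataFrame({'d': [0, 1, 2]}, index=index).sort_index()
narrow = wide[:1]  # only one entry left on the second level

# Select on the first and third levels; drop_level=False keeps all three levels.
kept_wide = wide.xs(('a', 'a'), level=['first', 'third'], drop_level=False)
kept_narrow = narrow.xs(('a', 'a'), level=['first', 'third'], drop_level=False)

print(kept_wide.index.nlevels, kept_narrow.index.nlevels)
```

With drop_level left at its default, xs removes the levels that were selected on, so the explicit flag is what makes this comparable to the slicer form.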

In [5]: df.loc[('a',slice(None),'a'),:]
Out[5]: 
       d
a b c   
a a a  0

In [4]: df_.loc[('a',slice(None),'a'),:]
Out[4]: 
       d
a b c   
a a a  0

I would actually write it with the IndexSlice helper, which gives a nicer look-and-feel.

In [8]: idx = pd.IndexSlice

In [9]: df_.loc[idx['a',:,'a'],:]
Out[9]: 
       d
a b c   
a a a  0

So I guess this is a bug. Want to dive in?
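The slicer examples above each return a single row. As a rough sketch of the scenario from the report, where the second level may hold one or several values, the snippet below wraps the IndexSlice form in a small function. The helper name select_first_third and the modified frame (two rows with 'a' on the first level) are assumptions for illustration, not anything taken from pandas or from the original comments.

```python
import pandas as pd

idx = pd.IndexSlice


def select_first_third(frame, first, third):
    """Hypothetical helper: select on levels 0 and 2, keeping every index level."""
    # The trailing ':' selects all columns, so both dimensions are fully specified.
    return frame.loc[idx[first, :, third], :]


# Variant of the frame from the report in which 'a' appears twice on the first level.
df = pd.DataFrame({'a': list('aacd'), 'b': ['a', 'b'] * 2, 'c': ['a'] * 4, 'd': 0})
df = df.set_index(['a', 'b', 'c']).sort_index()

several = select_first_third(df, 'a', 'a')     # two entries on level 'b'
single = select_first_third(df[:1], 'a', 'a')  # one entry on level 'b'

print(several.index.nlevels, single.index.nlevels)
```

Both results keep the three index levels, so downstream code does not need to branch on how many second-level values matched. The index has to be sorted (here via sort_index) for slicers to work.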

jreback changed the title from "multiindex : indexing" to "BUG: inconsistent multi-level indexing when levels are dropped" Jul 7, 2015
jreback added the Bug, Indexing, and MultiIndex labels Jul 7, 2015
jreback added this to the Next Major Release milestone Jul 7, 2015

Author

feldman4 commented Jul 9, 2015

I'm happy with the slicer notation, though I hope others don't have to figure this out by backtracking mysterious behavior in their code.

Contributor

jreback commented Jul 9, 2015

Yep, this is an older wart. Fixing it requires removing quite a bit of older code (and then fixing anything that the 'newer' way doesn't support). Not that hard, but it requires some effort.
