Indexing MultiIndex level depends on chosen level #12827

Closed
toobaz opened this issue Apr 8, 2016 · 18 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@toobaz
Member

toobaz commented Apr 8, 2016

Code Sample, a copy-pastable example if possible

In [2]: ss = pd.Series(range(4),
   ...:                index=pd.MultiIndex.from_product([[1,2], [1,2]]))

In [3]: ss.loc[:, 1].index.__class__
Out[3]: pandas.indexes.numeric.Int64Index

In [4]: ss.loc[1,:].index.__class__
Out[4]: pandas.indexes.multi.MultiIndex

Expected Output

The first behaviour is the expected one: the level I am indexing on should disappear from the index.

This is related to http://stackoverflow.com/q/36459311/2858145 (I think)

output of pd.show_versions()

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8

pandas: 0.18.0+88.g5e11243.dirty
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.2
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+755fa81
xarray: None
IPython: 2.4.1
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.0rc2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None
@toobaz
Member Author

toobaz commented Apr 8, 2016

This might actually not be related to the SO question: I tested on 0.14.1, and it behaves the same (while the SO question refers to a change from 0.15.2).

@jreback
Contributor

jreback commented Apr 8, 2016

By ONLY passing a selector on the first level you are getting a de-leveling.

In [7]: ss.loc[1]
Out[7]: 
1    0
2    1
dtype: int64

In [8]: ss.loc[1, :]
Out[8]: 
1  1    0
   2    1
dtype: int64

@toobaz
Member Author

toobaz commented Apr 8, 2016

Sure, but is there a justification for the incoherent behaviour?
It gets worse when you have more than two levels. Think of a MultiIndex with three levels rather than two, and ss.loc[1,:,1] where you want a single level and you get three instead (two of which are useless).

@jreback
Contributor

jreback commented Apr 8, 2016

How is this 'incoherent'? It's as expected: you are getting exactly what you asked for.

@toobaz
Member Author

toobaz commented Apr 8, 2016

If ss.index has three levels,

ss.loc[1,1,:] has three levels.
ss.loc[1,:,1] has three levels.
ss.loc[:,1,1] has one level.

I find this quite unexpected. Compare with a three-dimensional numpy array: c[1,1,:] has one dimension, precisely as c[:,1,1].
(or compare with Panels, or even just DataFrames...)
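
The NumPy comparison above can be checked directly. This is a minimal sketch, assuming a 2x2x2 array named `c` as in the comment; the concrete values are illustrative, and NumPy indexes by position rather than by label:

```python
import numpy as np

# Illustrative 2x2x2 array; the name `c` follows the comment above
c = np.arange(8).reshape(2, 2, 2)

# A scalar on an axis removes that axis, wherever the slice sits
trailing_slice = c[1, 1, :]
leading_slice = c[:, 1, 1]

print(trailing_slice.shape, leading_slice.shape)
```

Both results are one-dimensional, which is the symmetry being asked of `.loc` on a MultiIndex.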

As far as I can tell, the behaviour is also useless and (certainly to me) annoying: keeping two levels that I know each hold a single value (because I explicitly indexed on them) cannot be useful.

Oh, and by the way (but I am not sure it is the same problem).

ss.ix[:,1,1] has one level
ss.ix[1,1] has one level
ss.ix[1,1,:] raises TypeError

In general, in Python, wouldn't you expect indexing with : to coincide with not indexing on that dimension?!

@jreback
Contributor

jreback commented Apr 8, 2016

Try the same with [1] and see; you are explicitly asking it to drop levels.

@toobaz
Member Author

toobaz commented Apr 8, 2016

I know how to get what I need. I'm just saying there is an inconsistency, which also hinders writing more abstract code.

For instance, if I write the function

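def filter_on_multiindex_level(obj, level, value):
    """Filter obj on MultiIndex level `level`, keeping rows equal to `value`."""
    # One indexer per level: the value on the chosen level, a full slice elsewhere
    indexer = [value if i == level else slice(None) for i in range(obj.index.nlevels)]
    # .loc needs a tuple to address levels; a list would be read as a list of labels
    return obj.loc[tuple(indexer)]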

... I will get differently shaped indices depending on the value of level.
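
For comparison, here is a minimal sketch of a level filter whose result shape does not depend on `level`. It swaps in `xs` (cross-section) in place of `.loc`, so it is a workaround and not the behaviour under discussion. The function name `filter_on_level_xs` is hypothetical.

```python
import pandas as pd


def filter_on_level_xs(obj, level, value):
    """Hypothetical helper: keep rows where `level` equals `value`, dropping that level."""
    # xs drops the selected level by default (drop_level=True)
    return obj.xs(value, level=level)


ss = pd.Series(range(4), index=pd.MultiIndex.from_product([[1, 2], [1, 2]]))

by_first = filter_on_level_xs(ss, 0, 1)
by_second = filter_on_level_xs(ss, 1, 1)

print(by_first.index.nlevels, by_second.index.nlevels)
```

With this approach the selected level is removed whether it is the first or the second one.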

@jreback
Contributor

jreback commented Apr 8, 2016

Please read the documentation; the point is that it is supposed to drop levels.

@toobaz
Member Author

toobaz commented Apr 8, 2016

May I ask for a pointer? I did look at http://pandas.pydata.org/pandas-docs/stable/advanced.html before complaining.

The point is precisely I want to drop levels.

(by the way: you don't need to get too abstract: how do you get ss.loc[1].loc[:,1] with a single .loc call?!)

@jreback
Contributor

jreback commented Apr 8, 2016

Not really sure what you want.
This indexing is exactly the same as with a 'regular' index, except that it will try to drop levels if you give it a scalar. Here I am not passing scalars, so no level is dropped:

In [13]: ss.loc[[1],[1]]
Out[13]: 
1  1    0
dtype: int64

@toobaz
Member Author

toobaz commented Apr 8, 2016

I want ss.loc[1,...] (replace the dots with any kind of indexers) to also drop levels (where scalars are given, so including the first level, since a scalar is given).

In other words, I want the number of levels returned to be "initial_number_of_levels - number_of_scalars_passed" independently from the first indexer being or not a scalar.
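
The rule stated above can be sketched outside of `.loc`. This is a minimal illustration, not pandas' implementation: the helper name `select_drop_scalars` is hypothetical, and it assumes every entry of `key` is either a scalar or `slice(None)`, with at least one full slice left so that some level survives.

```python
import numpy as np
import pandas as pd


def select_drop_scalars(obj, key):
    """Hypothetical helper: one entry of `key` per level, scalar or slice(None)."""
    scalar_levels = [i for i, k in enumerate(key) if not isinstance(k, slice)]
    # Boolean mask of the rows matching every scalar entry
    mask = np.ones(len(obj), dtype=bool)
    for i in scalar_levels:
        mask &= obj.index.get_level_values(i) == key[i]
    # Drop exactly the levels that were addressed by a scalar
    return obj[mask].droplevel(scalar_levels)


ss = pd.Series(range(8), index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]))

middle_kept = select_drop_scalars(ss, (1, slice(None), 5))
first_kept = select_drop_scalars(ss, (slice(None), 3, 5))
last_kept = select_drop_scalars(ss, (1, 3, slice(None)))
```

In each case the result has "initial levels minus scalars passed" levels, regardless of where the scalars sit in the key.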

@jreback
Contributor

jreback commented Apr 8, 2016

@toobaz I have no idea what you want. I have given you the tools to look thru things.

@toobaz
Member Author

toobaz commented Apr 8, 2016

Funny... seems so obvious to me :-)
(conceptually, not talking about implementation)
I will try to be clearer on the mailing list.

@joshuamosesb

@jreback & @toobaz I've added a comment to the original SO question to clarify the real inconsistency:

  • print loc_sub seems to be ok,
  • but loc_sub.index seems to return the original DataFrame's index instead of the sliced subset's index (as it is printed)

@toobaz
Member Author

toobaz commented Apr 8, 2016

@joshuamosesb I have answered there... this bug is actually unrelated

@toobaz
Member Author

toobaz commented Nov 4, 2016

After running into this bug several times, I noticed that my example in the above comment could be misinterpreted (it refers to the three-level example, not the two-level one given initially). Since I bet this bug will be reconsidered sooner or later, let me add a clearer example, for the record.

Take

ss = pd.Series(range(8), index=pd.MultiIndex.from_product([[1,2], [3,4], [5,6]]))

With the current syntax, it is impossible to get the result of ss.loc[1].loc[:,5] with a single .loc. This is because ss.loc[1,:,5] will have a three-level MultiIndex instead.

Compare for instance with ss.loc[:,3].loc[:,5], which you get simply as ss.loc[:,3,5] (while if you want to keep the indices, you can just do ss.loc[:,[3],[5]]).
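
As a hedged illustration of the target result, the sketch below computes the two-step selection on the three-level series and then the same values in a single call. The single call uses `Series.xs` with a tuple key and a list of levels; this is a different method from `.loc`, shown only as a workaround. The variable names `chained` and `single` are illustrative.

```python
import pandas as pd

ss = pd.Series(range(8), index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]))

# Two-step selection from the comment above
chained = ss.loc[1].loc[:, 5]

# Single call: fix levels 0 and 2 at (1, 5); xs drops both levels by default
single = ss.xs((1, 5), level=[0, 2])

print(single)
```

The cross-section keeps only the middle level in its index, which is the shape the chained `.loc` calls are meant to produce.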

See this "discussion" for a more complete argument.

@toobaz
Member Author

toobaz commented Feb 12, 2018

(fully indexed) DataFrames exhibit the problem on levels other than 1 - and this leads to further inconsistency:

In [2]: ss = pd.Series(range(4),
   ...:                 index=pd.MultiIndex.from_product([[1,2], [1,2]]))
   ...:                 

In [3]: df = pd.DataFrame({10 : ss, 11 : ss})

In [4]: ss.loc[:, 1].index.__class__
Out[4]: pandas.core.indexes.numeric.Int64Index

In [5]: df.loc[(slice(None), 1), 10].index.__class__
Out[5]: pandas.core.indexes.multi.MultiIndex

In [6]: df[10].loc[(slice(None), 1)].index.__class__
Out[6]: pandas.core.indexes.numeric.Int64Index

(the above cannot be tested without specifying the columns because of #16396)

@mroeschke
Member

Both calls in the OP's code example now return an Index, so closing.
