Indexing MultiIndex level depends on chosen level #12827

toobaz · 2016-04-08T13:45:29Z

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: ss = pd.Series(range(4),
   ...:                index=pd.MultiIndex.from_product([[1,2], [1,2]]))

In [3]: ss.loc[:, 1].index.__class__
Out[3]: pandas.indexes.numeric.Int64Index

In [4]: ss.loc[1,:].index.__class__
Out[4]: pandas.indexes.multi.MultiIndex

Expected Output

The first: the level I am indexing on should disappear from the index.

This is related to http://stackoverflow.com/q/36459311/2858145 (I think)

output of `pd.show_versions()`

In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-1-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: it_IT.utf8

pandas: 0.18.0+88.g5e11243.dirty
nose: 1.3.6
pip: 1.5.6
setuptools: 18.4
Cython: 0.23.2
numpy: 1.10.4
scipy: 0.16.0
statsmodels: 0.8.0.dev0+755fa81
xarray: None
IPython: 2.4.1
sphinx: 1.3.1
patsy: 0.3.0-dev
dateutil: 2.2
pytz: 2012c
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.0rc2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.4.0
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None

toobaz · 2016-04-08T13:50:07Z

This might actually not be related to the SO question: I tested on 0.14.1, and it behaves the same (while the SO question refers to a change from 0.15.2).

jreback · 2016-04-08T14:18:36Z

By ONLY passing a selector on the first level you are getting a de-leveling.

In [7]: ss.loc[1]
Out[7]: 
1    0
2    1
dtype: int64

In [8]: ss.loc[1, :]
Out[8]: 
1  1    0
   2    1
dtype: int64

toobaz · 2016-04-08T14:30:22Z

Sure, but is there a justification for the incoherent behaviour?
It gets worse when you have more than two levels. Think of a MultiIndex with three levels rather than two, and ss.loc[1,:,1] where you want a single level and you get three instead (two of which are useless).

jreback · 2016-04-08T14:32:29Z

How is this 'incoherent'? It's as expected: you are getting exactly what you asked for.

toobaz · 2016-04-08T15:14:59Z

If ss.index has three levels,

ss.loc[1,1,:] has three levels.
ss.loc[1,:,1] has three levels.
ss.loc[:,1,1] has one level.

I find this quite unexpected. Compare with a three-dimensional numpy array: c[1,1,:] has one dimension, precisely as c[:,1,1].
(or compare with Panels, or even just DataFrames...)

As far as I can tell, the behaviour is also useless and (certainly to me) annoying: there is no way having two levels which I know have each only one value (because I explicitly indexed it) is useful.

Oh, and by the way (but I am not sure it is the same problem).

ss.ix[:,1,1] has one level
ss.ix[1,1] has one level
ss.ix[1,1,:] raises TypeError

In general, in Python, wouldn't you expect indexing with : to coincide with not indexing on that dimension?!

jreback · 2016-04-08T15:16:47Z

Try the same with [1] and see: you are explicitly asking it to drop levels.

toobaz · 2016-04-08T15:33:27Z

I know how to get what I need. I'm just saying there is an inconsistency. Which also hinders writing more abstract code.

For instance, if I write the function

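def filter_on_multiindex_level(obj, level, value):
    """Filter obj on MultiIndex level `level`, keeping rows equal to `value`."""
    # One indexer per level: the scalar on `level`, a full slice elsewhere.
    indexer = [value if i == level else slice(None) for i in range(obj.index.nlevels)]
    # .loc expects a tuple for per-level indexing; a list would be read as labels.
    return obj.loc[tuple(indexer)]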

... I will get differently shaped indices depending on the value of level.
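For comparison, a minimal sketch of a variant whose result shape does not depend on `level`. It relies on `xs`, which drops the selected level by default; the name `filter_on_level_xs` is hypothetical and this is a workaround, not a statement about how `.loc` should behave:

```python
import pandas as pd


def filter_on_level_xs(obj, level, value):
    """Select rows where `level` equals `value`, dropping that level."""
    # xs drops the selected level by default (drop_level=True).
    return obj.xs(value, level=level)


ss = pd.Series(range(8),
               index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]))

# The number of remaining levels is the same whichever level is filtered.
nlevels_by_level = {
    level: filter_on_level_xs(ss, level, value).index.nlevels
    for level, value in [(0, 1), (1, 3), (2, 5)]
}
print(nlevels_by_level)
```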

jreback · 2016-04-08T15:34:26Z

Please read the documentation; the point is that it is supposed to drop levels.

toobaz · 2016-04-08T15:40:04Z

May I ask you for a pointer? I did look at http://pandas.pydata.org/pandas-docs/stable/advanced.html before complaining.

The point is precisely I want to drop levels.

(by the way: you don't need to get too abstract: how do you get ss.loc[1].loc[:,1] with a single .loc call?!)

jreback · 2016-04-08T15:51:13Z

Not really sure what you want.
This indexing is exactly the same as with a 'regular' index, except that it will try to drop levels if you give it a scalar. Here I am not giving it scalars, so nothing is dropped.

In [13]: ss.loc[[1],[1]]
Out[13]: 
1  1    0
dtype: int64

toobaz · 2016-04-08T15:59:06Z

I want ss.loc[1,...] (replace the dots with any kind of indexers) to also drop levels (where scalars are given, so including the first level, since a scalar is given).

In other words, I want the number of levels returned to be "initial_number_of_levels - number_of_scalars_passed", regardless of whether the first indexer is a scalar.
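To make the rule concrete, here is a rough sketch of a helper that enforces it explicitly. The name `loc_drop_scalar_levels` is hypothetical, and the sketch assumes a Series with one indexer per level; it wraps scalars in lists so that `.loc` keeps every level, then drops exactly the levels that received a scalar:

```python
import pandas as pd
from pandas.api.types import is_scalar


def loc_drop_scalar_levels(series, indexer):
    """Hypothetical helper: index a Series, dropping one level per scalar.

    Assumes `indexer` has one entry per level of series.index.
    """
    scalar_positions = [i for i, key in enumerate(indexer) if is_scalar(key)]
    if len(scalar_positions) == series.index.nlevels:
        # Every level is pinned by a scalar: plain lookup of a single value.
        return series.loc[tuple(indexer)]
    # Wrap scalars in lists so .loc itself keeps every level...
    keep_all = tuple([key] if is_scalar(key) else key for key in indexer)
    result = series.loc[keep_all]
    # ...then drop exactly the levels that were given a scalar.
    return result.droplevel(scalar_positions) if scalar_positions else result


ss = pd.Series(range(8),
               index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]))

first_and_last = loc_drop_scalar_levels(ss, (1, slice(None), 5))
last_two = loc_drop_scalar_levels(ss, (slice(None), 3, 5))
print(first_and_last.index.nlevels, last_two.index.nlevels)
```

With this helper the result has "levels minus scalars" levels whether or not the first indexer is a scalar.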

jreback · 2016-04-08T16:09:17Z

@toobaz I have no idea what you want. I have given you the tools to look thru things.

toobaz · 2016-04-08T16:13:52Z

Funny... seems so obvious to me :-)
(conceptually, not talking about implementation)
Will try to be clearer on the mailing list.

joshuamosesb · 2016-04-08T20:42:28Z

@jreback & @toobaz I've added a comment to the original SO question to clarify the real inconsistency:

print loc_sub seems to be OK,
but loc_sub.index seems to return the original DataFrame's index instead of the sliced subset's index (as it's printed).

toobaz · 2016-04-08T21:15:01Z

@joshuamosesb I have answered there... this bug is actually unrelated

toobaz · 2016-11-04T18:33:05Z

After running into this bug multiple times, I noticed that my example in the above comment could be misinterpreted (it refers to the three-level example, not the two-level one given initially). Since I bet this bug will be reconsidered sooner or later, let me add a clearer example, for the record.

Take

ss = pd.Series(range(8), index=pd.MultiIndex.from_product([[1,2], [3,4], [5,6]]))

With the current syntax, it is impossible to get the result of ss.loc[1].loc[:,5] with a single .loc. This is because ss.loc[1,:,5] will have a three-level MultiIndex instead.

Compare for instance with ss.loc[:,3].loc[:,5], which you get simply as ss.loc[:,3,5] (while if you want to keep the indices, you can just do ss.loc[:,[3],[5]]).
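A small script to reproduce the comparison, as a sketch. The number of levels printed for the single-call form `ss.loc[1, :, 5]` depends on the pandas version, so it is printed rather than assumed:

```python
import pandas as pd

ss = pd.Series(range(8),
               index=pd.MultiIndex.from_product([[1, 2], [3, 4], [5, 6]]))

chained = ss.loc[1].loc[:, 5]   # two .loc calls: each scalar drops its level
kept = ss.loc[:, [3], [5]]      # lists instead of scalars: all levels are kept
single = ss.loc[1, :, 5]        # one call: level count is version-dependent

print("chained:", chained.index.nlevels)
print("kept:   ", kept.index.nlevels)
print("single: ", single.index.nlevels)
```

The selected values are the same for `chained` and `single`; only the shape of the resulting index is at issue.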

See this "discussion" for a more complete argument.

toobaz · 2018-02-12T11:01:58Z

(fully indexed) DataFrames exhibit the problem on levels other than 1 - and this leads to further inconsistency:

In [2]: ss = pd.Series(range(4),
   ...:                 index=pd.MultiIndex.from_product([[1,2], [1,2]]))
   ...:                 

In [3]: df = pd.DataFrame({10 : ss, 11 : ss})

In [4]: ss.loc[:, 1].index.__class__
Out[4]: pandas.core.indexes.numeric.Int64Index

In [5]: df.loc[(slice(None), 1), 10].index.__class__
Out[5]: pandas.core.indexes.multi.MultiIndex

In [6]: df[10].loc[(slice(None), 1)].index.__class__
Out[6]: pandas.core.indexes.numeric.Int64Index

(the above cannot be tested without specifying the columns because of #16396)

mroeschke · 2024-09-12T21:44:59Z

Both calls in the OP's code example now return an Index, so closing.

jreback closed this as completed Apr 8, 2016

jreback added Usage Question MultiIndex labels Apr 8, 2016

toobaz mentioned this issue Jul 9, 2016

KeyError when indexing the MultiIndex of a DataFrame with ":" as first field #13597

Closed

toobaz mentioned this issue Dec 4, 2017

.loc with (scalar, slice(None)) on MultiIndex does not drop first level #18631

Open

jorisvandenbossche reopened this Dec 6, 2017

mroeschke removed the Usage Question label Oct 9, 2019

jbrockmendel added the Indexing Related to indexing on series/frames, not to indexes themselves label Sep 22, 2020

toobaz mentioned this issue Oct 13, 2020

Partial Selection on MultiIndex: The need for empty slice support & dict indexing #4036

Closed

toobaz mentioned this issue Nov 14, 2020

Setting value of DataFrame[MultiIndex] via .loc partial indexing fails #22493

Open

mroeschke added the Bug label Apr 23, 2021

jbrockmendel mentioned this issue Jun 30, 2021

BUG: not dropping scalar-indexes MultiIndex levels #42319

Merged

4 tasks

mroeschke closed this as completed Sep 12, 2024
