Indexing a multi-index does not respect order of list indexer #10710

jorisvandenbossche · 2015-07-31T07:16:36Z

In [40]: cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])

In [41]: df = pd.DataFrame(np.random.randn(5,6), columns=cols)

In [42]: df.loc[:, ['B', 'A']]
Out[42]:
          A                   B
          1         2         1         2
0 -0.990868  1.577803  1.023247 -0.423165
1 -1.180170  1.236143 -1.484085  0.044929
2  1.665502 -0.711081  0.227827  0.651859
3 -0.659154  0.154327 -1.548650 -0.070550
4 -0.232819  0.100959 -0.102296  0.260816


gdevanla · 2015-08-02T01:08:38Z

This line here calls union() on the indexer object, and union() in turn sorts the values it returns:
https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L5447

Therefore, we end up with array([0,1,2,3]), whereas what we need is array([2,3,0,1])

One option would be to gather the values from union() and then, right after this line, re-arrange them so that they follow the order of the provided keys.
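To make the mechanism concrete, here is a small pure-Python model of it. This is not pandas' actual code. It assumes each first-level key maps to a fixed list of column positions (the hypothetical `POSITIONS` dict, matching the 3×2 column layout above). It shows how a sorted union loses the key order and sketches the suggested re-arrangement step. The helper names are made up for illustration.

```python
# Hypothetical model: column positions covered by each first-level label,
# for columns built from product(['A', 'B', 'C'], [1, 2]).
POSITIONS = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}


def union_locs(keys):
    """Mimic a union of per-key locations: duplicates dropped, result sorted."""
    locs = set()
    for key in keys:
        locs.update(POSITIONS[key])
    return sorted(locs)


def rearrange_to_key_order(locs, keys):
    """Re-order union output so positions follow the order of `keys`."""
    rank = {}
    for i, key in enumerate(keys):
        for pos in POSITIONS[key]:
            rank.setdefault(pos, i)  # the first key mentioning a position wins
    # Sort by key rank first, then by position within the same key.
    return sorted(locs, key=lambda pos: (rank[pos], pos))


keys = ["B", "A"]
print(union_locs(keys))                                # [0, 1, 2, 3]
print(rearrange_to_key_order(union_locs(keys), keys))  # [2, 3, 0, 1]
```

Because a union collapses duplicates, re-arranging after the fact cannot restore repeated keys. Concatenating the per-key locations in key order would preserve them.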

toobaz · 2019-07-03T15:07:49Z

One subtlety I hadn't considered: df.loc[:, ['B', 'A']] could be interpreted in two ways:

df.loc[:, (['B', 'A'], slice(None))], that is, passing only a list of labels for the first level
df.loc[:, [('B', ), ('A', )]], that is, passing a list of partial indexers, each of which happens to specify only the first level

The current behavior is not, strictly speaking, buggy: it just follows the first interpretation. And this is not surprising... as the second currently doesn't work!

I do think that users tend to expect the second. Moreover, even if some users were to expect the first, the second still wouldn't be too annoying for them (it would just be a matter of reordering the keys passed).

But first... we should make df.loc[:, [('B', ), ('A', )]] work.

(In progress)
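The two interpretations can be contrasted with a plain-Python model in which the columns are a list of tuples. This only illustrates the semantics described above. It is not how pandas implements either one, and the function names are hypothetical.

```python
# Columns equivalent to pd.MultiIndex.from_product([['A', 'B', 'C'], [1, 2]]).
COLUMNS = [(l0, l1) for l0 in ["A", "B", "C"] for l1 in [1, 2]]


def select_level_labels(columns, labels):
    """Interpretation 1: a list of labels for the first level.

    Acts as a membership filter, so the columns keep the frame's own order.
    """
    wanted = set(labels)
    return [col for col in columns if col[0] in wanted]


def select_partial_keys(columns, partial_keys):
    """Interpretation 2: a list of partial indexers.

    Each partial key contributes its matches in turn, so the order of the
    keys determines the order of the result.
    """
    result = []
    for key in partial_keys:
        result.extend(col for col in columns if col[: len(key)] == key)
    return result


print(select_level_labels(COLUMNS, ["B", "A"]))
# [('A', 1), ('A', 2), ('B', 1), ('B', 2)]
print(select_partial_keys(COLUMNS, [("B",), ("A",)]))
# [('B', 1), ('B', 2), ('A', 1), ('A', 2)]
```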

guidoavvisati · 2020-02-07T19:31:16Z

This was actually rather misleading for me, especially considering pandas' behaviour with single-level columns. There, you can select the columns in a different order by passing a list in the desired order, i.e.
df[["C", "B", "A"]] would work.

In my case, I tried to combine this knowledge with the MultiIndex, expecting the first use case mentioned by @toobaz:

import pandas as pd
import numpy as np
idx = pd.IndexSlice

cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])
df = pd.DataFrame(np.random.randn(5,6), columns=cols)
df.loc[:, idx[["C", "A"], 1]]

          A         C
          1         1
0 -1.496248  0.486082
1  1.745915  0.478067
2  0.024262 -0.494881
3 -0.151359  0.148930
4  1.205520  0.374346

And I was bitten rather badly by it, since the columns come back as A, C instead of the requested C, A.
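One way to get a specific column order that does not depend on how a level-wise list is interpreted is to spell out the full column keys and pass them to `DataFrame.reindex`, which lays out the columns in exactly the order given. A minimal sketch, assuming the same 3×2 column layout as above but with deterministic values instead of random numbers:

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
# Deterministic values (0..29) instead of random numbers, for easy checking.
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols)

# Full column keys, listed in the desired order.
wanted = pd.MultiIndex.from_tuples([("C", 1), ("A", 1)])

# reindex returns the columns exactly in the order listed in `wanted`.
result = df.reindex(columns=wanted)
print(result)
```

One caveat: with `reindex`, a key that does not exist yields a column of NaN instead of raising a `KeyError`, so a typo in `wanted` fails silently.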

scottcode · 2020-05-01T16:13:17Z

I found this thread because I, too, would like to specify an ordering and have it reflected in the result (interpretation 2 from @toobaz).

mroeschke · 2021-04-18T22:34:53Z

Order seems to be respected now on master. Could use a test.

In [11]: cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])
    ...: df = pd.DataFrame(np.random.randn(5,6), columns=cols)

In [12]: df.loc[:, ['B', 'A']]
Out[12]:
          B                   A
          1         2         1         2
0  0.125019 -1.735490  2.021583 -0.028753
1  1.165933 -0.662543  0.277452  1.142789
2 -0.711480  0.494695 -0.568173  0.761984
3  0.115019  0.811708 -1.075778 -0.098008
4 -0.084524 -0.506630 -1.277899 -0.874620

In [13]: pd.__version__
Out[13]: '1.3.0.dev0+1352.g70435eba76.dirty'
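A regression test along the lines suggested above could look like the sketch below. It is a hypothetical pytest-style function, not the test that was eventually merged in #41389. It assumes a pandas version in which the order is respected (the thread reports it working on 1.2.4), and it uses deterministic data so the values can be checked.

```python
import numpy as np
import pandas as pd


def test_loc_list_indexer_respects_order():
    cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
    df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols)

    result = df.loc[:, ["B", "A"]]

    # Columns must follow the order of the list indexer, not the frame's order.
    assert list(result.columns) == [("B", 1), ("B", 2), ("A", 1), ("A", 2)]
    # The data must travel with its column labels.
    assert result[("B", 1)].tolist() == df[("B", 1)].tolist()
```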

guidoavvisati · 2021-04-24T11:55:02Z

I have tested this with pandas version 1.2.4 and I can confirm that it does indeed work.

jorisvandenbossche added the Indexing and MultiIndex labels Jul 31, 2015

jorisvandenbossche mentioned this issue Apr 18, 2017

Subsetting dataframe w/ Multiindex does not preserve queried order #16038

Closed

jorisvandenbossche mentioned this issue Nov 20, 2017

Partial indexing does not respect key order/repetitions #18345

Closed

This was referenced Aug 20, 2019

BUG: Fixed groupby quantile for listlike q #27827

Merged

BUG: Fix groupby quantile array #28113

Merged

mroeschke added the good first issue and Needs Tests labels and removed the Indexing and MultiIndex labels Apr 18, 2021

mroeschke mentioned this issue May 9, 2021

TST: Add regression tests for old issues #41389

Merged


simonjayhawkins added this to the 1.3 milestone May 9, 2021

jreback closed this as completed in #41389 May 10, 2021
