
Indexing a multi-index does not respect order of list indexer #10710


Closed
jorisvandenbossche opened this issue Jul 31, 2015 · 6 comments · Fixed by #41389
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@jorisvandenbossche
Member

In [40]: cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])

In [41]: df = pd.DataFrame(np.random.randn(5,6), columns=cols)

In [42]: df.loc[:, ['B', 'A']]
Out[42]:
          A                   B
          1         2         1         2
0 -0.990868  1.577803  1.023247 -0.423165
1 -1.180170  1.236143 -1.484085  0.044929
2  1.665502 -0.711081  0.227827  0.651859
3 -0.659154  0.154327 -1.548650 -0.070550
4 -0.232819  0.100959 -0.102296  0.260816
@jorisvandenbossche jorisvandenbossche added Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Jul 31, 2015
@gdevanla

gdevanla commented Aug 2, 2015

This line here calls the union() on the indexer object. The union() method in turn sorts the returned values.
https://github.com/pydata/pandas/blob/master/pandas/core/index.py#L5447

Therefore, we end up with array([0,1,2,3]), whereas what we need is array([2,3,0,1])

One option would be to gather the values from union() and re-arrange them after this line so that they align with the order of the keys provided.
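
The re-arrangement suggested above could look roughly like the following. This is only an illustrative sketch in plain NumPy, not the code in pandas/core/index.py; the names `keys`, `sorted_pos`, and `ordered_pos` are made up for the example.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
keys = ["B", "A"]  # order requested by the user
level0 = cols.get_level_values(0)

# A union-style step yields the matching positions in sorted order.
sorted_pos = np.flatnonzero(level0.isin(keys))

# Re-arranged: collect the positions key by key, in the requested order.
ordered_pos = np.concatenate([np.flatnonzero(level0 == k) for k in keys])

print(sorted_pos)   # [0 1 2 3]
print(ordered_pos)  # [2 3 0 1]
```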

@toobaz
Member

toobaz commented Jul 3, 2019

One subtlety I hadn't considered: df.loc[:, ['B', 'A']] could be interpreted in two ways:

  • df.loc[:, (['B', 'A'], slice(None))], that is, passing only a list of labels for the first level
  • df.loc[:, [('B', ), ('A', )]], that is, passing a list of partial indexers, that each happens to only specify the first level

The current behavior is not, strictly speaking, buggy: it just follows the first interpretation. And this is not surprising... as the second currently doesn't work!

I do think that users tend to expect the second. Moreover, if any user were to expect the first, the second still wouldn't be too annoying (just a matter of reordering the keys passed).

But first... we should make df.loc[:, [('B', ), ('A', )]] work.

(In progress)
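
To make the two readings concrete, here is a small sketch that builds each result explicitly rather than relying on what `.loc` does with the list. It assumes reading 1 acts as a filter on the first level and reading 2 as a sequence of per-key selections; the names `as_filter` and `as_sequence` are hypothetical, and this is not how pandas implements either path.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols)
keys = ["B", "A"]

# Reading 1: the list filters the first level; column order is unchanged.
as_filter = df.loc[:, df.columns.get_level_values(0).isin(keys)]

# Reading 2: each key is a partial indexer; results follow the list order.
as_sequence = pd.concat(
    [df.xs(k, axis=1, level=0, drop_level=False) for k in keys], axis=1
)

print(list(as_filter.columns.get_level_values(0)))    # ['A', 'A', 'B', 'B']
print(list(as_sequence.columns.get_level_values(0)))  # ['B', 'B', 'A', 'A']
```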

@guidoavvisati

guidoavvisati commented Feb 7, 2020

This was actually rather misleading for me, especially given how pandas behaves with single-level columns. There, you can select a reordered view by passing in a list in the desired order, i.e. df[["C", "B", "A"]] would work.

In my case, I tried to combine this knowledge with the multi index, expecting the first use case as mentioned by @toobaz

import pandas as pd
import numpy as np
idx = pd.IndexSlice

cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])
df = pd.DataFrame(np.random.randn(5,6), columns=cols)
df.loc[:, idx[["C", "A"], 1]]

          A         C
          1         1
0 -1.496248  0.486082
1  1.745915  0.478067
2  0.024262 -0.494881
3 -0.151359  0.148930
4  1.205520  0.374346

and I was bitten rather badly by it, for obvious reasons
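
One possible workaround, sketched here under the assumption that a list of complete column tuples is returned in list order, is to spell out each (level 0, level 1) pair instead of passing a list for one level. The name `wanted` is made up for the example.

```python
from itertools import product

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols)

# Spell out every column as a full (level 0, level 1) tuple, in the wanted order.
wanted = list(product(["C", "A"], [1]))  # [('C', 1), ('A', 1)]
result = df.loc[:, wanted]

print(result.columns.get_level_values(0).tolist())  # ['C', 'A']
```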

@scottcode

I found this thread because I too would like to specify an ordering and have it reflected in the result (interpretation 2 from @toobaz).

@mroeschke
Member

Order seems to be respected now on master. Could use a test.

In [11]: cols = pd.MultiIndex.from_product([['A', 'B', 'C'],[1,2]])
    ...: df = pd.DataFrame(np.random.randn(5,6), columns=cols)

In [12]: df.loc[:, ['B', 'A']]
Out[12]:
          B                   A
          1         2         1         2
0  0.125019 -1.735490  2.021583 -0.028753
1  1.165933 -0.662543  0.277452  1.142789
2 -0.711480  0.494695 -0.568173  0.761984
3  0.115019  0.811708 -1.075778 -0.098008
4 -0.084524 -0.506630 -1.277899 -0.874620

In [13]: pd.__version__
Out[13]: '1.3.0.dev0+1352.g70435eba76.dirty'
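
A regression test for this might look something like the sketch below. It assumes a pandas version in which the list order is respected, as in the session above; the test name is hypothetical and this is not the test that was eventually merged. Deterministic data replaces the random values so the assertions are reproducible.

```python
import numpy as np
import pandas as pd


def test_loc_list_on_multiindex_columns_respects_order():
    # Hypothetical test name; deterministic data instead of random values.
    cols = pd.MultiIndex.from_product([["A", "B", "C"], [1, 2]])
    df = pd.DataFrame(np.arange(30).reshape(5, 6), columns=cols)

    result = df.loc[:, ["B", "A"]]

    # Columns should come back in the order of the list indexer.
    expected = [("B", 1), ("B", 2), ("A", 1), ("A", 2)]
    assert list(result.columns) == expected
    # Values must travel with their labels.
    assert result[("B", 1)].tolist() == df[("B", 1)].tolist()
```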

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Apr 18, 2021
@guidoavvisati

I have tested this with pandas version 1.2.4 and I can confirm that it does indeed work.

7 participants