Add an option not to sort levels in MultiIndex.from_product? #14672

shoyer opened this issue Nov 16, 2016 · 6 comments

Comments

@shoyer
Member

shoyer commented Nov 16, 2016

Currently, from_product always sorts levels in the resulting MultiIndex. This means that the result does not necessarily have lexsorted labels/codes.

PR #14062 adds an option to not sort levels when calling from_product. Compare:

In [4]: pd.MultiIndex.from_product([['a', 'b'], [2, 1, 0]], sort_levels=False)
Out[4]:
MultiIndex(levels=[['a', 'b'], [2, 1, 0]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
           sortorder=0)

In [5]: pd.MultiIndex.from_product([['a', 'b'], [2, 1, 0]], sort_levels=True)
Out[5]:
MultiIndex(levels=[['a', 'b'], [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1], [2, 1, 0, 2, 1, 0]])
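
The difference between the two outputs above can be reproduced without pandas. The following is a minimal pure-Python sketch, not the pandas implementation; `factorize` and `product_codes` are hypothetical helper names introduced only for illustration. It assumes each level is built from the unique input values, optionally sorted, and that the labels are positions into that level.

```python
from itertools import product


def factorize(values, sort):
    """Return (level, codes) for one input list (hypothetical helper)."""
    uniques = list(dict.fromkeys(values))  # unique values, first-seen order
    level = sorted(uniques) if sort else uniques
    position = {value: i for i, value in enumerate(level)}
    return level, [position[value] for value in values]


def product_codes(iterables, sort):
    """Return (levels, labels) for the Cartesian product of the inputs."""
    factored = [factorize(values, sort) for values in iterables]
    levels = [level for level, _ in factored]
    # Cartesian product of the per-level codes, one row per index entry.
    rows = list(product(*(codes for _, codes in factored)))
    # Transpose the rows into one label list per level.
    labels = [list(column) for column in zip(*rows)]
    return levels, labels


inputs = [['a', 'b'], [2, 1, 0]]
levels_unsorted, labels_unsorted = product_codes(inputs, sort=False)
levels_sorted, labels_sorted = product_codes(inputs, sort=True)

print(levels_unsorted, labels_unsorted)
print(levels_sorted, labels_sorted)
```

Under these assumptions the unsorted variant keeps `[2, 1, 0]` as the second level, while the sorted variant stores `[0, 1, 2]` and compensates with descending labels.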

Using this option yields a few benefits:

  1. It's simpler -- resulting levels on the MultiIndex are exactly those you passed in.
  2. It's marginally faster -- you don't need to sort the levels.
  3. The resulting MultiIndex is always lex-sorted. This is handy if you want to be able to index it efficiently.
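
Point 3 can be checked directly against the labels printed in Out[4] and Out[5]. This is a rough sketch that treats "lex-sorted" as "the label tuples are in non-decreasing order"; `is_lexsorted` is a hypothetical helper written for this example, not a pandas function.

```python
def is_lexsorted(labels):
    """True if the per-row label tuples are in non-decreasing order."""
    rows = list(zip(*labels))  # one tuple of codes per index entry
    return all(a <= b for a, b in zip(rows, rows[1:]))


# Labels copied from Out[4] (sort_levels=False) and Out[5] (sort_levels=True).
labels_unsorted_levels = [[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]
labels_sorted_levels = [[0, 0, 0, 1, 1, 1], [2, 1, 0, 2, 1, 0]]

unsorted_levels_ok = is_lexsorted(labels_unsorted_levels)
sorted_levels_ok = is_lexsorted(labels_sorted_levels)

print(unsorted_levels_ok, sorted_levels_ok)
```

With unsorted levels the labels simply count up within each group, so the check passes; with sorted levels the second label list runs 2, 1, 0 and the check fails.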

The downside is that the result can be a little less intuitive, because levels and labels do not have the same sort order (#14015).

I'm suggesting this option because it was useful for xarray (to fix pydata/xarray#980) and might also be relevant for other advanced users.

@piotrjurkiewicz

Any progress on this?

@jreback
Contributor

jreback commented Dec 30, 2018

@piotrjurkiewicz pandas is community supported; you are welcome to submit a patch

@ivasve

ivasve commented Jul 6, 2022

There hasn't been any change here since 2016. I suggest closing this issue, because I believe the proposed change should not be made: it would violate the underlying mathematical logic of MultiIndex.from_product, which is a Cartesian product based on ordered sets. Of course, I might be wrong, but here is one citation as an example: "A Cartesian Product is defined on an ordered set of Sets." https://www.sciencedirect.com/topics/computer-science/cartesian-product

@kuraga
Copy link

kuraga commented Jul 6, 2022

@ivasve but ordered != sorted.

@ivasve

ivasve commented Jul 6, 2022

@kuraga Could you describe your thoughts in more detail, please? Which part exactly are you referring to, and why? Maybe I am missing something; I just need more info :-) Thanks.

@kuraga

kuraga commented Jul 6, 2022

@ivasve, with sort=False the resulting index will still be ordered (there will be a first entry, a second entry, ..., a last entry). But its order will be undefined, so it won't be lexicographical, etc.
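
The distinction between "ordered" and "sorted" in this exchange can be illustrated with the standard library alone. This is only a sketch using `itertools.product` on the inputs from the opening comment; it makes no claim about how pandas builds the index.

```python
from itertools import product

# The Cartesian product has a well-defined sequence of entries,
# following the order in which the input values were given.
pairs = list(product(['a', 'b'], [2, 1, 0]))

# Having a sequence is not the same as being in lexicographic order.
is_sorted = pairs == sorted(pairs)

print(pairs)
print(is_sorted)
```

Here the first entry is `('a', 2)`, whereas a lexicographically sorted result would start with `('a', 0)`.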

@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
7 participants