DOC: MultiIndex sort docs #13108

Closed
max-sixty opened this issue May 7, 2016 · 5 comments
Comments

@max-sixty
Contributor

max-sixty commented May 7, 2016

I found this confusing, despite being a moderately competent pandas user:

from http://pandas.pydata.org/pandas-docs/stable/advanced.html#the-need-for-sortedness-with-multiindex

Caveat emptor: the present implementation of MultiIndex requires that the labels be sorted for some of the slicing / indexing routines to work correctly. You can think about breaking the axis into unique groups, where at the hierarchical level of interest, each distinct group shares a label, but no two have the same label. However, the MultiIndex does not enforce this: you are responsible for ensuring that things are properly sorted. There is an important new method sort_index to sort an axis within a MultiIndex so that its labels are grouped and sorted by the original ordering of the associated factor at that level. Note that this does not necessarily mean the labels will be sorted lexicographically!

Is this right, that calling sort_index doesn't guarantee lex sortedness? How to guarantee it then?

And this:

Some indexing will work even if the data are not sorted, but will be rather inefficient and will also return a copy of the data rather than a view:

...seems to contradict this:

Thus, if you try to index at a depth at which the index is not sorted, it will raise an exception.

...neither of which seems tightly consistent with the passage above.

Am I misunderstanding something?

@jreback
Contributor

jreback commented May 7, 2016

Thus, if you try to index at a depth at which the index is not sorted, it will raise an exception.
This has been somewhat mitigated recently (we will now show a PerformanceWarning when you do this), so the docs could be amended.

.sort_index() does guarantee that we can index properly, so you could certainly simplify this. The key point is that it's the USER's responsibility to do this.
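A minimal sketch of the `.sort_index()` point, assuming a reasonably recent pandas. The frame, its labels, and the variable names are made up for illustration, and `is_monotonic_increasing` is used here as one way to check sortedness; it is not taken from the docs passage quoted above.

```python
import pandas as pd

# Hypothetical data: a two-level MultiIndex built in deliberately unsorted order.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# The index as constructed is not sorted.
unsorted_flag = df.index.is_monotonic_increasing

# sort_index() with no arguments sorts by all index levels.
sorted_df = df.sort_index()
sorted_flag = sorted_df.index.is_monotonic_increasing

# Label slicing on the outer level works on the sorted frame.
subset = sorted_df.loc["a":"a"]

print(unsorted_flag, sorted_flag)
print(subset)
```

The check is done on the index rather than on the frame, since sortedness is a property of the MultiIndex itself.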

@max-sixty
Contributor Author

max-sixty commented May 7, 2016

Can this whole section be simplified to:

  • For MultiIndex-ed objects to be indexed & sliced effectively, they need to be sorted.
  • .sort_index() sorts the index
  • Where they're not sorted, pandas will show a PerformanceWarning*

...and potentially keep a couple of the examples as examples of sorting MultiIndexed dfs

* Is this correct: there's no KeyError?
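One way to check the footnote's question on a given installation is to probe the behaviour directly. The sketch below is only an illustration: `probe` is a hypothetical helper, the data is made up, and whether an unsorted lookup succeeds silently, emits a `PerformanceWarning`, or raises a `KeyError` (or a subclass of it) depends on the pandas version and on the kind of key, so no outcome is asserted for the unsorted case.

```python
import warnings

import pandas as pd


def probe(frame, key):
    """Try frame.loc[key] and report what happened (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            frame.loc[key]
        except KeyError:
            # Also catches KeyError subclasses such as UnsortedIndexError.
            return "KeyError"
    if any(issubclass(w.category, pd.errors.PerformanceWarning) for w in caught):
        return "PerformanceWarning"
    return "ok"


# Hypothetical data: an unsorted two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 1), ("b", 1), ("a", 2)], names=["outer", "inner"]
)
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# Slice on the outer level, first unsorted, then after sort_index().
unsorted_result = probe(df, slice("a", "b"))
sorted_result = probe(df.sort_index(), slice("a", "b"))
print("unsorted:", unsorted_result)
print("sorted:  ", sorted_result)
```

Running the same probe with other key shapes (a full tuple, a partial tuple, a tuple slice) is a quick way to see which cases warn and which raise on the installed version.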

@jreback
Contributor

jreback commented May 7, 2016

yes that would be a nice improvement

iirc we have an example somewhere of a PerformanceWarning in the docs, but not sure where

@max-sixty
Contributor Author

OK PR-ing.

What's the policy on when we remove version warnings? This is probably not needed / over emphasized:

Warning In 0.15.0 Index has internally been refactored to no longer sub-class ndarray but instead subclass PandasObject, similarly to the rest of the pandas objects. This should be a transparent change with only very limited API implications (See the Internal Refactoring)

@jreback
Contributor

jreback commented May 7, 2016

yeah, you can leave it I guess; once in a while I take those types of things out

2 participants