Commit af7bdd3

max-sixty authored and jreback committed
DOC: update multiindex sort docs
closes pandas-dev#13112 closes pandas-dev#13108
1 parent 1500336 commit af7bdd3


1 file changed

+32
-41
lines changed

doc/source/advanced.rst

@@ -296,10 +296,6 @@ As usual, **both sides** of the slicers are included as this is label indexing.

    df.loc[(slice('A1','A3'),.....)]

-.. warning::
-
-   You will need to make sure that the selection axes are fully lexsorted!
-
 .. ipython:: python

    def mklbl(prefix,n):
@@ -477,31 +473,24 @@ allowing you to permute the hierarchical index levels in one step:

    df[:5].reorder_levels([1,0], axis=0)

-The need for sortedness with :class:`~pandas.MultiIndex`
---------------------------------------------------------
+Sorting a :class:`~pandas.MultiIndex`
+-------------------------------------

-**Caveat emptor**: the present implementation of ``MultiIndex`` requires that
-the labels be sorted for some of the slicing / indexing routines to work
-correctly. You can think about breaking the axis into unique groups, where at
-the hierarchical level of interest, each distinct group shares a label, but no
-two have the same label. However, the ``MultiIndex`` does not enforce this:
-**you are responsible for ensuring that things are properly sorted**. There is
-an important new method ``sort_index`` to sort an axis within a ``MultiIndex``
-so that its labels are grouped and sorted by the original ordering of the
-associated factor at that level. Note that this does not necessarily mean the
-labels will be sorted lexicographically!
+For MultiIndex-ed objects to be indexed & sliced effectively, they need
+to be sorted. As with any index, you can use ``sort_index``.

 .. ipython:: python

    import random; random.shuffle(tuples)
    s = pd.Series(np.random.randn(8), index=pd.MultiIndex.from_tuples(tuples))
    s
+   s.sort_index()
    s.sort_index(level=0)
    s.sort_index(level=1)

 .. _advanced.sortlevel_byname:

-Note, you may also pass a level name to ``sort_index`` if the MultiIndex levels
+You may also pass a level name to ``sort_index`` if the MultiIndex levels
 are named.

 .. ipython:: python
@@ -510,46 +499,48 @@ are named.
    s.sort_index(level='L1')
    s.sort_index(level='L2')

-Some indexing will work even if the data are not sorted, but will be rather
-inefficient and will also return a copy of the data rather than a view:
-
-.. ipython:: python
-
-   s['qux']
-   s.sort_index(level=1)['qux']
-
 On higher dimensional objects, you can sort any of the other axes by level if
 they have a MultiIndex:

 .. ipython:: python

    df.T.sort_index(level=1, axis=1)

-The ``MultiIndex`` object has code to **explicitly check the sort depth**. Thus,
-if you try to index at a depth at which the index is not sorted, it will raise
-an exception. Here is a concrete example to illustrate this:
+Indexing will work even if the data are not sorted, but will be rather
+inefficient (and show a ``PerformanceWarning``). It will also
+return a copy of the data rather than a view:

 .. ipython:: python

-   tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
-   idx = pd.MultiIndex.from_tuples(tuples)
-   idx.lexsort_depth
+   dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
+                       'joe': ['x', 'x', 'z', 'y'],
+                       'jolie': np.random.rand(4)})
+   dfm = dfm.set_index(['jim', 'joe'])
+   dfm
+
+.. code-block:: ipython
+
+   In [4]: dfm.loc[(1, 'z')]
+   PerformanceWarning: indexing past lexsort depth may impact performance.

-   reordered = idx[[1, 0, 3, 2]]
-   reordered.lexsort_depth
+   Out[4]:
+              jolie
+   jim joe
+   1   z    0.64094

-   s = pd.Series(np.random.randn(4), index=reordered)
-   s.ix['a':'a']
+The ``is_lexsorted()`` method on a ``MultiIndex`` shows whether the index is sorted, and the ``lexsort_depth`` property returns the sort depth:

-However:
+.. ipython:: python

-::
+   dfm.index.is_lexsorted()
+   dfm.index.lexsort_depth

-   >>> s.ix[('a', 'b'):('b', 'a')]
-   Traceback (most recent call last)
-      ...
-   KeyError: Key length (3) was greater than MultiIndex lexsort depth (2)
+.. ipython:: python

+   dfm = dfm.sort_index()
+   dfm
+   dfm.index.is_lexsorted()
+   dfm.index.lexsort_depth

 Take Methods
 ------------
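
For readers who want to try the behaviour the new docs describe, here is a minimal standalone sketch of the ``dfm`` example. It rests on a few assumptions that are not part of the commit: the ``jolie`` values are fixed rather than random so the result is reproducible, warnings are silenced only to keep the output quiet, and sortedness is checked with ``is_monotonic_increasing`` instead of ``is_lexsorted()`` / ``lexsort_depth``, since the latter two may not be available in every pandas version.

```python
import warnings

import numpy as np
import pandas as pd

# Same shape of data as the ``dfm`` example in the diff; the values are
# fixed here (an assumption, the docs use random numbers).
dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
                    'joe': ['x', 'x', 'z', 'y'],
                    'jolie': [0.1, 0.2, 0.3, 0.4]})
dfm = dfm.set_index(['jim', 'joe'])

# (1, 'z') is stored before (1, 'y'), so the index is not fully sorted.
unsorted_ok = dfm.index.is_monotonic_increasing

# The lookup still works on the unsorted index. Any warning pandas may
# emit is ignored here purely to keep the example output clean.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    before = dfm.loc[(1, 'z')]

# After sorting, the same lookup runs against a sorted index.
dfm_sorted = dfm.sort_index()
sorted_ok = dfm_sorted.index.is_monotonic_increasing
after = dfm_sorted.loc[(1, 'z')]

print(unsorted_ok, sorted_ok)
print(np.asarray(before).ravel(), np.asarray(after).ravel())
```

The lookup selects the same data before and after sorting; only the sortedness flag and, per the docs in this commit, the lookup efficiency change.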
