diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst index ca903dadc6eb1..c455fbb8d0687 100644 --- a/doc/source/advanced.rst +++ b/doc/source/advanced.rst @@ -113,7 +113,13 @@ of the index is up to you: pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6]) We've "sparsified" the higher levels of the indexes to make the console output a -bit easier on the eyes. +bit easier on the eyes. Note that how the index is displayed can be controlled using the +``multi_sparse`` option in ``pandas.set_options()``: + +.. ipython:: python + + with pd.option_context('display.multi_sparse', False): + df It's worth keeping in mind that there's nothing preventing you from using tuples as atomic labels on an axis: @@ -129,15 +135,6 @@ can find yourself working with hierarchically-indexed data without creating a ``MultiIndex`` explicitly yourself. However, when loading data from a file, you may wish to generate your own ``MultiIndex`` when preparing the data set. -Note that how the index is displayed by be controlled using the -``multi_sparse`` option in ``pandas.set_options()``: - -.. ipython:: python - - pd.set_option('display.multi_sparse', False) - df - pd.set_option('display.multi_sparse', True) - .. _advanced.get_level_values: Reconstructing the level labels @@ -180,14 +177,13 @@ For example: .. ipython:: python -   # original MultiIndex -   df.columns +   df.columns # original MultiIndex - # sliced - df[['foo','qux']].columns + df[['foo','qux']].columns # sliced This is done to avoid a recomputation of the levels in order to make slicing -highly performant. If you want to see the actual used levels. +highly performant. If you want to see only the used levels, you can use the +:func:`MultiIndex.get_level_values` method. .. ipython:: python @@ -196,7 +192,7 @@ highly performant. If you want to see the actual used levels. # for a specific level df[['foo','qux']].columns.get_level_values(0) -To reconstruct the ``MultiIndex`` with only the used levels, the +To reconstruct the ``MultiIndex`` with only the used levels, the ``remove_unused_levels`` method may be used. .. versionadded:: 0.20.0 @@ -231,15 +227,33 @@ Advanced indexing with hierarchical index ----------------------------------------- Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a -bit challenging, but we've made every effort to do so. For example the -following works as you would expect: +bit challenging, but we've made every effort to do so. In general, MultiIndex +keys take the form of tuples. For example, the following works as you would expect: .. ipython:: python df = df.T df - df.loc['bar'] - df.loc['bar', 'two'] + df.loc[('bar', 'two'),] + +Note that ``df.loc['bar', 'two']`` would also work in this example, but this shorthand +notation can lead to ambiguity in general. + +If you also want to index a specific column with ``.loc``, you must use a tuple +like this: + +.. ipython:: python + + df.loc[('bar', 'two'), 'A'] + +You don't have to specify all levels of the ``MultiIndex`` by passing only the +first elements of the tuple. For example, you can use "partial" indexing to +get all elements with ``bar`` in the first level as follows: + +df.loc['bar'] + +This is a shortcut for the slightly more verbose notation ``df.loc[('bar',),]`` (equivalent +to ``df.loc['bar',]`` in this example). "Partial" slicing also works quite nicely. @@ -260,6 +274,24 @@ Passing a list of labels or tuples works similar to reindexing: df.loc[[('bar', 'two'), ('qux', 'one')]] +.. info:: + + It is important to note that tuples and lists are not treated identically + in pandas when it comes to indexing. Whereas a tuple is interpreted as one + multi-level key, a list is used to specify several keys. Or in other words, + tuples go horizontally (traversing levels), lists go vertically (scanning levels). + +Importantly, a list of tuples indexes several complete ``MultiIndex`` keys, +whereas a tuple of lists refer to several values within a level: + +.. ipython:: python + + s = pd.Series([1, 2, 3, 4, 5, 6], + index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]])) + s.loc[[("A", "c"), ("B", "d")]] # list of tuples + s.loc[(["A", "B"], ["c", "d"])] # tuple of lists + + .. _advanced.mi_slicers: Using slicers @@ -317,7 +349,7 @@ Basic multi-index slicing using slices, lists, and labels. dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :] -You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax +You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax using ``:``, rather than using ``slice(None)``. .. ipython:: python @@ -626,7 +658,7 @@ Index Types ----------- We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex`` -are shown :ref:`here `, and information about +are shown :ref:`here `, and information about `TimedeltaIndex`` is found :ref:`here `. In the following sub-sections we will highlight some other index types. @@ -671,9 +703,9 @@ The ``CategoricalIndex`` is **preserved** after indexing: df2.loc['a'].index -Sorting the index will sort by the order of the categories (Recall that we -created the index with ``CategoricalDtype(list('cab'))``, so the sorted -order is ``cab``.). +Sorting the index will sort by the order of the categories (recall that we +created the index with ``CategoricalDtype(list('cab'))``, so the sorted +order is ``cab``). .. ipython:: python @@ -726,7 +758,7 @@ Int64Index and RangeIndex Indexing on an integer-based Index with floats has been clarified in 0.18.0, for a summary of the changes, see :ref:`here `. -``Int64Index`` is a fundamental basic index in pandas. +``Int64Index`` is a fundamental basic index in pandas. This is an Immutable array implementing an ordered, sliceable set. Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects. @@ -765,7 +797,7 @@ The only positional indexing is via ``iloc``. sf.iloc[3] A scalar index that is not found will raise a ``KeyError``. -Slicing is primarily on the values of the index when using ``[],ix,loc``, and +Slicing is primarily on the values of the index when using ``[],ix,loc``, and **always** positional when using ``iloc``. The exception is when the slice is boolean, in which case it will always be positional.