Commit 405ed25

cbrnr authored and jorisvandenbossche committed
DOC: improve docs to clarify MultiIndex indexing (#19507)
1 parent d59aad6 commit 405ed25

1 file changed: +59 -27 lines changed
doc/source/advanced.rst

@@ -113,7 +113,13 @@ of the index is up to you:
113113
pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
114114
115115
We've "sparsified" the higher levels of the indexes to make the console output a
116-
bit easier on the eyes.
116+
bit easier on the eyes. Note that how the index is displayed can be controlled using the
117+
``multi_sparse`` option in ``pandas.set_option()``:
118+
119+
.. ipython:: python
120+
121+
with pd.option_context('display.multi_sparse', False):
122+
df
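The switch from paired ``set_option`` calls to ``option_context`` in this hunk can be tried outside the docs build. The following is a minimal sketch, assuming a small stand-in frame (the names ``index``, ``df``, ``dense_text`` and ``sparse_text`` are illustrative and not part of the commit). It suggests that the option only applies inside the ``with`` block and is restored afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ``df`` used in the docs.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])

default_setting = pd.get_option("display.multi_sparse")

# Inside the context manager, repeated outer labels are printed on every row.
with pd.option_context("display.multi_sparse", False):
    inside_setting = pd.get_option("display.multi_sparse")
    dense_text = repr(df)

# Outside, the previous setting is back without an explicit reset call.
restored_setting = pd.get_option("display.multi_sparse")
sparse_text = repr(df)

print(dense_text)
print(sparse_text)
```

Compared with the removed ``set_option(..., False)`` / ``set_option(..., True)`` pair, the context manager does not need to know what the previous value was.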
117123
118124
It's worth keeping in mind that there's nothing preventing you from using
119125
tuples as atomic labels on an axis:
@@ -129,15 +135,6 @@ can find yourself working with hierarchically-indexed data without creating a
129135
``MultiIndex`` explicitly yourself. However, when loading data from a file, you
130136
may wish to generate your own ``MultiIndex`` when preparing the data set.
131137

132-
Note that how the index is displayed by be controlled using the
133-
``multi_sparse`` option in ``pandas.set_options()``:
134-
135-
.. ipython:: python
136-
137-
pd.set_option('display.multi_sparse', False)
138-
df
139-
pd.set_option('display.multi_sparse', True)
140-
141138
.. _advanced.get_level_values:
142139

143140
Reconstructing the level labels
@@ -180,14 +177,13 @@ For example:
180177

181178
.. ipython:: python
182179
183-
  # original MultiIndex
184-
  df.columns
180+
  df.columns # original MultiIndex
185181
186-
# sliced
187-
df[['foo','qux']].columns
182+
df[['foo','qux']].columns # sliced
188183
189184
This is done to avoid a recomputation of the levels in order to make slicing
190-
highly performant. If you want to see the actual used levels.
185+
highly performant. If you want to see only the used levels, you can use the
186+
:func:`MultiIndex.get_level_values` method.
191187

192188
.. ipython:: python
193189
@@ -196,7 +192,7 @@ highly performant. If you want to see the actual used levels.
196192
# for a specific level
197193
df[['foo','qux']].columns.get_level_values(0)
198194
199-
To reconstruct the ``MultiIndex`` with only the used levels, the
195+
To reconstruct the ``MultiIndex`` with only the used levels, the
200196
``remove_unused_levels`` method may be used.
201197

202198
.. versionadded:: 0.20.0
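The difference between the stored levels, the levels actually in use, and the result of ``remove_unused_levels`` can be checked with a short sketch. The frame below is an assumed stand-in for the ``df`` in the docs; only the calls ``get_level_values`` and ``remove_unused_levels`` come from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level column index, similar to the docs' ``df``.
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]]
)
df = pd.DataFrame(np.zeros((3, 8)), columns=columns)

sliced = df[["foo", "qux"]].columns

# The sliced index still carries every original level value.
stored_levels = list(sliced.levels[0])

# get_level_values reports only the labels that are present.
used_values = list(sliced.get_level_values(0).unique())

# remove_unused_levels rebuilds the index with just the used levels.
trimmed_levels = list(sliced.remove_unused_levels().levels[0])

print(stored_levels, used_values, trimmed_levels)
```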
@@ -231,15 +227,33 @@ Advanced indexing with hierarchical index
231227
-----------------------------------------
232228

233229
Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
234-
bit challenging, but we've made every effort to do so. For example the
235-
following works as you would expect:
230+
bit challenging, but we've made every effort to do so. In general, MultiIndex
231+
keys take the form of tuples. For example, the following works as you would expect:
236232

237233
.. ipython:: python
238234
239235
df = df.T
240236
df
241-
df.loc['bar']
242-
df.loc['bar', 'two']
237+
df.loc[('bar', 'two'),]
238+
239+
Note that ``df.loc['bar', 'two']`` would also work in this example, but this shorthand
240+
notation can lead to ambiguity in general.
241+
242+
If you also want to index a specific column with ``.loc``, you must use a tuple
243+
like this:
244+
245+
.. ipython:: python
246+
247+
df.loc[('bar', 'two'), 'A']
248+
249+
You don't have to specify all levels of the ``MultiIndex``; you can pass only the
250+
first elements of the tuple. For example, you can use "partial" indexing to
251+
get all elements with ``bar`` in the first level as follows:
252+
253+
.. ipython:: python

   df.loc['bar']
254+
255+
This is a shortcut for the slightly more verbose notation ``df.loc[('bar',),]`` (equivalent
256+
to ``df.loc['bar',]`` in this example).
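The three access patterns described in this hunk (a full tuple key, a tuple key plus a column, and partial indexing on the first level) can be compared side by side. This is a sketch under the assumption of a small integer-valued frame; the names ``row``, ``cell`` and ``partial`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index and columns A, B, C.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=index, columns=["A", "B", "C"])

# Full key as a tuple: selects one row, returned as a Series.
row = df.loc[("bar", "two"),]

# Tuple for the row key, plain label for the column: a single value.
cell = df.loc[("bar", "two"), "A"]

# Partial key: every row under "bar", with the first level dropped.
partial = df.loc["bar"]

print(row)
print(cell)
print(partial)
```

Writing the row key as an explicit tuple keeps the second positional argument of ``.loc`` free for the column axis, which is the ambiguity the text warns about.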
243257

244258
"Partial" slicing also works quite nicely.
245259

@@ -260,6 +274,24 @@ Passing a list of labels or tuples works similar to reindexing:
260274
261275
df.loc[[('bar', 'two'), ('qux', 'one')]]
262276
277+
.. note::
278+
279+
It is important to note that tuples and lists are not treated identically
280+
in pandas when it comes to indexing. Whereas a tuple is interpreted as one
281+
multi-level key, a list is used to specify several keys. Or in other words,
282+
tuples go horizontally (traversing levels), lists go vertically (scanning levels).
283+
284+
Importantly, a list of tuples indexes several complete ``MultiIndex`` keys,
285+
whereas a tuple of lists refers to several values within a level:
286+
287+
.. ipython:: python
288+
289+
s = pd.Series([1, 2, 3, 4, 5, 6],
290+
index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]))
291+
s.loc[[("A", "c"), ("B", "d")]] # list of tuples
292+
s.loc[(["A", "B"], ["c", "d"])] # tuple of lists
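To make the list-of-tuples versus tuple-of-lists distinction concrete, the sketch below reuses the same series ``s`` and records what each form selects. The variable names ``by_keys`` and ``by_levels`` are illustrative and not part of the commit.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
)

# List of tuples: exactly the two complete keys ("A", "c") and ("B", "d").
by_keys = s.loc[[("A", "c"), ("B", "d")]]

# Tuple of lists: every combination of {"A", "B"} with {"c", "d"}.
by_levels = s.loc[(["A", "B"], ["c", "d"])]

print(by_keys)
print(by_levels)
```

The first form returns two elements, the second returns four, because each list in the tuple is applied to its own level.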
293+
294+
263295
.. _advanced.mi_slicers:
264296

265297
Using slicers
@@ -317,7 +349,7 @@ Basic multi-index slicing using slices, lists, and labels.
317349
dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
318350
319351
320-
You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax
352+
You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax
321353
using ``:``, rather than using ``slice(None)``.
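As a sketch of that equivalence, the example below builds an assumed three-level frame (a simplified stand-in for the docs' ``dfmi``, with made-up column names ``x`` and ``y``) and runs the same selection once with ``slice`` objects and once with ``pd.IndexSlice``.

```python
import numpy as np
import pandas as pd

# Hypothetical, already sorted three-level index; label slicing needs a sorted index.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]]
)
dfmi = pd.DataFrame(
    np.arange(len(index) * 2).reshape(len(index), 2),
    index=index,
    columns=["x", "y"],
)

# Explicit slice objects, as in the context line of the diff.
with_slice = dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]

# The same selection written with IndexSlice and ``:``.
idx = pd.IndexSlice
with_idx = dfmi.loc[idx["A1":"A3", :, ["C1", "C3"]], :]

print(with_slice.equals(with_idx))
```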
322354

323355
.. ipython:: python
@@ -626,7 +658,7 @@ Index Types
626658
-----------
627659

628660
We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex``
629-
are shown :ref:`here <timeseries.overview>`, and information about
661+
are shown :ref:`here <timeseries.overview>`, and information about
630662
``TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltas>`.
631663

632664
In the following sub-sections we will highlight some other index types.
@@ -671,9 +703,9 @@ The ``CategoricalIndex`` is **preserved** after indexing:
671703
672704
df2.loc['a'].index
673705
674-
Sorting the index will sort by the order of the categories (Recall that we
675-
created the index with ``CategoricalDtype(list('cab'))``, so the sorted
676-
order is ``cab``.).
706+
Sorting the index will sort by the order of the categories (recall that we
707+
created the index with ``CategoricalDtype(list('cab'))``, so the sorted
708+
order is ``cab``).
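The category-order sorting described here can be reproduced with a small frame. This is a sketch that assumes data similar to the docs' ``df2``; the exact values are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical frame; column B is categorical with categories in the order c, a, b.
df = pd.DataFrame({"A": range(6), "B": list("aabbca")})
df["B"] = df["B"].astype(CategoricalDtype(list("cab")))
df2 = df.set_index("B")

# Sorting follows the category order (c, a, b), not alphabetical order.
sorted_labels = list(df2.sort_index().index)

# Label-based selection keeps the CategoricalIndex.
selected_index = df2.loc["a"].index

print(sorted_labels)
print(type(selected_index).__name__)
```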
677709

678710
.. ipython:: python
679711
@@ -726,7 +758,7 @@ Int64Index and RangeIndex
726758
727759
Indexing on an integer-based Index with floats was clarified in 0.18.0; for a summary of the changes, see :ref:`here <whatsnew_0180.float_indexers>`.
728760
729-
``Int64Index`` is a fundamental basic index in pandas.
761+
``Int64Index`` is a fundamental basic index in pandas.
730762
It is an immutable array implementing an ordered, sliceable set.
731763
Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects.
732764
@@ -765,7 +797,7 @@ The only positional indexing is via ``iloc``.
765797
sf.iloc[3]
766798
767799
A scalar index that is not found will raise a ``KeyError``.
768-
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
800+
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
769801
**always** positional when using ``iloc``. The exception is when the slice is
770802
boolean, in which case it will always be positional.
771803
