From 80ee7c31d9e03edb290bf7ffde760c5ba0b6b964 Mon Sep 17 00:00:00 2001
From: Clemens Brunner <clemens.brunner@gmail.com>
Date: Fri, 2 Feb 2018 09:25:26 +0100
Subject: [PATCH 1/4] Improve docs to clarify MultiIndex indexing

---
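Reviewer note (not part of the commit): a minimal sketch of the "unused
levels" behaviour that the reworded paragraph describes. The frame below is
a hypothetical stand-in for the ``df`` used in advanced.rst, and the variable
names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, similar in shape to the `df` used in advanced.rst.
columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]],
    names=["first", "second"],
)
df = pd.DataFrame(np.zeros((3, 8)), index=["A", "B", "C"], columns=columns)

sliced = df[["foo", "qux"]].columns

# The levels attribute still lists every label of the original index.
all_levels = list(sliced.levels[0])

# get_level_values() reports only the labels that are actually present.
used_values = list(sliced.get_level_values(0).unique())

# remove_unused_levels() rebuilds the MultiIndex with trimmed levels.
trimmed_levels = list(sliced.remove_unused_levels().levels[0])
```

Slicing keeps the full level definitions, which is why the docs point to
``get_level_values`` and ``remove_unused_levels`` for the used labels.
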
 doc/source/advanced.rst | 74 ++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 23 deletions(-)

diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
index ca903dadc6eb1..be115ca08d44e 100644
--- a/doc/source/advanced.rst
+++ b/doc/source/advanced.rst
@@ -113,7 +113,14 @@ of the index is up to you:
    pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])
 
 We've "sparsified" the higher levels of the indexes to make the console output a
-bit easier on the eyes.
+bit easier on the eyes. Note that how the index is displayed can be controlled using the
+``multi_sparse`` option in ``pandas.set_option()``:
+
+.. ipython:: python
+
+   pd.set_option('display.multi_sparse', False)
+   df
+   pd.set_option('display.multi_sparse', True)
 
 It's worth keeping in mind that there's nothing preventing you from using
 tuples as atomic labels on an axis:
@@ -129,15 +136,6 @@ can find yourself working with hierarchically-indexed data without creating a
 ``MultiIndex`` explicitly yourself. However, when loading data from a file, you
 may wish to generate your own ``MultiIndex`` when preparing the data set.
 
-Note that how the index is displayed by be controlled using the
-``multi_sparse`` option in ``pandas.set_options()``:
-
-.. ipython:: python
-
-   pd.set_option('display.multi_sparse', False)
-   df
-   pd.set_option('display.multi_sparse', True)
-
 .. _advanced.get_level_values:
 
 Reconstructing the level labels
@@ -180,14 +178,13 @@ For example:
 
 .. ipython:: python
 
-   # original MultiIndex
-   df.columns
+   df.columns  # original MultiIndex
 
-   # sliced
-   df[['foo','qux']].columns
+   df[['foo','qux']].columns  # sliced
 
 This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see the actual used levels.
+highly performant. If you want to see only the used levels, you can use the
+`get_level_values()` method.
 
 .. ipython:: python
 
@@ -196,7 +193,7 @@ highly performant. If you want to see the actual used levels.
    # for a specific level
    df[['foo','qux']].columns.get_level_values(0)
 
-To reconstruct the ``MultiIndex`` with only the used levels, the 
+To reconstruct the ``MultiIndex`` with only the used levels, the
 ``remove_unused_levels`` method may be used.
 
 .. versionadded:: 0.20.0
@@ -231,16 +228,31 @@ Advanced indexing with hierarchical index
 -----------------------------------------
 
 Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
-bit challenging, but we've made every effort to do so. For example the
-following works as you would expect:
+bit challenging, but we've made every effort to do so. In general, MultiIndex
+keys take the form of tuples. For example, the following works as you would expect:
 
 .. ipython:: python
 
    df = df.T
    df
-   df.loc['bar']
    df.loc['bar', 'two']
 
+If you also want to index a specific column with ``.loc``, you have to use
+parentheses around the tuple like this:
+
+.. ipython:: python
+
+   df.loc[('bar', 'two'), 'A']
+
+You don't have to specify all levels of the ``MultiIndex`` by passing only the
+first elements of the tuple. For example, you can use this partially indexing to
+get all elements in the ``bar`` level as follows:
+
+df.loc['bar']
+
+This is identical to the slightly more verbose notation ``df.loc['bar',]`` using
+a tuple with one element.
+
 "Partial" slicing also works quite nicely.
 
 .. ipython:: python
@@ -260,6 +272,22 @@ Passing a list of labels or tuples works similar to reindexing:
 
    df.loc[[('bar', 'two'), ('qux', 'one')]]
 
+.. warning::
+
+   It is important to note that tuples and lists are not treated identically
+   in pandas.
+
+Importantly, a list of tuples indexes several complete ``MultiIndex`` keys,
+whereas a tuple of lists refer to several values within a level:
+
+.. ipython:: python
+
+   s = pd.Series([1, 2, 3, 4],
+                 index=pd.MultiIndex.from_product([["A", "B"], ["c", "d"]]))
+   s.loc[[("A", "c"), ("B", "d")]]  # list of tuples
+   s.loc[(["A", "B"], ["c", "d"])]  # tuple of lists
+
+
 .. _advanced.mi_slicers:
 
 Using slicers
@@ -317,7 +345,7 @@ Basic multi-index slicing using slices, lists, and labels.
    dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
 
 
-You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax 
+You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax
 using ``:``, rather than using ``slice(None)``.
 
 .. ipython:: python
@@ -626,7 +654,7 @@ Index Types
 -----------
 
 We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex``
-are shown :ref:`here <timeseries.overview>`, and information about 
+are shown :ref:`here <timeseries.overview>`, and information about
 `TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltas>`.
 
 In the following sub-sections we will highlight some other index types.
@@ -726,7 +754,7 @@ Int64Index and RangeIndex
 
    Indexing on an integer-based Index with floats has been clarified in 0.18.0, for a summary of the changes, see :ref:`here <whatsnew_0180.float_indexers>`.
 
-``Int64Index`` is a fundamental basic index in pandas. 
+``Int64Index`` is a fundamental basic index in pandas.
 This is an Immutable array implementing an ordered, sliceable set.
 Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects.
 
@@ -765,7 +793,7 @@ The only positional indexing is via ``iloc``.
    sf.iloc[3]
 
 A scalar index that is not found will raise a ``KeyError``.
-Slicing is primarily on the values of the index when using ``[],ix,loc``, and 
+Slicing is primarily on the values of the index when using ``[],ix,loc``, and
 **always** positional when using ``iloc``. The exception is when the slice is
 boolean, in which case it will always be positional.
 

From 14d770c15d979853b70cc43a2ce570f6305e5349 Mon Sep 17 00:00:00 2001
From: Clemens Brunner <clemens.brunner@gmail.com>
Date: Fri, 2 Feb 2018 10:01:53 +0100
Subject: [PATCH 2/4] Address comments

---
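Reviewer note (not part of the commit): a small sketch of the complete-key
and partial-key forms discussed in this patch. The frame and its values are
made up for illustration and are not the ``df`` from advanced.rst.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level frame with easily recognisable values.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(
    np.arange(12).reshape(4, 3), index=index, columns=["A", "B", "C"]
)

# Complete key plus a column label: the row key is written as a tuple.
value = df.loc[("bar", "two"), "A"]

# Partial key: only the first level is given, so every row under "bar"
# is returned and the matched level is dropped from the result's index.
partial = df.loc["bar"]

# The more verbose spelling of the same partial key.
explicit = df.loc[("bar",),]
```

Both partial forms select the same two rows; the short form is simply less
typing when no column selection is involved.
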
 doc/source/advanced.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
index be115ca08d44e..eb0c60df2c4d8 100644
--- a/doc/source/advanced.rst
+++ b/doc/source/advanced.rst
@@ -245,13 +245,13 @@ parentheses around the tuple like this:
    df.loc[('bar', 'two'), 'A']
 
 You don't have to specify all levels of the ``MultiIndex`` by passing only the
-first elements of the tuple. For example, you can use this partially indexing to
-get all elements in the ``bar`` level as follows:
+first elements of the tuple. For example, you can use *partial* indexing to
+get all elements with ``bar`` in the first level as follows:
 
 df.loc['bar']
 
-This is identical to the slightly more verbose notation ``df.loc['bar',]`` using
-a tuple with one element.
+This is a shortcut for the slightly more verbose notation ``df.loc['bar',]`` (equivalent
+to ``df.loc[('bar',)]``).
 
 "Partial" slicing also works quite nicely.
 
@@ -275,7 +275,8 @@ Passing a list of labels or tuples works similar to reindexing:
 .. warning::
 
    It is important to note that tuples and lists are not treated identically
-   in pandas.
+   in pandas. Whereas a tuple is interpreted as one multi-level key, a list is
+   used to specify several keys.
 
 Importantly, a list of tuples indexes several complete ``MultiIndex`` keys,
 whereas a tuple of lists refer to several values within a level:

From e9ba3dac2c48639be5f9f9ff8e11f377e585f983 Mon Sep 17 00:00:00 2001
From: Clemens Brunner <clemens.brunner@gmail.com>
Date: Tue, 13 Feb 2018 09:49:24 +0100
Subject: [PATCH 3/4] Address comments

---
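Reviewer note (not part of the commit): a sketch of why ``option_context``
is preferable to a pair of ``set_option`` calls here. The option is changed
only inside the block and restored on exit. The series is hypothetical, and
the sketch assumes the option still has its default value of ``True``.

```python
import pandas as pd

# Hypothetical series with a two-level index.
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
s = pd.Series(range(4), index=index)

# Default display: repeated outer labels are blanked out ("sparsified").
default_repr = repr(s)

with pd.option_context("display.multi_sparse", False):
    inside = pd.get_option("display.multi_sparse")
    # Every row now spells out all of its level labels.
    dense_repr = repr(s)

# The previous value is restored once the block exits.
after = pd.get_option("display.multi_sparse")
```
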
 doc/source/advanced.rst | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
index eb0c60df2c4d8..c583509bef98e 100644
--- a/doc/source/advanced.rst
+++ b/doc/source/advanced.rst
@@ -118,9 +118,8 @@ bit easier on the eyes. Note that how the index is displayed can be controlled u
 
 .. ipython:: python
 
-   pd.set_option('display.multi_sparse', False)
-   df
-   pd.set_option('display.multi_sparse', True)
+   with pd.option_context('display.multi_sparse', False):
+       print(df)
 
 It's worth keeping in mind that there's nothing preventing you from using
 tuples as atomic labels on an axis:
@@ -184,7 +183,7 @@ For example:
 
 This is done to avoid a recomputation of the levels in order to make slicing
 highly performant. If you want to see only the used levels, you can use the
-`get_level_values()` method.
+:meth:`MultiIndex.get_level_values` method.
 
 .. ipython:: python
 
@@ -235,23 +234,26 @@ keys take the form of tuples. For example, the following works as you would expe
 
    df = df.T
    df
-   df.loc['bar', 'two']
+   df.loc[('bar', 'two'),]
+
+Note that ``df.loc['bar', 'two']`` would also work in this example, but this shorthand
+notation can lead to ambiguity in general.
 
-If you also want to index a specific column with ``.loc``, you have to use
-parentheses around the tuple like this:
+If you also want to index a specific column with ``.loc``, you must use a tuple
+like this:
 
 .. ipython:: python
 
    df.loc[('bar', 'two'), 'A']
 
 You don't have to specify all levels of the ``MultiIndex`` by passing only the
-first elements of the tuple. For example, you can use *partial* indexing to
+first elements of the tuple. For example, you can use "partial" indexing to
 get all elements with ``bar`` in the first level as follows:
 
 df.loc['bar']
 
-This is a shortcut for the slightly more verbose notation ``df.loc['bar',]`` (equivalent
-to ``df.loc[('bar',)]``).
+This is a shortcut for the slightly more verbose notation ``df.loc[('bar',),]`` (equivalent
+to ``df.loc['bar',]`` in this example).
 
 "Partial" slicing also works quite nicely.
 
@@ -272,11 +274,12 @@ Passing a list of labels or tuples works similar to reindexing:
 
    df.loc[[('bar', 'two'), ('qux', 'one')]]
 
-.. warning::
+.. note::
 
    It is important to note that tuples and lists are not treated identically
-   in pandas. Whereas a tuple is interpreted as one multi-level key, a list is
-   used to specify several keys.
+   in pandas when it comes to indexing. Whereas a tuple is interpreted as one
+   multi-level key, a list is used to specify several keys. In other words,
+   tuples go horizontally (traversing levels) and lists go vertically (scanning levels).
 
 Importantly, a list of tuples indexes several complete ``MultiIndex`` keys,
 whereas a tuple of lists refer to several values within a level:

From 7cef2d37b9383d492230ea9a1245261dd17c88b6 Mon Sep 17 00:00:00 2001
From: Clemens Brunner <clemens.brunner@gmail.com>
Date: Tue, 13 Feb 2018 11:11:23 +0100
Subject: [PATCH 4/4] Update example and fix typo

---
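Reviewer note (not part of the commit): the updated example, written out
with the selections it is expected to produce. The variable names ``pairs``
and ``grid`` are illustrative and do not appear in the docs.

```python
import pandas as pd

s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
)

# List of tuples: each tuple is one complete key, so two rows come back.
pairs = s.loc[[("A", "c"), ("B", "d")]]

# Tuple of lists: one list per level, so every combination of the
# listed labels is selected (2 x 2 = 4 rows out of 6).
grid = s.loc[(["A", "B"], ["c", "d"])]
```

With the third inner label ``"e"`` present, neither selection returns the
whole series, which makes the difference between the two forms easier to see.
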
 doc/source/advanced.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst
index c583509bef98e..c455fbb8d0687 100644
--- a/doc/source/advanced.rst
+++ b/doc/source/advanced.rst
@@ -286,8 +286,8 @@ whereas a tuple of lists refer to several values within a level:
 
 .. ipython:: python
 
-   s = pd.Series([1, 2, 3, 4],
-                 index=pd.MultiIndex.from_product([["A", "B"], ["c", "d"]]))
+   s = pd.Series([1, 2, 3, 4, 5, 6],
+                 index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]))
    s.loc[[("A", "c"), ("B", "d")]]  # list of tuples
    s.loc[(["A", "B"], ["c", "d"])]  # tuple of lists
 
@@ -703,9 +703,9 @@ The ``CategoricalIndex`` is **preserved** after indexing:
 
    df2.loc['a'].index
 
-Sorting the index will sort by the order of the categories (Recall that we 
-created the index with ``CategoricalDtype(list('cab'))``, so the sorted 
-order is ``cab``.). 
+Sorting the index will sort by the order of the categories (recall that we
+created the index with ``CategoricalDtype(list('cab'))``, so the sorted
+order is ``cab``).
 
 .. ipython:: python