DOC: improve docs to clarify MultiIndex indexing #19507

cbrnr · 2018-02-02T08:26:47Z

As per our discussion in #16943. Let me know what you think. I'm not quite happy with the new warning box; ideas on how to improve the message are welcome.

toobaz · 2018-02-02T08:41:56Z

doc/source/advanced.rst

+   df.loc[('bar', 'two'), 'A']
+
+You don't have to specify all levels of the ``MultiIndex`` by passing only the
+first elements of the tuple. For example, you can use this partially indexing to


"this partially indexing" -> "partial indexing"?

(maybe I would also put "partial" between quotes, or in italic)

toobaz · 2018-02-02T08:42:11Z

doc/source/advanced.rst

+
+You don't have to specify all levels of the ``MultiIndex`` by passing only the
+first elements of the tuple. For example, you can use this partially indexing to
+get all elements in the ``bar`` level as follows:


with bar in the first level?

toobaz · 2018-02-02T08:44:42Z

doc/source/advanced.rst

+
+df.loc['bar']
+
+This is identical to the slightly more verbose notation ``df.loc['bar',]`` using


Maybe "This is a shortcut for the slightly more verbose notation df.loc['bar',] (equivalent to df.loc[('bar',)])"
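
For readers following the thread, here is a minimal sketch of the partial-indexing shortcut being discussed. The frame below is a made-up stand-in (its labels and values are assumptions for illustration, not the frame used in advanced.rst).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row MultiIndex (illustrative data only).
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])

# Partial indexing: only the first level of the MultiIndex is specified.
partial = df.loc["bar"]

# In plain Python, `"bar",` and `("bar",)` are the same one-element tuple,
# so both spellings hand .loc an identical key.
trailing_comma = df.loc["bar",]
explicit_tuple = df.loc[("bar",)]

print(partial)
```

In this sketch all three spellings select the two rows under ``bar``; the tuple forms only make the "one multi-level key" reading explicit.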

toobaz · 2018-02-02T08:45:36Z

doc/source/advanced.rst

+.. warning::
+
+   It is important to note that tuples and lists are not treated identically
+   in pandas.


I think the wording is OK, but I would add another sentence stating the different role (multi-level key vs. list of keys).

cbrnr · 2018-02-02T09:02:13Z

@toobaz done

codecov · 2018-02-02T09:12:46Z

Codecov Report

Merging #19507 into master will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #19507      +/-   ##
==========================================
+ Coverage   91.57%    91.6%   +0.02%     
==========================================
  Files         150      150              
  Lines       48817    48817              
==========================================
+ Hits        44704    44718      +14     
+ Misses       4113     4099      -14

Flag	Coverage Δ
#multiple	`89.97% <ø> (+0.02%)`	⬆️
#single	`41.72% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/util/testing.py	`83.85% <0%> (+0.2%)`	⬆️
pandas/plotting/_converter.py	`66.95% <0%> (+1.73%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d9551c8...7cef2d3.

jorisvandenbossche

Nice work! Added some comments

jorisvandenbossche · 2018-02-02T11:08:02Z

doc/source/advanced.rst


 .. ipython:: python

   df = df.T
   df
-   df.loc['bar']
   df.loc['bar', 'two']


This is also a somewhat dubious example IMO (try df.loc['bar', 'A'] ..). Should we maybe recommend df.loc[('bar', 'two')] as good practice? (I know that in practice it does not matter, as df.loc[('bar', 'A')] works just as well, but it "looks" clearer. Unless we recommend df.loc[('bar', 'two'),] (extra comma), which is not ambiguous I think.)

YES. showing lists just perpetuates the confusion.

Good point, this is really not obvious. It gets confusing once we mix row and column indices inside loc. Also, we're getting a bit into Python syntax issues here. Parentheses are not required to specify a tuple, so df.loc[('bar', 'two')] is exactly identical to df.loc['bar', 'two']. I think we should try to relate MultiIndex usage to the standard use case of loc where you specify rows, columns. In fact, we should probably include the worst-case scenario where both rows and columns have a MultiIndex...

Not sure what to do, let's discuss.

Correct, they are not required (and that's the problem). However, it is much better to be explicit when using MultiIndexes.

OK, then let's recommend the use of parens around tuples. But I'm not sure recommending a nested tuple adds much to resolve the confusion (i.e. df.loc[('bar', 'two'),] is more confusing than df.loc[('bar', 'two')], because I thought that we now distinguish between tuples and lists).

I personally wouldn't even put it in terms of recommendations... I would say that the "default" is df.loc[('bar', 'two'),], and that df.loc['bar', 'two'] is a shortcut which, however, can lead to ambiguity.

... although then we need to mention that ambiguity is resolved in favour of multiple levels, rather than multiple axes (yeah, there are exceptions currently, but they are bugs)

Yep, +1 on using df.loc[('bar', 'two'),] in the example itself, and mentioning df.loc['bar', 'two'] gives the same in this case but can lead to ambiguity.
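
To make the spellings from this thread concrete, here is a small sketch on a made-up frame (labels and values are assumptions, not the docs example). It only covers the unambiguous case where the second element is a label of the second index level.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two-level row MultiIndex, plain columns A and B.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])

# Explicit form: the row key is a tuple, and the trailing comma makes it
# the first (row) indexer of .loc rather than a (row, column) pair.
row_explicit = df.loc[("bar", "two"),]

# Same selection with the column indexer spelled out.
row_all_columns = df.loc[("bar", "two"), :]

# Shortcut form: here it is read as one multi-level row key.
row_shortcut = df.loc["bar", "two"]

print(row_explicit)
```

The shortcut matches the explicit forms here because ``'two'`` is a second-level row label. With a key such as ``'bar', 'A'`` the reading depends on how pandas resolves the ambiguity, which is the concern raised above.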

jorisvandenbossche · 2018-02-02T11:09:05Z

doc/source/advanced.rst

+df.loc['bar']
+
+This is a shortcut for the slightly more verbose notation ``df.loc['bar',]`` (equivalent
+to ``df.loc[('bar',)]``).


add a final comma to be fully explicit? df.loc[('bar', ),]

Hm, see my reply above...

jorisvandenbossche · 2018-02-02T11:10:26Z

doc/source/advanced.rst

+.. warning::
+
+   It is important to note that tuples and lists are not treated identically
+   in pandas. Whereas a tuple is interpreted as one multi-level key, a list is


in pandas "when it comes to multi-indexing" ? to make clear that in many other places we don't make this distinction (or is that already clear from the context?)

I would say "when it comes to indexing", since pd.Series(range(3)).loc[(1,2)] doesn't work (and I think this is good).
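
A short sketch of the two roles the warning box is meant to distinguish, on a made-up frame (all labels and values are illustrative assumptions): a tuple acts as one multi-level key, while a list names several keys.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with three first-level labels.
index = pd.MultiIndex.from_product(
    [["bar", "baz", "foo"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["A", "B"])

# Tuple: ONE key that spans both levels, so a single row comes back.
one_key = df.loc[("baz", "two"), :]

# List: SEVERAL keys of the first level, so every row under
# 'bar' and every row under 'foo' comes back.
several_keys = df.loc[["bar", "foo"]]

print(one_key)
print(several_keys)
```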

jorisvandenbossche · 2018-02-02T11:11:09Z

doc/source/advanced.rst

+   used to specify several keys.
+
+Importantly, a list of tuples indexes several complete ``MultiIndex`` keys,
+whereas a tuple of lists refer to several values within a level:


I think we need to explain this a bit more in detail

Yeah, it could also be just an info box. It is more an advanced example than something you should really know.

I'm OK with an info box, but I wish I had really known this in my analysis (which I think isn't that advanced). Anyway, any kind of box is good to attract attention.

I don't know whether it is "serious" enough to be written in the docs, but when I happen to discuss this in talks, I always say "tuples go horizontally [traversing levels], lists go vertically [scanning levels]".
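
The "tuples go horizontally, lists go vertically" rule of thumb can be sketched as follows. The Series mirrors the one proposed in this PR, with the extra ``"e"`` label so that the second selection actually filters something; the values are placeholders.

```python
import pandas as pd

# Series with a 2 x 3 MultiIndex (values are placeholders).
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=pd.MultiIndex.from_product([["A", "B"], ["c", "d", "e"]]),
)

# List of tuples: each tuple is one complete key, so exactly the
# entries ("A", "c") and ("B", "d") are selected.
list_of_tuples = s.loc[[("A", "c"), ("B", "d")]]

# Tuple of lists: one list per level, so every combination of
# {"A", "B"} on level 0 with {"c", "d"} on level 1 is selected.
tuple_of_lists = s.loc[(["A", "B"], ["c", "d"])]

print(list_of_tuples)
print(tuple_of_lists)
```

The first selection returns two entries and the second returns four, which is the difference the reviewers wanted explained in more detail.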

jreback · 2018-02-02T11:16:12Z

doc/source/advanced.rst

+
+.. ipython:: python
+
+   pd.set_option('display.multi_sparse', False)


use these with the context manager, e.g. pd.option_context('display.multi_sparse', False)
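
A minimal sketch of the suggested context-manager pattern, assuming a small made-up Series. The option is changed only inside the ``with`` block and restored on exit, so later examples are not affected.

```python
import pandas as pd

# Hypothetical Series with a two-level MultiIndex.
index = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
s = pd.Series(range(4), index=index)

default_setting = pd.get_option("display.multi_sparse")

# Temporarily show repeated outer labels on every row.
with pd.option_context("display.multi_sparse", False):
    inside_setting = pd.get_option("display.multi_sparse")
    dense = repr(s)

# On leaving the block, the previous value is back in effect.
restored_setting = pd.get_option("display.multi_sparse")

print(dense)
```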

jreback · 2018-02-02T11:16:33Z

doc/source/advanced.rst

@@ -180,14 +178,13 @@ For example:

 .. ipython:: python

-   # original MultiIndex
-   df.columns
+   df.columns  # original MultiIndex


put comments on a separate line

Actually, I put them next to the commands because otherwise they look like they don't belong to the code (since the prompts are also shown). See http://pandas.pydata.org/pandas-docs/stable/advanced.html#defined-levels for an example of how it looks right now.

I agree that in the actual HTML output it might be clearer to have it on a single line (in general we should avoid comments in long code blocks and just put that as text between multiple code blocks, but in this case I think it is fine).

@jreback OK with putting the comments on the same lines?

jreback · 2018-02-02T11:17:01Z

doc/source/advanced.rst


 This is done to avoid a recomputation of the levels in order to make slicing
-highly performant. If you want to see the actual used levels.
+highly performant. If you want to see only the used levels, you can use the
+`get_level_values()` method.


if you reference a method, use :func:`MultiIndex.get_level_values`
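
For context, a small sketch of the behaviour that paragraph of the docs describes, on a made-up frame (labels are assumptions): after slicing, ``levels`` still lists every defined level, while ``get_level_values`` reports only what is in use.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level column MultiIndex.
columns = pd.MultiIndex.from_product([["bar", "foo", "qux"], ["one", "two"]])
df = pd.DataFrame(np.zeros((2, 6)), columns=columns)

# Select two of the three first-level labels.
sub = df[["foo", "qux"]]

# The defined levels are kept as-is, including the now unused 'bar'.
defined = list(sub.columns.levels[0])

# get_level_values shows the labels actually present.
used = list(sub.columns.get_level_values(0).unique())

# remove_unused_levels rebuilds the MultiIndex with only the used levels.
trimmed = list(sub.columns.remove_unused_levels().levels[0])

print(defined, used, trimmed)
```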

jreback · 2018-02-02T11:18:09Z

doc/source/advanced.rst


 .. ipython:: python

   df = df.T
   df
-   df.loc['bar']
   df.loc['bar', 'two']


YES. showing lists just perpetuates the confusion.

jreback · 2018-02-02T11:18:38Z

doc/source/advanced.rst

   df.loc['bar', 'two']

+If you also want to index a specific column with ``.loc``, you have to use


This is not very clear. "You must use a tuple" is more explicit.
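
A brief sketch of the case that sentence covers, using a made-up frame (labels and values are assumptions): when a column is selected as well, the row key is written as a tuple so that it cannot be mistaken for a (row, column) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two-level row MultiIndex, plain columns A and B.
index = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])

# First indexer: the complete row key as a tuple.
# Second indexer: the column label.
value = df.loc[("bar", "two"), "A"]

print(value)
```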

jorisvandenbossche · 2018-02-09T14:50:21Z

@cbrnr Can you update based on the comments?

cbrnr · 2018-02-13T08:50:31Z

I've updated the docs based on all comments. Please check if it is OK now.

jorisvandenbossche

Looks good to me!
Added one minor comment

jorisvandenbossche · 2018-02-13T10:00:38Z

doc/source/advanced.rst

+.. ipython:: python
+
+   s = pd.Series([1, 2, 3, 4],
+                 index=pd.MultiIndex.from_product([["A", "B"], ["c", "d"]]))


maybe add here a "e" in ["c", "d", "e"], so the second example below actually does a selection

cbrnr · 2018-02-13T10:53:33Z

CircleCI error is unrelated - could someone restart it please?

jorisvandenbossche · 2018-02-15T09:00:51Z

@cbrnr Thanks a lot!
(circle ci error was indeed unrelated, connectivity issue)

cbrnr mentioned this pull request Feb 2, 2018

MultiIndex row indexing with .loc fail with tuple but work with list of indices #16943

Closed

toobaz reviewed Feb 2, 2018


jorisvandenbossche added Docs MultiIndex labels Feb 2, 2018

jorisvandenbossche reviewed Feb 2, 2018


jreback requested changes Feb 2, 2018


cbrnr added 3 commits February 13, 2018 09:32

Improve docs to clarify MultiIndex indexing

80ee7c3

Address comments

14d770c

Address comments

e9ba3da

jorisvandenbossche approved these changes Feb 13, 2018


Update example and fix typo

7cef2d3

jorisvandenbossche changed the title ~~Improve docs to clarify MultiIndex indexing~~ DOC: improve docs to clarify MultiIndex indexing Feb 15, 2018

jorisvandenbossche merged commit 405ed25 into pandas-dev:master Feb 15, 2018

jorisvandenbossche added this to the 0.23.0 milestone Feb 15, 2018

cbrnr deleted the multiindex_docs branch February 15, 2018 09:06

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

DOC: improve docs to clarify MultiIndex indexing (pandas-dev#19507)

bdd6a33

