DOC: update the DataFrame.loc[] docstring #20229

Merged
merged 8 commits into from
Mar 14, 2018
48 changes: 45 additions & 3 deletions pandas/core/indexing.py
@@ -1413,10 +1413,13 @@ def _get_slice_axis(self, slice_obj, axis=None):


class _LocIndexer(_LocationIndexer):
"""Purely label-location based indexer for selection by label.
"""
Selects a group of rows and columns by label(s) or a boolean array.
Member

"Select" is not fully correct, as `.loc` is used not only for getting but also for setting.
Or is that general enough? (@jreback @TomAugspurger) To set, you of course also need to select the location to set.
The "See Also" section now uses "access" instead of "select".

Contributor Author

Ah yes, I had meant to switch to using "access" all around, although I'm not sure that gets around the problem you mentioned. Perhaps it's enough to mention in the extended summary that you can use loc to both get and set? I'm struggling to think of a good word that implies both getting and setting, but I will keep thinking about it.

Contributor Author

And I can also add some examples of using loc for setting values below.

Member

yes, that would be good
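
A minimal sketch of what such setting examples could look like, assuming a small made-up frame (the labels and values below are illustrative and are not taken from the PR):

```python
import pandas as pd

# Hypothetical frame; labels and values are for illustration only.
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1"],
)

# Set a single cell, addressed by row label and column label.
df.loc["r0", "c1"] = 20

# Set every column of one row to the same value.
df.loc["r1"] = 0

# Set one column for the rows that satisfy a boolean condition.
df.loc[df["c0"] > 4, "c1"] = -1

print(df)
```

The same indexer expression appears on the left-hand side of the assignment as would be used for reading, which is why "access" may be a better word than "select".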


``.loc[]`` is primarily label based, but may also be used with a
boolean array.
boolean array. Note that if no row or column labels are specified
Contributor

I don't think this sentence with "Note that..." is correct. I would remove it.

the labels will default to the integers 0 to n - 1, with n being
the number of rows/columns, respectively.
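
For context, the default integer labels are assigned by the DataFrame constructor rather than by ``.loc`` itself, which may be why the sentence was questioned. A small sketch with made-up data:

```python
import pandas as pd

# No index or columns are passed, so pandas assigns default integer labels.
df = pd.DataFrame([[10, 20], [30, 40], [50, 60]])

print(list(df.index))    # row labels
print(list(df.columns))  # column labels

# .loc still treats these integers as labels, so the slice 0:1
# includes the row labelled 1.
subset = df.loc[0:1, 0]
print(subset)
```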

Allowed inputs are:

@@ -1426,14 +1429,53 @@ class _LocIndexer(_LocationIndexer):
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'`` (note that contrary
to usual python slices, **both** the start and the stop are included!).
Member

Maybe we could use the `.. warning::` directive for this comment (instead of a note in brackets ending with an exclamation mark).

- A boolean array.
- A boolean array, e.g. [True, False, True].
Contributor

specify that this should be the same length as the axis being sliced.

- A ``callable`` function with one argument (the calling Series, DataFrame
or Panel) and that returns valid output for indexing (one of the above)
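
The input types listed above can be sketched as follows. The Series and its labels are made up for illustration, and the ``IndexError`` for a boolean list of the wrong length reflects the behaviour of recent pandas releases, which is an assumption about the installed version:

```python
import pandas as pd

# Hypothetical Series with string labels.
s = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

single = s.loc["b"]          # a single label
several = s.loc[["a", "c"]]  # a list of labels
sliced = s.loc["a":"c"]      # a label slice: "a", "b" and "c" are all included

# A boolean list needs one entry per element of the axis being indexed.
masked = s.loc[[True, False, True, False]]

# A boolean list of the wrong length is rejected (IndexError in recent pandas).
length_error = None
try:
    s.loc[[True, False]]
except IndexError as exc:
    length_error = exc

print(single, several.tolist(), sliced.tolist(), masked.tolist())
```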

``.loc`` will raise a ``KeyError`` when the items are not found.

See more at :ref:`Selection by Label <indexing.label>`
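
A short sketch of the ``KeyError`` behaviour for a missing label. The helper name ``lookup`` and the data are hypothetical:

```python
import pandas as pd

# Hypothetical frame with two labelled rows.
df = pd.DataFrame({"c0": [1, 2]}, index=["r0", "r1"])


def lookup(frame, label):
    """Return the row for ``label``, or None if the label is not in the index."""
    try:
        return frame.loc[label]
    except KeyError:
        return None


print(lookup(df, "r0"))
print(lookup(df, "missing"))
```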

See Also
--------
at : Selects a single value for a row/column label pair
Contributor

Prefix all of these with `DataFrame.` (e.g. `DataFrame.at`).

iat : Selects a single value for a row/column pair by integer position
iloc : Selects group of rows and columns by integer position(s)
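
To make the relationship between the four accessors concrete, here is a minimal comparison on a made-up frame (labels and values are illustrative):

```python
import pandas as pd

# Hypothetical frame with string row and column labels.
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1"],
)

by_label = df.loc["r1", "c1"]      # label based; can also return Series or frames
scalar_label = df.at["r1", "c1"]   # label based; single value only
scalar_pos = df.iat[1, 1]          # position based; single value only
by_position = df.iloc[1:3, 0]      # position based; the stop position is excluded

print(by_label, scalar_label, scalar_pos, by_position.tolist())
```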

Examples
--------
>>> df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
Member

I would maybe number the values consecutively, so that in the output it is easier to see which row was returned

... index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])
Member

I think in this case it makes more sense to use a dataframe with more realistic-looking data. It's just an opinion, but I'd understand .loc['falcon', 'max_speed'] more easily and quickly than .loc['r1', 'c2']. I'd also use just 2 columns; I think that should be enough and it makes things simpler.

Contributor Author

I agree. I have added some more meaningful labels. Let me know if you like it or have any other feedback on this matter.
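
As an illustration of the suggestion, a two-column frame with descriptive labels might look like the sketch below. Only ``'falcon'`` and ``'max_speed'`` come from the review comment; the other labels and all values are invented, and this is not necessarily the frame the PR ended up using:

```python
import pandas as pd

# Hypothetical data: only 'falcon' and 'max_speed' appear in the review
# comment; the remaining labels and all numbers are made up.
df = pd.DataFrame(
    {"max_speed": [389, 24, 80], "legs": [2, 2, 4]},
    index=["falcon", "parrot", "lion"],
)

speed = df.loc["falcon", "max_speed"]               # a single value
two_rows = df.loc[["falcon", "lion"], "max_speed"]  # a Series for two rows

print(speed)
print(two_rows)
```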

>>> df
    c0  c1  c2
r0  12   2   3
r1   0   4   1
r2  10  20  30
>>> df.loc['r1']
Contributor

blank lines in between cases

c0    0
c1    4
c2    1
Name: r1, dtype: int64
>>> df.loc[['r1', 'r2']]
    c0  c1  c2
r1   0   4   1
r2  10  20  30
>>> df.loc['r0', 'c1']
2
>>> df.loc['r0':'r1', 'c0']
r0    12
r1     0
Name: c0, dtype: int64
>>> df.loc[[False, False, True]]
Contributor

Would be nice to have small bits of text breaking these up. Like "Indexing with a boolean array."

Contributor

this is not a very common thing to do (directly), the boolean indexing right below is MUCH more important.

    c0  c1  c2
r2  10  20  30
>>> df.loc[df['c1'] > 10]
Member

This is actually not a callable but a boolean Series, and the same goes for the example below. I think this is a nice example to keep, but I would explain it a bit differently (frame it as a boolean Series that is calculated from the frame itself)
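
The distinction can be sketched as follows, reusing the frame from the docstring example. The variable names are illustrative:

```python
import pandas as pd

# Same data as the docstring example in this diff.
df = pd.DataFrame(
    [[12, 2, 3], [0, 4, 1], [10, 20, 30]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1", "c2"],
)

# A boolean Series computed from the frame itself; it is evaluated
# before .loc ever sees it.
mask = df["c1"] > 10
by_mask = df.loc[mask]

# A callable: .loc calls it with the frame and indexes with the result.
by_callable = df.loc[lambda frame: frame["c1"] > 10]

print(by_mask.equals(by_callable))
```

Both forms return the same rows here. The callable form mainly helps in method chains, where the intermediate frame has no name to refer to.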

    c0  c1  c2
r2  10  20  30
>>> df.loc[df['c1'] > 10, ['c0', 'c2']]
    c0  c2
r2  10  30
Contributor

Can you make a second example series or DataFrame where the index values are integers, but not 0-len(df)? And then show how .loc uses the labels and not the positions?

In that second example, could you also show a slice like df.loc[2:5] and show that it's closed on the right, so the label 5 is included?
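
A sketch of what such a second example could look like, assuming a made-up frame whose integer labels start at 2 rather than 0 (the column name and values are illustrative):

```python
import pandas as pd

# Hypothetical frame: integer row labels that do not start at 0.
df = pd.DataFrame({"value": [20, 30, 40, 50, 60]}, index=[2, 3, 4, 5, 6])

# .loc interprets 2 as a label, .iloc interprets it as a position.
first_by_label = df.loc[2, "value"]    # row labelled 2 (the first row)
third_by_position = df.iloc[2]["value"]  # third row, which is labelled 4

# The label slice is closed on the right, so the label 5 is included.
by_label = df.loc[2:5]

# The positional slice follows Python rules and excludes position 5.
by_position = df.iloc[2:5]

print(list(by_label.index))
print(list(by_position.index))
```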

"""

_valid_types = ("labels (MUST BE IN THE INDEX), slices of labels (BOTH "