DOC: update the DataFrame.loc[] docstring #20229

Merged
merged 8 commits into from
Mar 14, 2018
Changes from 6 commits
226 changes: 222 additions & 4 deletions pandas/core/indexing.py
@@ -1413,7 +1413,8 @@ def _get_slice_axis(self, slice_obj, axis=None):


class _LocIndexer(_LocationIndexer):
"""Purely label-location based indexer for selection by label.
"""
Access a group of rows and columns by label(s) or a boolean array.

``.loc[]`` is primarily label based, but may also be used with a
boolean array.
@@ -1426,14 +1427,231 @@ class _LocIndexer(_LocationIndexer):
- A list or array of labels, e.g. ``['a', 'b', 'c']``.
- A slice object with labels, e.g. ``'a':'f'`` (note that contrary
to usual python slices, **both** the start and the stop are included!).
Member

Maybe we could use the ``.. warning::`` directive for this comment (instead of a parenthetical note ending with an exclamation mark).

- A boolean array.
- A boolean array of the same length as the axis being sliced,
e.g. ``[True, False, True]``.
- A ``callable`` function with one argument (the calling Series, DataFrame
or Panel) and that returns valid output for indexing (one of the above)

``.loc`` will raise a ``KeyError`` when the items are not found.

See more at :ref:`Selection by Label <indexing.label>`

See Also
--------
DataFrame.at : Access a single value for a row/column label pair
DataFrame.iloc : Access group of rows and columns by integer position(s)
Series.loc : Access group of values using labels
Member

I'd add DataFrame.xs too.
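
For context on that suggestion, here is a minimal sketch of how `DataFrame.xs` relates to `.loc` on a MultiIndex. The frame, the level names (`outer`, `inner`) and the values are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Hypothetical data, not taken from the PR.
index = pd.MultiIndex.from_tuples(
    [("r0", "bar"), ("r0", "foo"), ("r1", "bar"), ("r1", "foo")],
    names=["outer", "inner"],
)
df = pd.DataFrame({"c0": [1, 2, 3, 4]}, index=index)

# .loc with a single label selects on the first (outer) level.
outer = df.loc["r0"]

# .xs takes a cross-section on a named level, here the inner one,
# and drops that level from the result by default.
inner = df.xs("bar", level="inner")

print(outer)
print(inner)
```

The two are close relatives, which is presumably why a cross-reference in See Also would help readers find `xs` when they need to select on a level other than the first.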


Examples
--------
**Getting values**

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
... index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])
Member

I think in these cases it makes more sense to use a DataFrame with more realistic-looking data. It's just an opinion, but I'd understand .loc['falcon', 'max_speed'] more easily and quickly than .loc['r1', 'c2']. I'd also use just 2 columns; I think that should be enough and it keeps things simpler.

Contributor Author

I agree. I have added some more meaningful labels. Let me know if you like them or have any other feedback on this.
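
As a rough illustration of what the suggestion above might look like, here is a small sketch with descriptive labels. Only `falcon` and `max_speed` come from the review comment; the second row, the second column and all values are assumptions for the sake of the example, not the labels the PR ended up using.

```python
import pandas as pd

# Illustrative labels and values only; apart from 'falcon' and 'max_speed',
# the names here are assumptions, not the PR's final example.
df = pd.DataFrame(
    {"max_speed": [389, 24], "num_legs": [2, 2]},
    index=["falcon", "parrot"],
)

speed = df.loc["falcon", "max_speed"]  # row label and column label: a scalar
row = df.loc["parrot"]                 # single row label: a Series

print(speed)
print(row)
```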

>>> df
c0 c1 c2
r0 1 2 3
r1 4 5 6
r2 7 8 9

Single label. Note this returns the row as a Series.

>>> df.loc['r1']
Contributor

blank lines in between cases

c0 4
c1 5
c2 6
Name: r1, dtype: int64

List with a single label. Note using ``[[]]`` returns a DataFrame.
Contributor

this is slightly redundant as you are showing the example with a list below


Member

only a single blank line (below as well)

>>> df.loc[['r1']]
c0 c1 c2
r1 4 5 6

Single label for row and column

>>> df.loc['r0', 'c1']
2

A list of labels

>>> df.loc[['r1', 'r2']]
c0 c1 c2
r1 4 5 6
r2 7 8 9

Slice with labels for row and single label for column. Note that
contrary to usual python slices, both the start and the stop are
included!

>>> df.loc['r0':'r1', 'c0']
r0 1
r1 4
Name: c0, dtype: int64

Boolean list with the same length as the row axis

>>> df.loc[[False, False, True]]
Contributor

Would be nice to have small bits of text breaking these up. Like "Indexing with a boolean array."

Contributor

this is not a very common thing to do (directly), the boolean indexing right below is MUCH more important.
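
To illustrate the distinction drawn in this comment, here is a minimal sketch that reuses the frame from the docstring under review. It shows that a hand-written boolean list and a mask computed from a condition select the same rows; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["r0", "r1", "r2"], columns=["c0", "c1", "c2"])

# The common form: a boolean Series computed from a condition.
mask = df["c1"] > 6
from_mask = df.loc[mask]

# The same selection spelled out by hand as a boolean list.
from_list = df.loc[[False, False, True]]

print(from_mask.equals(from_list))
```

In practice the mask is almost always derived from the data, which is likely why the conditional example is considered the more important one to document.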

c0 c1 c2
r2 7 8 9

Conditional that returns a boolean Series

>>> df.loc[df['c1'] > 6]
c0 c1 c2
r2 7 8 9

Conditional that returns a boolean Series with column labels specified

>>> df.loc[df['c1'] > 6, ['c0', 'c2']]
c0 c2
r2 7 9

Callable that returns a boolean Series

>>> df.loc[lambda df: df['c1'] == 8]
c0 c1 c2
r2 7 8 9

**Setting values**

Set value for all items matching the list of labels

>>> df.loc[['r1', 'r2'], ['c1']] = 50
>>> df
c0 c1 c2
r0 1 2 3
r1 4 50 6
r2 7 50 9

Set value for an entire row

>>> df.loc['r0'] = 10
>>> df
c0 c1 c2
r0 10 10 10
r1 4 50 6
r2 7 50 9

Set value for an entire column

>>> df.loc[:, 'c0'] = 30
>>> df
c0 c1 c2
r0 30 10 10
r1 30 50 6
r2 30 50 9

Set value for rows matching a boolean condition

>>> df.loc[df['c2'] < 10] = 0
>>> df
c0 c1 c2
r0 30 10 10
r1 0 0 0
r2 0 0 0

**Getting values on a DataFrame with an index that has integer labels**

Another example using integers for the index

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
... index=[7, 8, 9], columns=['c0', 'c1', 'c2'])
>>> df
c0 c1 c2
Contributor

nice examples! can you add one using a MultiIndex for the index, and show selecting with tuples. sure this is getting long, but these examples are useful.

7 1 2 3
8 4 5 6
9 7 8 9

Slice with integer labels for rows. Note that contrary to usual
python slices, both the start and the stop are included!

>>> df.loc[7:9]
c0 c1 c2
7 1 2 3
8 4 5 6
9 7 8 9

**Getting values with a MultiIndex**

A number of examples using a DataFrame with a MultiIndex

>>> tuples = [('r0', 'bar'), ('r0', 'foo'), ('r1', 'bar'),
... ('r1', 'foo'), ('r2', 'bar'), ('r2', 'baz')]
>>> index = pd.MultiIndex.from_tuples(tuples)
>>> values = [[12,2,3], [0,4,1], [10,20,30],
... [1, 4, 1], [7, 1, 2], [16, 36, 40]]
>>> df = pd.DataFrame(values, columns=['c0', 'c1', 'c2'], index=index)
>>> df
c0 c1 c2
r0 bar 12 2 3
foo 0 4 1
r1 bar 10 20 30
foo 1 4 1
r2 bar 7 1 2
baz 16 36 40

Single label. Note this returns a DataFrame with a single index.

>>> df.loc['r0']
c0 c1 c2
bar 12 2 3
foo 0 4 1

Single index tuple. Note this returns a Series.

>>> df.loc[('r0', 'bar')]
c0 12
c1 2
c2 3
Name: (r0, bar), dtype: int64

Single label for row and column. Similar to passing in a tuple, this
returns a Series.

>>> df.loc['r0', 'foo']
c0 0
c1 4
c2 1
Name: (r0, foo), dtype: int64

Single tuple. Note using ``[[]]`` returns a DataFrame.

>>> df.loc[[('r0', 'bar')]]
c0 c1 c2
r0 bar 12 2 3

Single tuple for the index with a single label for the column

>>> df.loc[('r0', 'foo'), 'c1']
4

Boolean list

>>> df.loc[[True, False, True, False, True, True]]
Contributor

same as above (remove this example)

c0 c1 c2
r0 bar 12 2 3
r1 bar 10 20 30
r2 bar 7 1 2
baz 16 36 40

Slice from index tuple to single label

>>> df.loc[('r0', 'foo'):'r1']
c0 c1 c2
r0 foo 0 4 1
r1 bar 10 20 30
foo 1 4 1

Slice from index tuple to index tuple

>>> df.loc[('r0', 'foo'):('r1', 'bar')]
c0 c1 c2
r0 foo 0 4 1
r1 bar 10 20 30

Raises
------
KeyError:
when items are not found
Contributor

when any items are not found
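
A small sketch of the behaviour the Raises section describes, using a hypothetical two-row frame. Only the single-label case is shown, since that raises `KeyError` across pandas versions as far as I know; how a list containing some missing labels is handled has changed between releases, so it is left out here.

```python
import pandas as pd

# Hypothetical frame, used only to trigger the error.
df = pd.DataFrame([[1, 2], [3, 4]], index=["r0", "r1"], columns=["c0", "c1"])

raised = False
try:
    df.loc["missing"]  # this label is not in the index
except KeyError:
    raised = True

print(raised)
```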

"""

_valid_types = ("labels (MUST BE IN THE INDEX), slices of labels (BOTH "