DOC: update the DataFrame.loc[] docstring #20229
Changes from 4 commits
2f359b9
1a93d2a
a3238d9
78f342c
c28a796
64c698b
0902b36
a23a8e9
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1413,7 +1413,8 @@ def _get_slice_axis(self, slice_obj, axis=None): | |
|
||
|
||
class _LocIndexer(_LocationIndexer): | ||
"""Purely label-location based indexer for selection by label. | ||
""" | ||
Access a group of rows and columns by label(s) or a boolean array. | ||
|
||
``.loc[]`` is primarily label based, but may also be used with a | ||
boolean array. | ||
|
@@ -1426,14 +1427,140 @@ class _LocIndexer(_LocationIndexer): | |
- A list or array of labels, e.g. ``['a', 'b', 'c']``. | ||
- A slice object with labels, e.g. ``'a':'f'`` (note that contrary | ||
to usual python slices, **both** the start and the stop are included!). | ||
- A boolean array. | ||
- A boolean array of the same length as the axis being sliced, | ||
e.g. ``[True, False, True]``. | ||
- A ``callable`` function with one argument (the calling Series, DataFrame | ||
or Panel) and that returns valid output for indexing (one of the above) | ||
|
||
``.loc`` will raise a ``KeyError`` when the items are not found. | ||
|
||
See more at :ref:`Selection by Label <indexing.label>` | ||
|
||
See Also | ||
-------- | ||
DataFrame.at | ||
Review comment: add Series.loc
Reply: Added below this |
||
Access a single value for a row/column label pair | ||
DataFrame.iat | ||
Review comment: you can remove .iat from here (leave the .iloc though) |
||
Access a single value for a row/column pair by integer position | ||
DataFrame.iloc | ||
Access group of rows and columns by integer position(s) | ||
Review comment: Same comment here as on the other PR:
|
||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]], | ||
Review comment: I would maybe number the values consecutively, so in the output it is easier to see which row was returned |
||
... index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2']) | ||
Review comment: I think in this case it makes more sense to use a DataFrame with more realistic-looking data. It's just an opinion, but I'd understand it more easily and quickly.
Reply: I agree. I have added some more meaningful labels. Let me know if you like it or have any other feedback on this matter. |
||
>>> df | ||
c0 c1 c2 | ||
r0 12 2 3 | ||
r1 0 4 1 | ||
r2 10 20 30 | ||
|
||
Single label for row (note it would be faster to use ``DataFrame.at`` in | ||
Review comment: don't need the note about perf
Follow-up comment: you can mention that this returns a Series and using |
||
this case) | ||
|
||
>>> df.loc['r1'] | ||
Review comment: blank lines in between cases |
||
c0 0 | ||
c1 4 | ||
c2 1 | ||
Name: r1, dtype: int64 | ||
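The output above is a Series because a single row label was passed. As a minimal sketch of that point (the one-element-list variant is my own illustration, not something the PR or the reviewers specify), the two calls below differ only in whether the label is wrapped in a list:

```python
import pandas as pd

# Same frame as in the docstring example.
df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])

row_as_series = df.loc['r1']     # single label: a Series indexed by column names
row_as_frame = df.loc[['r1']]    # one-element list: a DataFrame with one row

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(row_as_frame.shape)
```

The list form is convenient when downstream code expects a DataFrame regardless of how many rows are selected.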
|
||
Single label for row and column (note it would be faster to use | ||
Review comment: same, remove the commentary |
||
``DataFrame.at`` in this case) | ||
|
||
>>> df.loc['r0', 'c1'] | ||
2 | ||
|
||
A list of labels | ||
|
||
>>> df.loc[['r1', 'r2']] | ||
c0 c1 c2 | ||
r1 0 4 1 | ||
r2 10 20 30 | ||
|
||
Slice with labels for row and single label for column. Note that | ||
contrary to usual python slices, both the start and the stop are | ||
included! | ||
|
||
>>> df.loc['r0':'r1', 'c0'] | ||
r0 12 | ||
r1 0 | ||
Name: c0, dtype: int64 | ||
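To make the inclusive-stop behaviour concrete, here is a small sketch that puts a plain Python list slice next to the label slice from the example. The variable names are mine and are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])

labels = list(df.index)

# Plain Python slice: the stop position (1) is excluded.
plain = labels[0:1]

# Label slice with .loc: the stop label ('r1') is included.
by_label = df.loc['r0':'r1', 'c0']

print(plain)
print(list(by_label.index))
```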
|
||
Boolean list with the same length as the row axis | ||
|
||
>>> df.loc[[False, False, True]] | ||
Review comment: Would be nice to have small bits of text breaking these up. Like "Indexing with a boolean array."
Follow-up comment: this is not a very common thing to do (directly), the boolean indexing right below is MUCH more important. |
||
c0 c1 c2 | ||
r2 10 20 30 | ||
|
||
Callable that returns valid output for indexing | ||
|
||
>>> df.loc[df['c1'] > 10] | ||
Review comment: This is actually not a callable but a boolean Series; the same goes for the example below. I think this is a nice example to keep, but I would explain it a bit differently (frame it as a boolean Series that is calculated from the frame itself). |
||
c0 c1 c2 | ||
r2 10 20 30 | ||
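For contrast with the boolean Series used in this example, the sketch below shows what passing an actual callable could look like. The lambda and the variable names are illustrative choices of mine, not part of the PR:

```python
import pandas as pd

df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])

# Boolean Series computed from the frame itself (what the docstring example does).
mask = df['c1'] > 10
by_mask = df.loc[mask]

# A real callable: .loc calls it with the frame and indexes with the result.
by_callable = df.loc[lambda frame: frame['c1'] > 10]

print(by_mask.equals(by_callable))
```

The callable form is mostly useful in method chains, where the intermediate frame has no name to build a mask from.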
|
||
Callable that returns valid output with column labels specified | ||
|
||
>>> df.loc[df['c1'] > 10, ['c0', 'c2']] | ||
c0 c2 | ||
r2 10 30 | ||
Review comment: Can you make a second example series or DataFrame where the index values are integers, but not In that second example, could you also show a slice like |
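One way such an integer-index example could look is sketched below. The labels 7, 8, 9 match the integer-indexed frame that appears later in the docstring; the comparison with ``.iloc`` and the specific slices are my own assumptions about what is worth showing, since the comment above is cut off:

```python
import pandas as pd

df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[7, 8, 9], columns=['c0', 'c1', 'c2'])

by_label = df.loc[7:8]        # labels 7 and 8; the stop label is included
by_position = df.iloc[0:2]    # positions 0 and 1; the stop position is excluded

same_rows = by_label.equals(by_position)

# 0 is a valid position but not a label in this index, so .loc rejects it.
try:
    df.loc[0]
    zero_is_label = True
except KeyError:
    zero_is_label = False

print(same_rows, zero_is_label)
```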
||
|
||
Set value for all items matching the list of labels | ||
|
||
>>> df.loc[['r1', 'r2'], ['c1']] = 70 | ||
>>> df | ||
c0 c1 c2 | ||
r0 12 2 3 | ||
r1 0 70 1 | ||
r2 10 70 30 | ||
|
||
Set value for an entire row | ||
|
||
>>> df.loc['r0'] = 70 | ||
>>> df | ||
c0 c1 c2 | ||
r0 70 70 70 | ||
r1 0 70 1 | ||
r2 10 70 30 | ||
|
||
Set value for an entire column | ||
|
||
>>> df.loc[:, 'c0'] = 30 | ||
>>> df | ||
c0 c1 c2 | ||
r0 30 70 70 | ||
r1 30 70 1 | ||
r2 30 70 30 | ||
|
||
Set value for rows matching callable condition | ||
|
||
>>> df.loc[df['c2'] < 10] = 0 | ||
>>> df | ||
c0 c1 c2 | ||
r0 30 70 70 | ||
r1 0 0 0 | ||
r2 30 70 30 | ||
|
||
Another example using integers for the index | ||
|
||
>>> df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]], | ||
... index=[7, 8, 9], columns=['c0', 'c1', 'c2']) | ||
>>> df | ||
c0 c1 c2 | ||
Review comment: nice examples! can you add one using a MultiIndex for the index, and show selecting with tuples. sure this is getting long, but these examples are useful. |
||
7 12 2 3 | ||
8 0 4 1 | ||
9 10 20 30 | ||
|
||
Slice with integer labels for rows. Note that contrary to usual | ||
python slices, both the start and the stop are included! | ||
|
||
>>> df.loc[7:9] | ||
c0 c1 c2 | ||
7 12 2 3 | ||
8 0 4 1 | ||
9 10 20 30 | ||
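A review comment earlier in this thread asks for a MultiIndex example that selects with tuples. A minimal sketch of what that might look like follows; the level names, labels, and values are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical two-level index; names and values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1), ('b', 2)], names=['outer', 'inner'])
df = pd.DataFrame({'c0': [1, 2, 3, 4], 'c1': [10, 20, 30, 40]}, index=index)

one_row = df.loc[('a', 2)]            # full key as a tuple: the row as a Series
outer_group = df.loc['a']             # partial key: rows under 'a', indexed by 'inner'
one_value = df.loc[('b', 1), 'c1']    # tuple for the row, label for the column

print(one_row)
print(outer_group)
print(one_value)
```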
|
||
Raises | ||
------ | ||
KeyError: | ||
when items are not found | ||
Review comment: when any items are not found |
||
""" | ||
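The Raises section above can be illustrated with a small sketch. The helper ``lookup`` is hypothetical and only wraps the ``try``/``except``. Only the single-label case is shown, because how a list with partly missing labels is handled has varied across pandas versions:

```python
import pandas as pd

df = pd.DataFrame([[12, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=['r0', 'r1', 'r2'], columns=['c0', 'c1', 'c2'])


def lookup(frame, key):
    """Return frame.loc[key], or None if pandas raises KeyError (hypothetical helper)."""
    try:
        return frame.loc[key]
    except KeyError:
        return None


found = lookup(df, 'r1')      # label exists
missing = lookup(df, 'r9')    # label does not exist, so .loc raises KeyError

print(found is not None, missing is None)
```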
|
||
_valid_types = ("labels (MUST BE IN THE INDEX), slices of labels (BOTH " | ||
|
Review comment: Maybe we could use the `.. warning::` directive for this comment (instead of a note in brackets ending with an exclamation mark).
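As a rough sketch of that suggestion, the docstring below moves the inclusive-slice note into a ``.. warning::`` block. The function ``loc_slice_note`` is a hypothetical placeholder that exists only to carry the docstring; the wording is an assumption, not the text the PR ended up with:

```python
def loc_slice_note():
    """
    Illustrative docstring only (hypothetical function).

    - A slice object with labels, e.g. ``'a':'f'``.

      .. warning:: Contrary to usual Python slices, **both** the
          start and the stop are included.
    """


# Print the docstring to see how the directive sits under the list item.
print(loc_slice_note.__doc__)
```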