DOC: Improve the docstring of DataFrame.nlargest #20255
Changes from 3 commits
@@ -3835,34 +3835,87 @@ def sortlevel(self, level=0, axis=0, ascending=True, inplace=False,
                               inplace=inplace, sort_remaining=sort_remaining)

     def nlargest(self, n, columns, keep='first'):
-        """Get the rows of a DataFrame sorted by the `n` largest
-        values of `columns`.
+        """
+        Return the `n` first rows ordered by `columns` in descending order.
+
+        Return the `n` first rows with the largest values in `columns`, in
+        descending order. The columns that are not specified are returned as
+        well, but not used for ordering.

         Parameters
         ----------
         n : int
-            Number of items to retrieve
-        columns : list or str
-            Column name or names to order by
+            Number of rows to return.
+        columns : iterable or single value
[Review comment] str or iterable
+            Column label(s) to order by.
         keep : {'first', 'last'}, default 'first'
[Review comment] Default value is implied for first element in the possible values, so no need to explicitly say
[Reply] @WillAyd I think we changed the guide on this (some time last week) .. :)
             Where there are duplicate values:
-            - ``first`` : take the first occurrence.
-            - ``last`` : take the last occurrence.
+
+            - `first` : prioritize the first occurrence(s)
+            - `last` : prioritize the last occurrence(s)

         Returns
         -------
         DataFrame
+            The `n` first rows ordered by the given columns in descending
[Review comment] Minor stylistic comment but "first
+            order.
+
+        See Also
+        --------
+        DataFrame.nsmallest : Return the `n` first rows ordered by `columns` in
+            ascending order.
+
+        Notes
+        -----
+        This function cannot be used with all column types. For example, when
+        specifying columns with `object` or `category` dtypes, ``TypeError`` is
+        raised.

         Examples
[Review comment] Can you add an example to differentiate between "first" and "last"
[Reply] Nice suggestion. Done.
         --------
-        >>> df = pd.DataFrame({'a': [1, 10, 8, 11, -1],
+        >>> df = pd.DataFrame({'a': [1, 10, 8, 10, -1],
         ...                    'b': list('abdce'),
         ...                    'c': [1.0, 2.0, np.nan, 3.0, 4.0]})
+        >>> df
+            a  b    c
+        0   1  a  1.0
+        1  10  b  2.0
+        2   8  d  NaN
+        3  10  c  3.0
+        4  -1  e  4.0
+
+        In the following example, we will use ``nlargest`` to select the three
+        rows having the largest values in column "a".
+
         >>> df.nlargest(3, 'a')
[Review comment] Since the DataFrame constructor here is not trivial can we add a line to print the DataFrame as is, giving visual contrast to the examples?
[Reply] Thanks for the review. Done.
-            a  b   c
-        3  11  c   3
-        1  10  b   2
-        2   8  d NaN
+            a  b    c
+        1  10  b  2.0
+        3  10  c  3.0
+        2   8  d  NaN
+
+        When using ``keep='last'``, ties are resolved in reverse order:
+
+        >>> df.nlargest(3, 'a', keep='last')
[Review comment] Great thanks! I would just add a short blurb to direct the users attention to what's important here and in the example above. So maybe before the previous example say "In the following example, we will use
+            a  b    c
+        3  10  c  3.0
+        1  10  b  2.0
+        2   8  d  NaN
[Review comment] Can you add one more example selecting multiple columns?
+
+        To order by the largest values in column "a" and then "c", we can
[Review comment] Great example. I would also add something to Notes touching on this behavior - something like "When
+        specify multiple columns like in the next example.
+
+        >>> df.nlargest(3, ['a', 'c'])
+            a  b    c
+        3  10  c  3.0
+        1  10  b  2.0
+        2   8  d  NaN
+
+        The dtype of column "b" is `object` and attempting to get its largest
[Review comment] Just to generalize could instead say "Attempting to use
+        values raises a ``TypeError`` exception:
+
+        >>> df.nlargest(3, 'b')
+        Traceback (most recent call last):
+        TypeError: Column 'b' has dtype object, cannot use method 'nlargest' with this dtype
         """
         return algorithms.SelectNFrame(self,
                                        n=n,

[Review comment] If I'm right, this is equivalent to using df.sort_values(columns, ascending=False).head(n), isn't it? If that's the case, I'd explicitly say it in the extended summary, and I'd add both methods to the See Also section.
[Reply] It's an equivalent result, but this is much more performant.
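
The equivalence discussed in this thread can be checked with a minimal sketch like the one below. The sample frame is illustrative only, and the check assumes the ordering column has no ties and no NaN values. With ties, the two approaches may return the tied rows in a different order, because `sort_values` does not use a stable sort by default. The sketch only compares results; it does not measure the performance difference the reply mentions.

```python
import pandas as pd

# Illustrative data (an assumption for this sketch): column "a" has no
# duplicates and no NaN, so both approaches have one well-defined answer.
df = pd.DataFrame({"a": [1, 10, 8, 11, -1],
                   "b": list("abdce")})

# Approach discussed in the PR: select the 3 rows with the largest "a".
top_nlargest = df.nlargest(3, "a")

# Reviewer's proposed equivalent: full descending sort, then take the head.
top_sorted = df.sort_values("a", ascending=False).head(3)

# Compare values, order, index labels and dtypes in one call.
print(top_nlargest.equals(top_sorted))
```

If the comparison holds on data like this, mentioning `DataFrame.sort_values` and `DataFrame.head` in the See Also section, as the reviewer suggests, would point readers at the general-purpose alternative.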