-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Improve the docstring of DataFrame.nlargest #20255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3835,34 +3835,44 @@ def sortlevel(self, level=0, axis=0, ascending=True, inplace=False, | |
inplace=inplace, sort_remaining=sort_remaining) | ||
|
||
def nlargest(self, n, columns, keep='first'): | ||
"""Get the rows of a DataFrame sorted by the `n` largest | ||
values of `columns`. | ||
""" | ||
Return the `n` largest rows sorted by `columns`. | ||
|
||
Sort the DataFrame by `columns` in descending order and return the top | ||
`n` rows. | ||
|
||
Parameters | ||
---------- | ||
n : int | ||
Number of items to retrieve | ||
Number of items to retrieve. | ||
columns : list or str | ||
Column name or names to order by | ||
Column name or names to retrieve values from. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think "to order by" is more correct, as otherwise it seems that you only return the values from those columns |
||
keep : {'first', 'last'}, default 'first' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Default value is implied for first element in the possible values, so no need to explicitly say There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @WillAyd I think we changed the guide on this (some time last week) .. :) |
||
Where there are duplicate values: | ||
- ``first`` : take the first occurrence. | ||
- ``last`` : take the last occurrence. | ||
- `first` : take the first occurrence; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't need punctuation at the end of either bullet point There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. First, I removed them, but the validator complained about the missing full stop. Because of that, I kept this way. If I should remove them and ignore the validation error, let me know. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm OK now I know why I've seen this a few times. @datapythonista @jorisvandenbossche I would suggest that we ignore errors for a lack of punctuation at the end of bulleted lists - any objection to that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we can certainly ignore errors in such cases of bullet points There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After looking at your example do you think it makes more sense to word this as "prioritize the first occurrence(s)"? I assume it's more or less doing that as opposed to a full take right? Put in other words, if |
||
- `last` : take the last occurrence. | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
The `n` largest rows in the DataFrame, sorted by the given columns | ||
in descending order. | ||
|
||
See Also | ||
-------- | ||
DataFrame.nsmallest : Return the `n` smallest rows sorted by given | ||
columns. | ||
|
||
Examples | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add an example to differentiate between "first" and "last" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice suggestion. Done. |
||
-------- | ||
>>> df = pd.DataFrame({'a': [1, 10, 8, 11, -1], | ||
... 'b': list('abdce'), | ||
... 'c': [1.0, 2.0, np.nan, 3.0, 4.0]}) | ||
>>> df.nlargest(3, 'a') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since the DataFrame constructor here is not trivial can we add a line to print the DataFrame as is, giving visual contrast to the examples? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the review. Done. |
||
a b c | ||
3 11 c 3 | ||
1 10 b 2 | ||
2 8 d NaN | ||
a b c | ||
3 11 c 3.0 | ||
1 10 b 2.0 | ||
2 8 d NaN | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add one more example selecting multiple columns? |
||
""" | ||
return algorithms.SelectNFrame(self, | ||
n=n, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't actually sort, can you re-word, we just pick the largest n values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. I was not aware of the internals. Done.