Skip to content

DOC: Improve the docstring of DataFrame.nlargest #20255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 17, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 20 additions & 10 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3835,34 +3835,44 @@ def sortlevel(self, level=0, axis=0, ascending=True, inplace=False,
inplace=inplace, sort_remaining=sort_remaining)

def nlargest(self, n, columns, keep='first'):
"""Get the rows of a DataFrame sorted by the `n` largest
values of `columns`.
"""
Return the `n` largest rows sorted by `columns`.

Sort the DataFrame by `columns` in descending order and return the top
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't actually sort, can you re-word, we just pick the largest n values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I was not aware of the internals. Done.

`n` rows.

Parameters
----------
n : int
Number of items to retrieve
Number of items to retrieve.
columns : list or str
Column name or names to order by
Column name or names to retrieve values from.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "to order by" is more correct, as otherwise it seems that you only return the values from those columns

keep : {'first', 'last'}, default 'first'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default value is implied for first element in the possible values, so no need to explicitly say default 'first' again

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd I think we changed the guide on this (some time last week) .. :)
For explicitness, I think it is good to keep the " default 'first' ". A new reader cannot know the rule of "default is fist value in the set of options"

Where there are duplicate values:
- ``first`` : take the first occurrence.
- ``last`` : take the last occurrence.
- `first` : take the first occurrence;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need punctuation at the end of either bullet point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I removed them, but the validator complained about the missing full stop. Because of that, I kept this way. If I should remove them and ignore the validation error, let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK now I know why I've seen this a few times. @datapythonista @jorisvandenbossche I would suggest that we ignore errors for a lack of punctuation at the end of bulleted lists - any objection to that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can certainly ignore errors in such cases of bullet points

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at your example do you think it makes more sense to word this as "prioritize the first occurrence(s)"? I assume it's more or less doing that as opposed to a full take right?

Put in other words, if n is 2 and you have three duplicate values it wouldn't return all three but rather prioritize the 2 that come first, right?

- `last` : take the last occurrence.

Returns
-------
DataFrame
The `n` largest rows in the DataFrame, sorted by the given columns
in descending order.

See Also
--------
DataFrame.nsmallest : Return the `n` smallest rows sorted by given
columns.

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example to differentiate between "first" and "last" keep parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice suggestion. Done.

--------
>>> df = pd.DataFrame({'a': [1, 10, 8, 11, -1],
... 'b': list('abdce'),
... 'c': [1.0, 2.0, np.nan, 3.0, 4.0]})
>>> df.nlargest(3, 'a')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the DataFrame constructor here is not trivial can we add a line to print the DataFrame as is, giving visual contrast to the examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. Done.

a b c
3 11 c 3
1 10 b 2
2 8 d NaN
a b c
3 11 c 3.0
1 10 b 2.0
2 8 d NaN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add one more example selecting multiple columns?

"""
return algorithms.SelectNFrame(self,
n=n,
Expand Down