
BUG: nlargest/nsmallest can now consider nan values like sort_values(ascending=True).head(n) #43060


Merged
merged 27 commits into from
Sep 7, 2021

Conversation

usersblock
Contributor

@usersblock usersblock commented Aug 15, 2021

nlargest/nsmallest are now consistent with x.sort_values(ascending=False).head(n) and x.sort_values(ascending=True).head(n), respectively.
Edit: I've changed test_nlargest_misc(self) in tests\series\methods\test_nlargest.py to reflect the fact that NaNs are now included.
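
A minimal sketch of the behaviour described above, assuming pandas 1.4 or later (the milestone this PR was merged into). The sample values are illustrative and are not taken from the PR's tests:

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not from the PR): one NaN among four values.
ser = pd.Series([3.0, np.nan, 1.0, 2.0])

# Ask for as many rows as the Series has, so the NaN row must be considered.
largest = ser.nlargest(4)
via_sort_desc = ser.sort_values(ascending=False).head(4)

smallest = ser.nsmallest(4)
via_sort_asc = ser.sort_values(ascending=True).head(4)

# With the change in this PR, each pair should match, NaN row included.
print(largest.equals(via_sort_desc))
print(smallest.equals(via_sort_asc))
```

On versions before this change, the NaN row would be dropped from the nlargest/nsmallest results, so the comparison would not hold.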

@pep8speaks

pep8speaks commented Aug 15, 2021

Hello @usersblock! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-07 07:41:45 UTC

@usersblock usersblock closed this Aug 16, 2021
@usersblock usersblock reopened this Aug 16, 2021
@simonjayhawkins
Member

@usersblock I've closed #42997 as a duplicate of #28984. Can you update the references to the issue, and also include the code sample from #28984 as a test?

@simonjayhawkins simonjayhawkins added the Numeric Operations Arithmetic, Comparison, and Logical operations label Aug 16, 2021
@usersblock
Contributor Author

Updated the reference and added a new test to test_nlargest.py in tests\frame\methods

Member

@simonjayhawkins simonjayhawkins left a comment

@usersblock can you add a release note to doc/source/whatsnew/v1.4.0.rst

- tm.assert_series_equal(ser.nlargest(), ser.iloc[[4, 0, 3, 2]])
- tm.assert_series_equal(ser.nsmallest(), ser.iloc[[2, 3, 0, 4]])
+ tm.assert_series_equal(ser.nlargest(), ser.iloc[[4, 0, 3, 2, 1]])
+ tm.assert_series_equal(ser.nsmallest(), ser.iloc[[2, 3, 0, 4, 1]])
Member

can you rewrite this test using result= and expected= while making changes in this test.

Contributor Author

Added result/expected and added entry into the doc

@simonjayhawkins simonjayhawkins added this to the 1.4 milestone Aug 25, 2021
@jreback
Contributor

jreback commented Aug 31, 2021

ok can you add a replica of the test in the OP for df.groupby('grp').apply(lambda x: x.sort_values('z', ascending=False).head(1))
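
A rough sketch of what such a test might look like, assuming pandas 1.4 or later. The frame below is hypothetical (the data from the original report is not reproduced here), and the column names `grp` and `z` simply follow the expression in the request:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "b" is all NaN in "z", group "c" has one NaN.
df = pd.DataFrame(
    {
        "grp": ["a", "a", "b", "c", "c"],
        "z": [1.0, 3.0, np.nan, 2.0, np.nan],
    }
)

# Reference behaviour: sort each group descending and take the first row.
expected = df.groupby("grp").apply(
    lambda x: x.sort_values("z", ascending=False).head(1)
)

# Behaviour under test: nlargest on each group should pick the same rows,
# including the NaN row for the all-NaN group.
result = df.groupby("grp").apply(lambda x: x.nlargest(1, "z"))

pd.testing.assert_frame_equal(result, expected)
```

Some pandas versions may emit a deprecation warning about `apply` operating on the grouping column; that does not affect the comparison, since both sides go through the same call.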

Contributor

@jreback jreback left a comment

I agree that this should be behind a dropna=True parameter (for back compat). Can you add this?


  if self.keep == "last":
      # reverse indices
      inds = narr - 1 - inds

- return dropped.iloc[inds]
+ return concat([dropped.iloc[inds], nan_index])[:findex]
Contributor

Don't you need to use .iloc here? Otherwise this will not be a positional lookup (it might work with a range index, though).

Contributor Author

Should be done now
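
The idea behind the return line discussed in this thread (take the selected non-NaN rows, append the NaN rows, then truncate by position) can be sketched in plain Python. This is only an illustration under simplifying assumptions: `nlargest_with_nan` is a hypothetical helper, not the pandas implementation, and it only mimics the keep="first" case:

```python
import math


def nlargest_with_nan(values, n):
    """Return up to n (position, value) pairs, largest first, NaN rows last."""
    if n <= 0:
        return []
    pairs = list(enumerate(values))
    non_nan = [p for p in pairs if not math.isnan(p[1])]
    nan_rows = [p for p in pairs if math.isnan(p[1])]
    # sorted() is stable, so tied values keep their original order.
    ranked = sorted(non_nan, key=lambda p: p[1], reverse=True)
    # Append the NaN rows, then slice by position, never by label.
    return (ranked + nan_rows)[:n]


picked = nlargest_with_nan([3.0, math.nan, 1.0, 2.0, 5.0], 5)
print([pos for pos, _ in picked])
```

The final slice works on list positions, which is the same concern raised in the review: truncation after the concat should be positional rather than label-based.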

@jreback jreback merged commit 16d2f59 into pandas-dev:master Sep 7, 2021
@jreback
Contributor

jreback commented Sep 7, 2021

thanks @usersblock very nice!

Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

nlargest gives a zero-row dataframe when ordering columns are all NaN
5 participants