DOC: add nlargest/nsmallest to API.rst #10145

Closed
RomanPekar opened this issue May 15, 2015 · 5 comments · Fixed by #10206
Comments

@RomanPekar
Contributor

At the moment there's no easy way to partially sort a DataFrame to get the top N rows. This could be a useful feature.

Partial sorting for numpy arrays is available in the bottleneck library.

Here's a link to an SO question about numpy partial sorting: http://stackoverflow.com/questions/10337533/a-fast-way-to-find-the-largest-n-elements-in-an-numpy-array
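
For readers unfamiliar with partial sorting, here is a minimal sketch in plain NumPy. It assumes `np.argpartition` rather than the bottleneck routine mentioned above, and the helper name `top_n` and the sample values are made up for illustration.

```python
import numpy as np

def top_n(values, n):
    """Return the n largest entries of a 1-d array, largest first (hypothetical helper)."""
    values = np.asarray(values)
    # argpartition moves the indices of the n largest values into the last n slots,
    # in no particular order, without fully sorting the array
    idx = np.argpartition(values, -n)[-n:]
    # sort only those n candidates, largest first
    idx = idx[np.argsort(values[idx])[::-1]]
    return values[idx]

arr = np.array([7, 1, 9, 3, 8, 2])
largest = top_n(arr, 3)
print(largest)
```

Only the n selected elements are fully ordered; the rest of the array is never sorted.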

@jreback
Contributor

jreback commented May 15, 2015

http://pandas.pydata.org/pandas-docs/stable/basics.html?highlight=nlargest#smallest-largest-values

(This is even faster than partsort, as it's grabbing the top values.)

These are not mentioned in API.rst, though. Want to do a PR to add them?
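
A short illustration of the two Series methods the linked docs section describes. The sample values are invented for the example.

```python
import pandas as pd

s = pd.Series([7, 1, 9, 3, 8, 2])

# the three largest values, largest first, keeping their original index labels
top3 = s.nlargest(3)
# the two smallest values, smallest first
bottom2 = s.nsmallest(2)

print(top3)
print(bottom2)
```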

jreback added the Docs label May 15, 2015
jreback added this to the 0.17.0 milestone May 15, 2015
jreback changed the title from "ENH: Partial sorting on the DataFrame, could be useful to get top N rows" to "DOC: add nlargest/nsmallest to API.rst" May 15, 2015
@RomanPekar
Contributor Author

OK, but this one is for Series. My issue was about getting the top N rows from a DataFrame ordered by the columns I want, similar to SQL "select top (@N) * from <Table> order by <col1> asc, <col2> desc", without actually sorting the whole dataset.

@jreback
Contributor

jreback commented May 15, 2015

df.apply(Series.nlargest)
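
A sketch of what this one-liner does, assuming a small made-up frame and `n=2` passed through `apply` as a keyword argument. Each column is ranked independently, so the result holds per-column top values rather than whole rows.

```python
import pandas as pd

# illustrative data, not taken from a real dataset
df = pd.DataFrame({"A": [1, 2, 2, 2, 3, 3], "B": [5, 5, 4, 3, 2, 1]})

# nlargest runs on each column separately; rows that are not in a
# column's top 2 show up as NaN in that column
per_column = df.apply(pd.Series.nlargest, n=2)
print(per_column)
```

Because the columns are treated separately, the rows that come back for A and for B need not be the same rows.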

@RomanPekar
Contributor Author

Thanks again, but it still won't work if I want to sort by several columns. As an example, suppose I have this dataset:

    A   B
0   1   5
1   2   5
2   2   4
3   2   3
4   3   2
5   3   1

And I want to take the top 4 records ordered by A desc, B desc (rows 4, 5, 1, 2).

@jreback
Contributor

jreback commented May 15, 2015

In [42]: df.sort(['A','B'],ascending=[0,0])
Out[42]: 
   A  B
4  3  2
5  3  1
1  2  5
2  2  4
3  2  3
0  1  5

I don't think there is an easy way to do this without a full sort. partsort (and most other algos) operate on a 1-d array.
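
For anyone reading this on a current pandas release, here is a sketch of the same query. It assumes a recent pandas, where `df.sort` has been replaced by `DataFrame.sort_values` and where `DataFrame.nlargest` accepts a list of columns. The frame is the example dataset from the comment above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 2, 3, 3], "B": [5, 5, 4, 3, 2, 1]})

# full sort on both columns (descending), then keep the first 4 rows
top4_sorted = df.sort_values(["A", "B"], ascending=[False, False]).head(4)

# same selection via DataFrame.nlargest; later columns are used to break ties
top4 = df.nlargest(4, ["A", "B"])

print(top4)
```

Note that `nlargest` applies the same direction to every listed column, so a mixed ordering such as "col1 asc, col2 desc" would still need `sort_values`.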
