-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH nlargest and nsmallest Series methods #5534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
needs tests on other dtypes |
check all! float, int, datetime64 (use i8), timedelts64 (same), object that are numbers, object that are strings some u may want to simple raise as notimplemented |
And make sure to include things with nan so we specify that behavior. |
@hayd, good reason, though I still prefer |
@@ -1821,6 +1822,88 @@ def _try_kind_sort(arr): | |||
return self._constructor(arr[sortedIdx], index=self.index[sortedIdx])\ | |||
.__finalize__(self) | |||
|
|||
def nlargest(self, n=5, take_last=False, sort=True): | |||
''' | |||
Returns the largest n rows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detail, but colon can just be a full stop
Added two small detail comments on the docs. Two other things:
|
Should be using quick select algorithm ? (Julia does this fast.. at least with one dim array... in a more general way.) |
Numerical recipes tend to reincarnate from langauge to language, all the way back The cython code you're relying on from algos.pyx probably originated here: and from a quick look at the graph there, it seems wirth and quickselect have similar perf |
moving for 0.13.1 @hayd ok? |
@hayd ready to merge? |
@@ -60,6 +60,7 @@ New features | |||
- Clipboard functionality now works with PySide (:issue:`4282`) | |||
- New ``extract`` string method returns regex matches more conveniently | |||
(:issue:`4685`) | |||
- Add nsmallest and nlargest Series methods (:issue:`3960`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u move these notes to 0.13.1..thxs
There were some comments on docs that would ideally be handled before merging. |
@hayd can you address the comments, otherwise looks ok |
@jreback I have a few issues assigned I think. Will do in the next few days. I had tried implementing quick select to compare perf. IIRC The issue with the above is it's significantly slower than sorting and taking head for "small" (and even quite big) Series. The more conditions (type checks) the slower it was getting. Perhaps a solution is to check length before doing it and if "small" then sort and head :s |
@hayd absolutely |
@hayd hows this going? |
@hayd see if you can do this soon...thxs |
ok will do right this minute. |
@jreback to put these in vbench, should I create a series_methods file? (Not sure where to put them). Basically atm slightly slower if lots of values are the kth. |
i think their is an algos or something's? u can either create series_methiss or put near rank |
methods |
@hayd bump to 0.14? |
@jreback Will finish this off this evening, it's passing tests but want to do a quick refactor. If it gets left behind in 0.13.1 no biggy... |
Gave it a refactor, and put the numpy versions into util. Suspect can be tweaked faster. @jreback I tagged 0.14 but could change. Worried as kth_largest can cause stack overflow if called incorrectly (though it's only called with i8, int and float without NaNs so should be ok). Perhaps should sit in master for a while just in case ? |
@hayd look ok to me can merge or wait to 0.14, up 2 u pprob want to add an example in docs - maybe in min/max section |
Would prefer to wait tbh, would like to kick it in master for a bit. |
sure |
@hayd looks ready to go
|
There were some doc comments above from me that still should be adressed I think |
@hayd can you rebase and address the comments....this should go in soon...thanks |
@hayd would merge this soon |
@jreback Sketched some time next week to fix up the several PRs I have in play atm. Thanks for pinging! |
ping! |
ping |
@hayd ping....need to get this in ASAP |
ping! |
@hayd looks ok....let's get this in ASAP |
@hayd ping! |
@hayd can you rebase! didn't we discuss calling these |
ping! |
@cpcloud I want to get this in, but needs a rebase / maybe test fixes, can you give a shot? |
Sure no prob. Will be home in a bout an hour will give it a whirlygig then. |
bump to 0.14.1, but @cpcloud if you are able to workon this would be great |
I've got a pr coming for this today (on a plane to NYC right now). Mostly just a bit of clean up and dealing with the object dtype and special cases of n <= 0 or n >= len(series). Might need a few more test cases |
is there any consensus on how to deal with unorderable types? e.g., disallow ... punt to numpy? python 3 disallows str int comparisons ... @jreback thoughts when u get a chance? |
you could do something like this: https://github.com/pydata/pandas/blob/master/pandas/core/algorithms.py#L141 but I would just raise (e.g. if its a object dtype and you have mixed str/number) |
closing ... submitting a new pr based on this one |
closes #3960.
Implemented using kth_largest, could cython using better algo later.