ENH: add nlargest nsmallest to Series #7113

Merged · merged 3 commits into pandas-dev:master on May 14, 2014

Conversation

@cpcloud (Member) commented May 13, 2014

closes #3960
xref: #5534

@@ -1762,7 +1770,17 @@ def _try_kind_sort(arr):
good = ~bad
idx = pa.arange(len(self))

argsorted = _try_kind_sort(arr[good])
def _try_kind_sort(arr, kind='mergesort'):
Member Author:

crap, this is screwed up ... I chose the wrong thing in the merge conflict
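
For context, a rough sketch of the kind of stable-sort helper being discussed in the diff above (an assumption about its shape, illustrative only, not the exact code that landed in the PR):

    import numpy as np

    def _try_kind_sort(arr, kind='mergesort'):
        # Prefer the requested (stable) sort kind; if the array cannot be
        # sorted that way (e.g. awkward object dtypes), fall back to
        # NumPy's default quicksort.
        try:
            return arr.argsort(kind=kind)
        except TypeError:
            return arr.argsort(kind='quicksort')

    order = _try_kind_sort(np.array([3.0, 1.0, 2.0]))   # array([1, 2, 0])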

@jreback modified the milestones: 0.14.1, 0.14.0 on May 13, 2014
@@ -128,6 +128,7 @@ API changes
import pandas.core.common as com
com.array_equivalent(np.array([0, np.nan]), np.array([0, np.nan]))
np.array_equal(np.array([0, np.nan]), np.array([0, np.nan]))
- Add nsmallest and nlargest Series methods (:issue:`3960`)
Contributor:

move this to 0.14.0

Member Author:

yep thanks

@jreback (Contributor) commented May 13, 2014

and pls do a vbench to verify these don't really change anything else

I think @hayd had an optimization for these methods somewhere, but push that off to another version (and they may be fine anyhow)

@cpcloud (Member Author) commented May 13, 2014

sure thing

@cpcloud (Member Author) commented May 13, 2014

there was discussion of topk in #3960, but nlargest and nsmallest are consistent with the heapq module. I think it would be nice to keep that consistency
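
For illustration, the naming mirrors the standard library (a minimal sketch; the Series methods also take the keyword arguments discussed later in this thread):

    import heapq
    import pandas as pd

    values = [3.0, 2.0, 1.0, 2.0, 5.0]

    # stdlib: heapq.nlargest / heapq.nsmallest return plain lists
    heapq.nlargest(3, values)    # [5.0, 3.0, 2.0]
    heapq.nsmallest(3, values)   # [1.0, 2.0, 2.0]

    # the new Series methods use the same names and return Series
    s = pd.Series(values)
    s.nlargest(3)
    s.nsmallest(3)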

@jreback (Contributor) commented May 13, 2014

that's cool..fine with the names

@cpcloud (Member Author) commented May 13, 2014

vbench

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
eval_frame_mult_python                       |  13.9356 |  12.7607 |   1.0921 |
query_store_table                            |   5.0490 |   4.5670 |   1.1055 |
frame_ctor_dtindex_BusinessDayx1             |   1.3620 |   1.2143 |   1.1216 |
reindex_frame_level_reindex                  |   0.7103 |   0.6084 |   1.1676 |
frame_add_no_ne                              |   4.6510 |   3.9537 |   1.1764 |
timeseries_large_lookup_value                |   0.0287 |   0.0243 |   1.1797 |
concat_empty_frames2                         |   1.0597 |   0.7667 |   1.3822 |
-------------------------------------------------------------------------------

i was running some other things at the same time ... @jreback are these okay?

@jreback (Contributor) commented May 13, 2014

yep

Parameters
----------
n : int, optional, default: ``5``
take_last : bool, optional, default: ``False``
Member:

Can you add an explanation of the take_last parameter?

Also please follow the format:

kwarg : type
    Explanation.
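
For reference, a sketch of what such a block for these parameters might look like (the wording is illustrative, not quoted from the final docstring):

    n : int, default 5
        Number of values to return.
    take_last : bool, default False
        Where there are duplicate values, take the last occurrence
        instead of the first.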

Member Author:

yep wasn't finished with docstrings yet, thx for pointing this out

@cpcloud self-assigned this on May 14, 2014
@cpcloud (Member Author) commented May 14, 2014

@jreback @jorisvandenbossche any more comments? will squash and then can merge

@jreback (Contributor) commented May 14, 2014

g2g

@jreback (Contributor) commented May 14, 2014

@cpcloud pls squash down a bit and looks good 2 go

@jreback (Contributor) commented May 14, 2014

@cpcloud (Member Author) commented May 14, 2014

@jreback doc and squish ok?

@jreback (Contributor) commented May 14, 2014

yep go 4 it

cpcloud added a commit that referenced this pull request on May 14, 2014:
ENH: add nlargest nsmallest to Series
@cpcloud merged commit fcec82e into pandas-dev:master on May 14, 2014
@cpcloud deleted the hayd-kth_smallest branch on May 14, 2014 at 21:21
@jreback (Contributor) commented May 14, 2014

@cpcloud

seems some of the complex dtypes don't exist on some architectures (Windows, even though this is 64-bit Python 2.7), so we need to test that numpy can create the dtype before testing that the methods raise.

(Pdb) u
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\series.py(152)__init__()
-> dtype = self._validate_dtype(dtype)
(Pdb) l
147                     index = _ensure_index(index)
148
149                 if data is None:
150                     data = {}
151                 if dtype is not None:
152  ->                 dtype = self._validate_dtype(dtype)
153
154                 if isinstance(data, MultiIndex):
155                     raise NotImplementedError
156                 elif isinstance(data, Index):
157                     # need to copy to avoid aliasing issues
(Pdb) u
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\tests\test_series.py(4025)test_nsmallest_nlargest()
-> Series([3., 2, 1, 2, 5], dtype='complex256'),
(Pdb) l
4020            ]
4021
4022            raising = [
4023                Series([3., 2, 1, 2, '5'], dtype='object'),
4024                Series([3., 2, 1, 2, 5], dtype='object'),
4025 ->             Series([3., 2, 1, 2, 5], dtype='complex256'),
4026                Series([3., 2, 1, 2, 5], dtype='complex128'),
4027            ]
4028
4029            for r in raising:
4030                dt = r.dtype
(Pdb) p np.dtype('complex128')
dtype('complex128')
(Pdb) p np.dtype('complex256')
*** TypeError: TypeError('data type "complex256" not understood',)
(Pdb)
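
A minimal sketch of the kind of guard being suggested, i.e. checking that numpy can construct the dtype before expecting it to raise (an assumption about the approach, not the committed fix):

    import numpy as np

    # complex256 does not exist on every platform (e.g. some Windows
    # builds), so only include it if numpy understands the dtype.
    raising_dtypes = ['object', 'complex128']
    try:
        np.dtype('complex256')
        raising_dtypes.append('complex256')
    except TypeError:
        # dtype not understood on this architecture; leave it out
        pass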

@jreback (Contributor) commented May 14, 2014

or could just totally skip the complex256 I think (easier)

@jreback (Contributor) commented May 14, 2014

easy enough just to take it out: 1533480

@cpcloud (Member Author) commented May 14, 2014

ok sorry about this ... i'll take it out since 128 is enough... really just there to test that the dtype check in select_n fails for the types that cannot be "sanely" sorted
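
In other words, the expectation is roughly the following (a sketch of the intended behaviour; the exact error message is not quoted from the source):

    import pandas as pd

    s = pd.Series([3.0, 2, 1, 2, 5], dtype='object')
    try:
        s.nlargest(3)
    except TypeError as err:
        # dtypes that cannot be "sanely" sorted (object, complex) are
        # expected to raise rather than return a result
        print(err)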

@cpcloud (Member Author) commented May 14, 2014

oh i see... you've already done it

@cpcloud (Member Author) commented May 14, 2014

thanks!

@hayd (Contributor) commented May 28, 2014

@cpcloud thanks for picking up the ball here!

@cpcloud (Member Author) commented May 28, 2014

@hayd absolutely np!

@hayd (Contributor) commented Jul 22, 2014

ha! GraphLab calls this topk.

Labels: API Design, Enhancement, Numeric Operations (Arithmetic, Comparison, and Logical operations)
Development: Successfully merging this pull request may close these issues: ENH: nlargest for DataFrame

4 participants