ENH: add nlargest nsmallest to Series #7113

Merged · merged 3 commits into pandas-dev:master on May 14, 2014

Conversation

@cpcloud (Member) commented May 13, 2014

closes #3960
xref: #5534

@@ -1762,7 +1770,17 @@ def _try_kind_sort(arr):
good = ~bad
idx = pa.arange(len(self))

argsorted = _try_kind_sort(arr[good])
def _try_kind_sort(arr, kind='mergesort'):
Member Author:

crap, this is screwed up ... I chose the wrong thing in the merge conflict
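
For context, a rough sketch of the kind of stable-sort helper being discussed in the diff above (an assumption about its shape, illustrative only, not the exact code that landed in the PR):

    import numpy as np

    def _try_kind_sort(arr, kind='mergesort'):
        # Prefer the requested (stable) sort kind; if the array cannot be
        # sorted that way (e.g. awkward object dtypes), fall back to
        # NumPy's default quicksort.
        try:
            return arr.argsort(kind=kind)
        except TypeError:
            return arr.argsort(kind='quicksort')

    order = _try_kind_sort(np.array([3.0, 1.0, 2.0]))   # array([1, 2, 0])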

@jreback modified the milestones: 0.14.1, 0.14.0 on May 13, 2014
@@ -128,6 +128,7 @@ API changes
import pandas.core.common as com
com.array_equivalent(np.array([0, np.nan]), np.array([0, np.nan]))
np.array_equal(np.array([0, np.nan]), np.array([0, np.nan]))
- Add nsmallest and nlargest Series methods (:issue:`3960`)
Contributor:

move this to 0.14.0

Member Author:

yep thanks

@jreback (Contributor) commented May 13, 2014

and pls do a vbench to verify these don't really change anything else

I think @hayd had an optimization for these methods somewhere, but push that off to another version (and they may be fine anyhow)

@cpcloud (Member Author) commented May 13, 2014

sure thing

@cpcloud (Member Author) commented May 13, 2014

there was discussion of topk in #3960, but nlargest and nsmallest are consistent with the heapq module. I think it would be nice to keep that consistency
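
For illustration, the naming mirrors the standard library (a minimal sketch; the Series methods also take the keyword arguments discussed later in this thread):

    import heapq
    import pandas as pd

    values = [3.0, 2.0, 1.0, 2.0, 5.0]

    # stdlib: heapq.nlargest / heapq.nsmallest return plain lists
    heapq.nlargest(3, values)    # [5.0, 3.0, 2.0]
    heapq.nsmallest(3, values)   # [1.0, 2.0, 2.0]

    # the new Series methods use the same names and return Series
    s = pd.Series(values)
    s.nlargest(3)
    s.nsmallest(3)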

@jreback (Contributor) commented May 13, 2014

that's cool..fine with the names

@cpcloud (Member Author) commented May 13, 2014

vbench

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
eval_frame_mult_python                       |  13.9356 |  12.7607 |   1.0921 |
query_store_table                            |   5.0490 |   4.5670 |   1.1055 |
frame_ctor_dtindex_BusinessDayx1             |   1.3620 |   1.2143 |   1.1216 |
reindex_frame_level_reindex                  |   0.7103 |   0.6084 |   1.1676 |
frame_add_no_ne                              |   4.6510 |   3.9537 |   1.1764 |
timeseries_large_lookup_value                |   0.0287 |   0.0243 |   1.1797 |
concat_empty_frames2                         |   1.0597 |   0.7667 |   1.3822 |
-------------------------------------------------------------------------------

i was running some other things at the same time ... @jreback are these okay?

@jreback (Contributor) commented May 13, 2014

yep

Parameters
----------
n : int, optional, default: ``5``
take_last : bool, optional, default: ``False``
Member:

Can you add an explanation of the take_last parameter?

Also please follow the format:

kwarg : type
    Explanation.
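
For reference, a sketch of what such a block for these parameters might look like (the wording is illustrative, not quoted from the final docstring):

    n : int, default 5
        Number of values to return.
    take_last : bool, default False
        Where there are duplicate values, take the last occurrence
        instead of the first.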

Member Author:

yep wasn't finished with docstrings yet, thx for pointing this out

@cpcloud self-assigned this on May 14, 2014
@cpcloud (Member Author) commented May 14, 2014

@jreback @jorisvandenbossche any more comments? will squash and then can merge

@jreback (Contributor) commented May 14, 2014

g2g

@jreback (Contributor) commented May 14, 2014

@cpcloud pls squash down a bit and looks good 2 go

@jreback (Contributor) commented May 14, 2014

@cpcloud (Member Author) commented May 14, 2014

@jreback doc and squish ok?

@jreback (Contributor) commented May 14, 2014

yep go 4 it

cpcloud added a commit that referenced this pull request on May 14, 2014:
ENH: add nlargest nsmallest to Series
@cpcloud merged commit fcec82e into pandas-dev:master on May 14, 2014
@cpcloud deleted the hayd-kth_smallest branch on May 14, 2014 at 21:21
@jreback (Contributor) commented May 14, 2014

@cpcloud

seems some of the complex dtypes don't exist on some architectures (Windows, even though this is 64-bit Python 2.7), so we need to test that numpy can create the dtype before testing that the methods raise.

(Pdb) u
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\core\series.py(152)__init__()
-> dtype = self._validate_dtype(dtype)
(Pdb) l
147                     index = _ensure_index(index)
148
149                 if data is None:
150                     data = {}
151                 if dtype is not None:
152  ->                 dtype = self._validate_dtype(dtype)
153
154                 if isinstance(data, MultiIndex):
155                     raise NotImplementedError
156                 elif isinstance(data, Index):
157                     # need to copy to avoid aliasing issues
(Pdb) u
> c:\users\jeff reback\documents\github\pandas\build\lib.win-amd64-2.7\pandas\tests\test_series.py(4025)test_nsmallest_nlargest()
-> Series([3., 2, 1, 2, 5], dtype='complex256'),
(Pdb) l
4020            ]
4021
4022            raising = [
4023                Series([3., 2, 1, 2, '5'], dtype='object'),
4024                Series([3., 2, 1, 2, 5], dtype='object'),
4025 ->             Series([3., 2, 1, 2, 5], dtype='complex256'),
4026                Series([3., 2, 1, 2, 5], dtype='complex128'),
4027            ]
4028
4029            for r in raising:
4030                dt = r.dtype
(Pdb) p np.dtype('complex128')
dtype('complex128')
(Pdb) p np.dtype('complex256')
*** TypeError: TypeError('data type "complex256" not understood',)
(Pdb)
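
A minimal sketch of the kind of guard being suggested, i.e. checking that numpy can construct the dtype before expecting it to raise (an assumption about the approach, not the committed fix):

    import numpy as np

    # complex256 does not exist on every platform (e.g. some Windows
    # builds), so only include it if numpy understands the dtype.
    raising_dtypes = ['object', 'complex128']
    try:
        np.dtype('complex256')
        raising_dtypes.append('complex256')
    except TypeError:
        # dtype not understood on this architecture; leave it out
        pass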

@jreback (Contributor) commented May 14, 2014

or could just totally skip the complex256 I think (easier)

@jreback (Contributor) commented May 14, 2014

easy enough just to take it out: 1533480

@cpcloud (Member Author) commented May 14, 2014

ok sorry about this ... i'll take it out since 128 is enough... really just there to test that the dtype check in select_n fails for the types that cannot be "sanely" sorted
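
In other words, the expectation is roughly the following (a sketch of the intended behaviour; the exact error message is not quoted from the source):

    import pandas as pd

    s = pd.Series([3.0, 2, 1, 2, 5], dtype='object')
    try:
        s.nlargest(3)
    except TypeError as err:
        # dtypes that cannot be "sanely" sorted (object, complex) are
        # expected to raise rather than return a result
        print(err)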

@cpcloud (Member Author) commented May 14, 2014

oh i see... you've already done it

@cpcloud (Member Author) commented May 14, 2014

thanks!

@hayd (Contributor) commented May 28, 2014

@cpcloud thanks for picking up the ball here!

@cpcloud (Member Author) commented May 28, 2014

@hayd absolutely np!

@hayd (Contributor) commented Jul 22, 2014

ha! GraphLab calls this topk.

Labels: API Design, Enhancement, Numeric Operations (Arithmetic, Comparison, and Logical operations)
Development: Successfully merging this pull request may close these issues: ENH: nlargest for DataFrame

4 participants