Make parameter keep=False keep duplicates for nlargest/nsmallest #16818
Comments
@gfyoung This doesn't make sense to me. The Also, the edit: A better value for the parameter would be
@tdpetrou : True that there was no discussion to this, but a simple doc-change like #18559 went in without objections from anyone. Everyone (including yourself) could have objected if need be, though to be fair, unless you're tracking The good news is that we haven't released this yet, so we do have time to implement a "third" option for
@gfyoung Thanks for the response. I proposed a simple solution to this in #18656. It was very easy and basically already done. I think the original reason for using Regardless, it just makes a lot of sense (to me) that you would want to keep the nlargest/nsmallest plus ties and the fix is easy and already complete.
Your case for using Your implementation is indeed simple but the surfacing of it will require discussion, should other maintainers believe that we should actually have such an option.
I'm not in favor of
The option to keep all ties returned from nlargest would be wonderful. Right now I have to use this function in a groupby to find the top two counts for each group, then go back and find all rows that match the top two scores. It would be faster and more efficient to just have the function return all rows that match the top n largest/smallest values, and I'm not sure why this feature was removed. Pretty please can you put it back? :)
@summerela the feature as described here was implemented in 0.24. If that's not what you are looking for, it is better to open a new issue at this point.
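For reference, a minimal sketch of the tie-keeping behavior mentioned above, assuming pandas 0.24 or later, where `keep="all"` is accepted by `nlargest`/`nsmallest`. The example data and the column names `group` and `score` are made up for illustration, and the groupby usage is only one possible way to apply it per group.

```python
import pandas as pd

s = pd.Series([10, 9, 8, 7, 7, 7, 6])

# keep="first" (the default) returns exactly n rows; ties are broken by position.
top_first = s.nlargest(4)

# keep="all" returns the n largest rows plus every row tied with the last one kept.
top_all = s.nlargest(4, keep="all")

# Hypothetical per-group data: top two scores in each group, keeping ties.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "a", "b", "b", "b"],
        "score": [5, 4, 4, 3, 9, 8, 7],
    }
)
top_per_group = df.groupby("group")["score"].apply(
    lambda x: x.nlargest(2, keep="all")
)
```

Note that `keep="all"` only keeps rows tied with the n-th largest row. That is not the same as returning every row that matches the top n distinct values, so the two-step approach may still be needed for that case.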
I'm running Dask v 1.1.1 and the only allowed arguments for keep are "first" and "last".
OK, if you can open up a new issue with a code sample, including the output of pd.show_versions, someone can take a look.
Can do. Thank you!
It would also be useful to have a keep option of "none"! s = pd.Series([10,9,8,7,7,7,6])
Code Sample, a copy-pastable example if possible
Problem description
The docstrings list False as one of the possible argument values for keep. pandas raises a ValueError when attempting to use this parameter.
Expected Output
It would be nice to have nlargest work like this.
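The original code sample and expected output are not reproduced here. As a hedged illustration of the problem description, the sketch below shows `keep=False` being rejected with a `ValueError`, followed by one version-independent way to get the n largest values plus ties. The example Series and the helper name `nlargest_with_ties` are assumptions for illustration and are not part of pandas.

```python
import pandas as pd

s = pd.Series([10, 9, 8, 7, 7, 7, 6])

# keep=False is not an accepted value; pandas rejects it with a ValueError.
try:
    s.nlargest(4, keep=False)
    raised = False
except ValueError:
    raised = True


def nlargest_with_ties(series, n):
    """Hypothetical helper: the n largest values plus anything tied with the n-th."""
    cutoff = series.nlargest(n).min()  # value of the n-th largest entry
    return series[series >= cutoff].sort_values(ascending=False)


result = nlargest_with_ties(s, 4)
```

With this Series and `n=4`, the cutoff is 7, so all three rows holding 7 are kept alongside 10, 9 and 8.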
Output of pd.show_versions()
pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.0
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: 0.3.0.post