Skip to content

DOC: Clarify rank documentation #40016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 3, 2021
Merged

DOC: Clarify rank documentation #40016

merged 4 commits into from
Jun 3, 2021

Conversation

ptype
Copy link
Contributor

@ptype ptype commented Feb 24, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Example of top/bottom behaviour vs ascending:

x = np.array([np.nan, 3.1, 1.4, 2.1, np.nan, 5, 0.5])
pd.Series(x).rank(ascending=False, na_option='bottom', method='min') ->
0    6.0
1    2.0
2    4.0
3    3.0
4    6.0
5    1.0
6    5.0
dtype: float64

pd.Series(x).rank(ascending=True, na_option='bottom', method='min') ->
0    6.0
1    4.0
2    2.0
3    3.0
4    6.0
5    5.0
6    1.0
dtype: float64

pd.Series(x).rank(ascending=True, na_option='top', method='min') ->
0    1.0
1    6.0
2    4.0
3    5.0
4    1.0
5    7.0
6    3.0
dtype: float64

pd.Series(x).rank(ascending=False, na_option='top', method='min') ->
0    1.0
1    6.0
2    4.0
3    5.0
4    1.0
5    7.0
6    3.0
dtype: float64

smallest -> lowest
Top/bottom give the lowest/highest rank, regardless of if ascending is True or False
@lithomas1 lithomas1 added the Docs label Feb 24, 2021
@attack68
Copy link
Contributor

although irrelevant, the last example is wrong, actually gives:

0    1.0
1    4.0
2    6.0
3    5.0
4    1.0
5    3.0
6    7.0
dtype: float64

but this lgtm, na_option is not impacted by ascending and it's strange wording in the docs.

@ptype
Copy link
Contributor Author

ptype commented Feb 24, 2021

Sorry @attack68, yes you are right, I copy pasted the wrong one. Thanks.

@WillAyd
Copy link
Member

WillAyd commented Feb 26, 2021

I think this is a bug rather than an issue with the documentation, no?

@ptype
Copy link
Contributor Author

ptype commented Feb 26, 2021

First I thought that as well, but the behaviour actually makes sense to me in the sense that a ranking goes from 1 (lowest/top) to n (highest/bottom), and na_option determines if you want your nans at the top or bottom of the ranking (regardless of ascending).

@attack68
Copy link
Contributor

First I thought that as well, but the behaviour actually makes sense to me in the sense that a ranking goes from 1 (lowest/top) to n (highest/bottom), and na_option determines if you want your nans at the top or bottom of the ranking (regardless of ascending).

@WillAyd I think the issue is with semantics. If you rank values either 'ascending' or 'descending' and then insert an 'NaN' either at 'top' or 'bottom', does 'top' refer to assigning the highest 'value' to 'NaN' in which case 'ascending' places it with the highest rank and 'descending' the lowest? Or does it place the 'NaN' at the 'top' of the rankings (i.e. lowest rank) or the 'bottom'.

I think the use of 'top' and 'bottom' is terrible. I prefer the latter approach but would think 'na_rep' = {'lowest_rank', 'highest_rank'} is clearer and this is essentially what it is doing for {'top', 'bottom'}.

@ptype
Copy link
Contributor Author

ptype commented Mar 15, 2021

@attack68 Personally I think this can be solved with clearer documentation as it's all about terminology, rather than changing the API. I think it needs to be explained as in #40016 (comment). I'm not a maintainer so up to you guys. Happy to create PR on that though if that is the route you want to go down.

@attack68
Copy link
Contributor

@ptype, sure but is top: assign the top rank (i.e. lowest value) to NaN values a bit clearer why top is used and what it corresponds to?

@ptype
Copy link
Contributor Author

ptype commented Mar 17, 2021

@ptype, sure but is top: assign the top rank (i.e. lowest value) to NaN values a bit clearer why top is used and what it corresponds to?

@attack68 yes, I think so

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 17, 2021
@ptype
Copy link
Contributor Author

ptype commented Apr 20, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

I think it should still be considered for inclusion

Copy link
Contributor

@attack68 attack68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this documentation is clearer, if not ideal. LGTM

@datapythonista
Copy link
Member

@ptype can you merge master, and make sure the CI is green, so we can merge, please? Ping me once it's green, thanks!

@ptype
Copy link
Contributor Author

ptype commented Jun 3, 2021

@datapythonista thanks, I've fetched from master and reran CIs, all green

@datapythonista datapythonista changed the title Rank documentation DOC: Clarify rank documentation Jun 3, 2021
@datapythonista datapythonista merged commit 8f4d591 into pandas-dev:master Jun 3, 2021
@datapythonista
Copy link
Member

Thanks @ptype

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants