pandas.core.groupby.DataFrameGroupBy.rank() dense_rank does not work #38972

Closed
UlionTse opened this issue Jan 5, 2021 · 11 comments · Fixed by #42402
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Docs, good first issue


@UlionTse

UlionTse commented Jan 5, 2021

dense_rank does not work

@mzeitlin11
Member

Hi @UlionTse, thanks for your report. Can you please provide a minimal reproducible example (copy-pastable!). See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports. Especially useful for diagnosing your issue would be the expected output.

@mzeitlin11 mzeitlin11 added the Needs Info Clarification about behavior needed to assess issue label Jan 5, 2021
@UlionTse
Author

UlionTse commented Jan 6, 2021

@mzeitlin11
Input:

import pandas as pd

print(pd.__version__)
df = pd.DataFrame({'a':[1,1,1,2,2,2,3,3,3], 'b':[6,5,4,4,6,5,3,3,3]})
df['rk_min']= df.groupby(by=['a'])['b'].rank(ascending=True, method='min', na_option='bottom')
df['rk_dense']= df.groupby(by=['a'])['b'].rank(ascending=True, method='dense', na_option='bottom')
print(df)

Output:

'1.2.0'

   a  b  rk_min  rk_dense
0  1  6     3.0       3.0
1  1  5     2.0       2.0
2  1  4     1.0       1.0
3  2  4     1.0       1.0
4  2  6     3.0       3.0
5  2  5     2.0       2.0
6  3  3     1.0       1.0
7  3  3     1.0       1.0
8  3  3     1.0       1.0

Expected:

   a  b  rk_min  rk_dense
0  1  6     3.0       3.0
1  1  5     2.0       2.0
2  1  4     1.0       1.0
3  2  4     1.0       1.0
4  2  6     3.0       3.0
5  2  5     2.0       2.0
6  3  3     1.0       1.0
7  3  3     1.0       2.0
8  3  3     1.0       3.0

'''
Parameters
method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

min: lowest rank in group.

dense: like ‘min’, but rank always increases by 1 between groups.
'''

@UlionTse UlionTse changed the title pandas.core.groupby.DataFrameGroupBy.rank() dense_rank dose not work pandas.core.groupby.DataFrameGroupBy.rank() dense_rank does not work Jan 6, 2021
@mzeitlin11
Member

Thanks for the response @UlionTse. I believe this is expected behavior: "rank always increases by 1 between groups" refers to the following difference between 'dense' and 'min' (the example uses a Series, but the behavior is the same for GroupBy).

df = pd.DataFrame({'data':[7, 7, 7, 8, 8, 8]})
df['rk_min'] = df["data"].rank(method="min")
df['rk_dense'] = df["data"].rank(method="dense")
print(df)
   data  rk_min  rk_dense
0     7     1.0       1.0
1     7     1.0       1.0
2     7     1.0       1.0
3     8     4.0       2.0
4     8     4.0       2.0
5     8     4.0       2.0

'dense' ranking increases from 1 to 2 when the value changes (it increases by 1 between "groups" of tied values), whereas 'min' ranking jumps from 1 to 4 because there are 3 smaller values.

The description of 'dense' ranking could certainly be clearer, however. It is especially unfortunate in the GroupBy case, because the word "group" is used in a way that does not refer to the groups of the GroupBy. I think an example like the one above would help clarify the difference between "min" and "dense" ranking, and the description of dense could probably be reworded as well.
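
For the GroupBy case specifically, the same difference can be sketched as follows. This is a minimal illustration only; the column names and data are made up for the example and are not taken from the pandas docs. Within each group of `a`, tied values of `b` share a rank under both methods, and the two methods differ only in the rank given to the next distinct value.

```python
import pandas as pd

# Hypothetical data: group a=1 has a tie on b=4, group a=2 is all ties.
df = pd.DataFrame({"a": [1, 1, 1, 1, 2, 2, 2], "b": [4, 4, 5, 6, 3, 3, 3]})

grouped = df.groupby("a")["b"]
# 'min': the next distinct value is ranked after counting all smaller rows.
df["rk_min"] = grouped.rank(method="min")
# 'dense': the next distinct value is ranked exactly 1 higher than the previous one.
df["rk_dense"] = grouped.rank(method="dense")

print(df)
#    a  b  rk_min  rk_dense
# 0  1  4     1.0       1.0
# 1  1  4     1.0       1.0
# 2  1  5     3.0       2.0
# 3  1  6     4.0       3.0
# 4  2  3     1.0       1.0
# 5  2  3     1.0       1.0
# 6  2  3     1.0       1.0
```

Rows with identical values inside one GroupBy group always receive the same rank under 'dense', which is why the all-ties group above is ranked 1, 1, 1.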

@mzeitlin11 mzeitlin11 added Docs Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff and removed Needs Info Clarification about behavior needed to assess issue labels Jan 6, 2021
@UlionTse
Author

UlionTse commented Jan 8, 2021

@mzeitlin11 Thanks. However, a dense rank after grouping is not often what is needed. In most cases the SQL pattern row_number() over(partition by ... order by ...) is used (in pandas: df['rank'] = df.sort_values(by=[...]).groupby(by=[...]).cumcount()+1).
So I hope you will add this behavior to pandas.core.groupby.DataFrameGroupBy.rank().

@mzeitlin11
Member

Does rank with method='first' handle what you're requesting here?
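
As a rough sketch of that suggestion, using the data from the original report: `method='first'` breaks ties by order of appearance, so within each group it should give the same numbering as the sort-then-`cumcount` idiom. The column names `rk_first` and `row_num` are made up for the example, and the stable `mergesort` is an assumption added here so that tied rows keep their original order in the `cumcount` version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   "b": [6, 5, 4, 4, 6, 5, 3, 3, 3]})

# Ties are ranked in the order they appear, so every row in a group gets a distinct rank.
df["rk_first"] = df.groupby("a")["b"].rank(method="first").astype(int)

# Row-number idiom: stable sort by b, then number the rows within each group of a.
# The result is aligned back to the original rows by index on assignment.
df["row_num"] = df.sort_values("b", kind="mergesort").groupby("a").cumcount() + 1

print(df)
#    a  b  rk_first  row_num
# 0  1  6         3        3
# 1  1  5         2        2
# 2  1  4         1        1
# 3  2  4         1        1
# 4  2  6         3        3
# 5  2  5         2        2
# 6  3  3         1        1
# 7  3  3         2        2
# 8  3  3         3        3
```

Both columns match the "Expected" table from the report, including 1, 2, 3 for the all-ties group.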

@UlionTse
Author

@mzeitlin11 OK. Thanks.

@debnathshoham
Member

Hi - is this issue still open?

@mzeitlin11
Member

Yep! If you'd be interested in addressing it, the idea is just to use a clearer description of what dense ranking means. The best solution might be to add some examples. I think we should also add DataFrame/Series rank to the See Also section, since that contains some useful material.

@debnathshoham
Member

Hi @mzeitlin11, I would like to work on this. I am new to open source contribution and might require some guidance.

@debnathshoham
Member

take

@mzeitlin11
Member

Sounds great, feel free to ask any questions!

debnathshoham added a commit to debnathshoham/pandas that referenced this issue Jul 6, 2021
@mroeschke mroeschke added this to the 1.4 milestone Jul 12, 2021
mroeschke pushed a commit that referenced this issue Jul 12, 2021
* DOC: Adding examples to DataFrameGroupBy.rank #38972

* DOC: Adding examples to DataFrameGroupBy.rank #38972

* DOC: made the suggested changes

* DOC: changed as suggested

* DOC: changed as suggested

* DOC: updating with black output

* DOC: updating with black output

* DOC: corrected docstring order