pandas.core.groupby.DataFrameGroupBy.rank() dense_rank does not work #38972

UlionTse · 2021-01-05T14:00:59Z

mzeitlin11 · 2021-01-05T15:53:42Z

Hi @UlionTse, thanks for your report. Can you please provide a minimal reproducible example (copy-pastable!). See https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports. Especially useful for diagnosing your issue would be the expected output.

UlionTse · 2021-01-06T07:17:41Z

@mzeitlin11
Input:

import pandas as pd

print(pd.__version__)
df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'b': [6, 5, 4, 4, 6, 5, 3, 3, 3]})
df['rk_min'] = df.groupby(by=['a'])['b'].rank(ascending=True, method='min', na_option='bottom')
df['rk_dense'] = df.groupby(by=['a'])['b'].rank(ascending=True, method='dense', na_option='bottom')
print(df)

Output:

'1.2.0'

   a  b  rk_min  rk_dense
0  1  6     3.0       3.0
1  1  5     2.0       2.0
2  1  4     1.0       1.0
3  2  4     1.0       1.0
4  2  6     3.0       3.0
5  2  5     2.0       2.0
6  3  3     1.0       1.0
7  3  3     1.0       1.0
8  3  3     1.0       1.0

Expected:

   a  b  rk_min  rk_dense
0  1  6     3.0       3.0
1  1  5     2.0       2.0
2  1  4     1.0       1.0
3  2  4     1.0       1.0
4  2  6     3.0       3.0
5  2  5     2.0       2.0
6  3  3     1.0       1.0
7  3  3     1.0       2.0
8  3  3     1.0       3.0

'''
Parameters
method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

min: lowest rank in group.

dense: like ‘min’, but rank always increases by 1 between groups.
'''

mzeitlin11 · 2021-01-06T19:57:09Z

Thanks for the response @UlionTse. I believe this is expected behavior: "rank always increases by 1 between groups" refers to the following difference between 'dense' and 'min' (the example uses a Series, but the behavior is the same for GroupBy).

df = pd.DataFrame({'data':[7, 7, 7, 8, 8, 8]})
df['rk_min'] = df["data"].rank(method="min")
df['rk_dense'] = df["data"].rank(method="dense")
print(df)

   data  rk_min  rk_dense
0     7     1.0       1.0
1     7     1.0       1.0
2     7     1.0       1.0
3     8     4.0       2.0
4     8     4.0       2.0
5     8     4.0       2.0

'dense' ranking increases from 1 to 2 when the value changes (an increase of 1 between "groups" of tied values), whereas 'min' jumps from 1 to 4 because three smaller values come before the 8s.

I think the description of 'dense' ranking could certainly be clearer, however. It is especially unfortunate in the GroupBy case because the word "group" is used in a way that doesn't refer to the groups of the GroupBy. I think an example like the above would help clarify the difference between "min" and "dense" ranking, and the description of dense could probably be reworded as well.
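As a rough illustration of the same point in the GroupBy case, here is a minimal sketch. The frame and the column names `rk_min` and `rk_dense` are made up for this example. Ranks restart in each group of `a`, and inside a group 'dense' only differs from 'min' after a run of tied values.

```python
import pandas as pd

# Hypothetical data: group 1 has a tie (5, 5) followed by larger values,
# group 2 consists only of tied values.
df = pd.DataFrame({"a": [1, 1, 1, 1, 2, 2, 2], "b": [5, 5, 7, 9, 3, 3, 3]})

grouped = df.groupby("a")["b"]
# 'min': tied values share the lowest rank, and the next rank skips ahead.
df["rk_min"] = grouped.rank(method="min")
# 'dense': tied values share a rank, and the next distinct value gets rank + 1.
df["rk_dense"] = grouped.rank(method="dense")
print(df)
```

In group 1 the value 7 gets rank 3 under 'min' (two smaller values precede it) but rank 2 under 'dense'. In group 2 every row is tied, so both methods give 1.0 throughout.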

UlionTse · 2021-01-08T04:04:27Z

@mzeitlin11 Thanks. However, dense rank after grouping is not used very often. In most cases what is needed is the SQL pattern ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) (in pandas: df['rank'] = df.sort_values(by=[...]).groupby(by=[...]).cumcount()+1).
So I hope you will add this behavior to pandas.core.groupby.DataFrameGroupBy.rank().

mzeitlin11 · 2021-01-08T18:06:44Z

Does rank with method='first' handle what you're requesting here?
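A minimal sketch of how this lines up, reusing the frame from the report. The column names `rk_first` and `row_number` are chosen for illustration, and a stable sort is assumed so that ties keep their original order.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2, 3, 3, 3], "b": [6, 5, 4, 4, 6, 5, 3, 3, 3]})

# method='first' breaks ties by order of appearance, so ranks within a
# group are always 1, 2, 3, ... with no repeats.
df["rk_first"] = df.groupby("a")["b"].rank(method="first", ascending=True)

# The sort + cumcount idiom from the comment above; mergesort is stable,
# so tied rows stay in their original order. The result aligns on the index.
df["row_number"] = df.sort_values("b", kind="mergesort").groupby("a").cumcount() + 1
print(df)
```

For this frame both columns hold the same numbers, and in group 3 the tied rows get 1, 2, 3, which matches the "Expected" table in the report. Note that `rank` returns floats while `cumcount` returns integers.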

UlionTse · 2021-01-11T02:57:30Z

@mzeitlin11 OK. Thanks.

debnathshoham · 2021-07-05T15:26:21Z

Hi - is this issue still open?

mzeitlin11 · 2021-07-05T16:20:30Z

Yep! If you'd be interested in addressing it, the idea is just to use a clearer description of what dense ranking means. The best solution might be to add some examples. I think we should also add DataFrame.rank and Series.rank to the "See Also" section, since those contain some useful material.

debnathshoham · 2021-07-05T19:15:06Z

Hi @mzeitlin11, I would like to work on this. I am new to open source contribution and might need some guidance.

debnathshoham · 2021-07-05T19:15:14Z

take

mzeitlin11 · 2021-07-05T19:28:40Z

Sounds great, feel free to ask any questions!

* DOC: Adding examples to DataFrameGroupBy.rank #38972
* DOC: Adding examples to DataFrameGroupBy.rank #38972
* DOC: made the suggested changes
* DOC: changed as suggested
* DOC: changed as suggested
* DOC: updating with black output
* DOC: updating with black output
* DOC: corrected docstring order

mzeitlin11 added the "Needs Info" label (Clarification about behavior needed to assess issue) Jan 5, 2021

UlionTse changed the title ~~pandas.core.groupby.DataFrameGroupBy.rank() dense_rank dose not work~~ pandas.core.groupby.DataFrameGroupBy.rank() dense_rank does not work Jan 6, 2021

mzeitlin11 added the "Docs" and "Algos" labels (Algos: non-arithmetic algos such as value_counts, factorize, sorting, isin, clip, shift, diff) and removed the "Needs Info" label Jan 6, 2021

mzeitlin11 added the good first issue label Mar 27, 2021

github-actions bot assigned debnathshoham Jul 5, 2021

debnathshoham added a commit to debnathshoham/pandas that referenced this issue Jul 6, 2021

DOC: Adding examples to DataFrameGroupBy.rank pandas-dev#38972

adbdcb1

debnathshoham mentioned this issue Jul 6, 2021

DOC: Adding examples to DataFrameGroupBy.rank #38972 #42402

Merged

4 tasks

debnathshoham added a commit to debnathshoham/pandas that referenced this issue Jul 6, 2021

DOC: Adding examples to DataFrameGroupBy.rank pandas-dev#38972

ff71390

mroeschke added this to the 1.4 milestone Jul 12, 2021

mroeschke closed this as completed in #42402 Jul 12, 2021
