DOC: Adding examples to DataFrameGroupBy.rank #38972 #42402

Merged
merged 9 commits into from
Jul 12, 2021
46 changes: 45 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -2641,7 +2641,6 @@ def cumcount(self, ascending: bool = True):

@final
@Substitution(name="groupby")
@Appender(_common_see_also)
def rank(
self,
method: str = "average",
@@ -2675,6 +2674,51 @@ def rank(
Returns
-------
DataFrame with ranking of values within each group

See Also
--------
Series.groupby : Apply a function groupby to a Series.
Member

I know that other groupby docs include Series/DataFrame .groupby in the See Also, but IMO they're not helpful (especially since they don't link to anything).

Member

Have opened #42406 for this

DataFrame.groupby : Apply a function groupby
to each row or column of a DataFrame.
Series.rank : Apply a function rank to a Series.
DataFrame.rank : Apply a function rank
Member

Based on the screenshot you posted, looks like this doesn't render as a link, so not that useful in current form. I think best to keep scope small and remove changes to the See Also (which could then be tackled as part of #42406 if you're interested!).

Member Author

reverted the change on See Also

to each row or column of a DataFrame.

Examples
--------
>>> df = pd.DataFrame({'group': ['a', 'a', 'a', 'b',
...                              'a', 'b', 'b', 'b', 'b', 'a'],
...                    'value': [.2, .4, .2, 0.01,
...                              .3, .11, .21, .4, .01, 0.2]})
>>> df
  group  value
0     a   0.20
1     a   0.40
2     a   0.20
3     b   0.01
4     a   0.30
5     b   0.11
6     b   0.21
7     b   0.40
8     b   0.01
9     a   0.20
Member

I think it would be easier to see how different groups are treated if the groups in the example were contiguous, e.g. a, a, a, a, ..., b, b, b, b.

Member

I also think it would be clearer to have fewer distinct values (and maybe use ints instead of floats, with values that make it easy to tell at a glance which is smallest, largest, etc.).
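
The two suggestions above (contiguous groups, a few small integer values) could look roughly like the following. This is only a sketch: the data is hypothetical, chosen for illustration, and is not taken from the PR.

```python
import pandas as pd

# Hypothetical data: groups are contiguous and values are small ints,
# so the smallest and largest value in each group are obvious at a glance.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
        "value": [2, 4, 2, 3, 1, 3, 1, 2],
    }
)

# Rank within each group; with method="average", tied values share
# the mean of the positions they occupy.
df["average_rank"] = df.groupby("group")["value"].rank(method="average")
print(df)
```

With data shaped like this, each group's block of rows can be read independently, which is the point the reviewers raise.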

>>> df['average_rank'] = df.groupby('group')['value'].rank('average')
>>> df['min_rank'] = df.groupby('group')['value'].rank('min')
>>> df['max_rank'] = df.groupby('group')['value'].rank('max')
>>> df['dense_rank'] = df.groupby('group')['value'].rank('dense')
>>> df['first_rank'] = df.groupby('group')['value'].rank('first')
Member

This might be clearer as is, but could be written more concisely along the lines of

for method in ["average", ..., "first"]:
   df[f"{method}_rank"] = df.groupby("group")["value"].rank(method)
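
For reference, a runnable version of the loop sketched in this comment might look like the block below. It reuses the data and the five method names that appear in the docstring diff; the variable name `methods` is introduced here for illustration only.

```python
import pandas as pd

# Same data as the docstring example under review.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b", "a", "b", "b", "b", "b", "a"],
        "value": [0.2, 0.4, 0.2, 0.01, 0.3, 0.11, 0.21, 0.4, 0.01, 0.2],
    }
)

# The five tie-breaking methods used in the docstring example.
methods = ["average", "min", "max", "dense", "first"]

# Add one "<method>_rank" column per method, ranking within each group.
for method in methods:
    df[f"{method}_rank"] = df.groupby("group")["value"].rank(method=method)

print(df)
```

The loop is shorter, at the cost of hiding the individual calls that a reader of the docstring might want to copy directly.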

>>> df
  group  value  average_rank  min_rank  max_rank  dense_rank  first_rank
0     a   0.20           2.0       1.0       3.0         1.0         1.0
1     a   0.40           5.0       5.0       5.0         3.0         5.0
2     a   0.20           2.0       1.0       3.0         1.0         2.0
3     b   0.01           1.5       1.0       2.0         1.0         1.0
4     a   0.30           4.0       4.0       4.0         2.0         4.0
5     b   0.11           3.0       3.0       3.0         2.0         3.0
6     b   0.21           4.0       4.0       4.0         3.0         4.0
7     b   0.40           5.0       5.0       5.0         4.0         5.0
8     b   0.01           1.5       1.0       2.0         1.0         2.0
9     a   0.20           2.0       1.0       3.0         1.0         3.0
"""
if na_option not in {"keep", "top", "bottom"}:
msg = "na_option must be one of 'keep', 'top', or 'bottom'"
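
The `na_option` check at the end of the diff can be exercised with a small example. The sketch below assumes the three accepted values behave in the grouped method as the pandas documentation describes for `rank`. The data and the invalid value `"middle"` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical single-group data with one missing value.
df = pd.DataFrame({"group": ["a", "a", "a"], "value": [2.0, np.nan, 1.0]})
grouped = df.groupby("group")["value"]

keep = grouped.rank(na_option="keep")      # the NaN row keeps a NaN rank
top = grouped.rank(na_option="top")        # the NaN row gets the smallest rank
bottom = grouped.rank(na_option="bottom")  # the NaN row gets the largest rank

# Any other value is rejected by the validation shown in the diff.
try:
    grouped.rank(na_option="middle")
except ValueError as err:
    message = str(err)
    print(message)
```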