Skip to content

BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations #53707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 of 3 tasks
mvashishtha opened this issue Jun 16, 2023 · 3 comments
Open
1 of 3 tasks
Labels
API - Consistency Internal Consistency of API/Behavior Bug Filters e.g. head, tail, nth Groupby

Comments

@mvashishtha
Copy link

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

SeriesGroupBy nlargest and nsmallest filter the data and seem to match the description of filtration in the groupby documentation. However, unlike the filtration methods, they always put the group keys in the result, and they follow as_index to decide whether to put the key in the row index or in the columns.

Feature Description

import pandas as pd

 df = pd.DataFrame([['a', 1]], index=['i1'])

# currently this gives a series with a multiindex with (a, 'i1'):
"""
a  i1    1
Name: 1, dtype: int64
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=True)[1].nlargest(1)

# currently this gives a new RangeIndex with 0 and
# puts the group key as a a column in the dataframe:
"""
   0  1
0  a  1
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=False)[1].nlargest(1)

Note that this would then match behavior for other filtrations like head()

Alternative Solutions

N/A

Additional Context

In this comment, @TomAugspurger says that nlargest and nsmallest should keep the index because

It can be useful, matches the Series.nlargest behavior, and changing it would be API breaking.

All those reasons apply to SeriesGroupBy.head() and tail(), both of which drop the group keys in the result but filter the data in a very similar way.

@mvashishtha mvashishtha added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 16, 2023
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jun 17, 2023 via email

@lithomas1 lithomas1 added Groupby API Design Filters e.g. head, tail, nth and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 18, 2023
@rhshadrach
Copy link
Member

+1 as well. I think this could be arguably characterized as a bugfix, but it is also long standing behavior. We could wait and fix for 3.0 as a "breaking change" or introduce an argument (as_filter?) that would then have the default deprecated and then deprecate the argument in 3.x.

@rhshadrach rhshadrach changed the title ENH: make SeriesGroupBy nlargest and nsmallest behave like other filtrations BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations Jun 19, 2023
@rhshadrach rhshadrach added API - Consistency Internal Consistency of API/Behavior and removed API Design labels Jun 19, 2023
@rhshadrach
Copy link
Member

Upon resolution of this issue, I think it's likely that #17477 will also be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior Bug Filters e.g. head, tail, nth Groupby
Projects
None yet
Development

No branches or pull requests

4 participants