BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations #53707

mvashishtha · 2023-06-16T23:41:25Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

SeriesGroupBy nlargest and nsmallest filter the data and seem to match the description of filtration in the groupby documentation. However, unlike the filtration methods, they always put the group keys in the result, and they follow as_index to decide whether to put the key in the row index or in the columns.

Feature Description

import pandas as pd

df = pd.DataFrame([['a', 1]], index=['i1'])

# currently this gives a Series with a MultiIndex containing ('a', 'i1'):
"""
a  i1    1
Name: 1, dtype: int64
"""
# with this proposal it would instead give a Series with only the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=True)[1].nlargest(1)

# currently this gives a DataFrame with a new RangeIndex starting at 0
# and puts the group key in a column:
"""
   0  1
0  a  1
"""
# with this proposal it would instead give a Series with only the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=False)[1].nlargest(1)

Note that this would then match the behavior of other filtrations such as head().
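For comparison, here is a minimal sketch that puts head() and nlargest() side by side on the same one-row frame. It reflects the behavior described in this issue (pandas as of mid-2023); the variable names are illustrative only, and later releases may differ.

```python
import pandas as pd

df = pd.DataFrame([["a", 1]], index=["i1"])
grouped = df.groupby(0)[1]

# head() acts as a filtration: the result keeps only the original index.
head_result = grouped.head(1)

# nlargest() prepends the group key, so the result has a MultiIndex.
nlargest_result = grouped.nlargest(1)

print(head_result.index.tolist())
print(nlargest_result.index.tolist())
```

The first print shows the original label 'i1' alone, while the second shows the (group key, original label) pair that this issue proposes to drop.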

Alternative Solutions

N/A

Additional Context

In this comment, @TomAugspurger says that nlargest and nsmallest should keep the index because

It can be useful, matches the Series.nlargest behavior, and changing it would be API breaking.

All those reasons apply to SeriesGroupBy.head() and tail(), both of which drop the group keys in the result but filter the data in a very similar way.
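Until the behavior changes, the proposed result can be approximated by dropping the group-key level by hand. This is a workaround sketch under the current behavior, not a pandas recommendation; it assumes a single grouping key and as_index=True, so the key occupies index level 0.

```python
import pandas as pd

df = pd.DataFrame([["a", 1]], index=["i1"])

# Current behavior: MultiIndex of (group key, original index label).
with_keys = df.groupby(0)[1].nlargest(1)

# Drop the group-key level (level 0) to keep only the original index,
# which is the index head() would return.
filtered = with_keys.droplevel(0)

print(filtered.index.tolist())
```

Unlike head(), the rows still come back ordered by group and then by value, so only the index layout matches, not necessarily the row order.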


TomAugspurger · 2023-06-17T11:49:00Z

The comparison to head and tail is reasonable. Do you have a suggestion for how to deprecate the current behavior?

rhshadrach · 2023-06-19T01:57:19Z

+1 as well. I think this could arguably be characterized as a bugfix, but it is also long-standing behavior. We could wait and fix it in 3.0 as a "breaking change", or introduce an argument (as_filter?) whose default would be deprecated, and then deprecate the argument itself in 3.x.
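To make the second option concrete, here is a purely hypothetical sketch of how such a flag could behave, written as a standalone helper. Neither grouped_nlargest nor as_filter exists in pandas; the names, the warning text, and the default handling are all assumptions for illustration.

```python
import warnings

import pandas as pd


def grouped_nlargest(series, by, n=5, as_filter=None):
    """Hypothetical helper, NOT a pandas API: emulates an `as_filter` flag."""
    result = series.groupby(by).nlargest(n)
    if as_filter is None:
        # Assumed deprecation step: keep the old default but warn about it.
        warnings.warn(
            "The default as_filter=False is deprecated (hypothetical).",
            FutureWarning,
            stacklevel=2,
        )
        as_filter = False
    if as_filter:
        # Drop the leading group-key level(s), keeping the original index.
        n_key_levels = result.index.nlevels - series.index.nlevels
        result = result.droplevel(list(range(n_key_levels)))
    return result


df = pd.DataFrame([["a", 1], ["a", 3], ["b", 2]], index=["i1", "i2", "i3"])
filtered = grouped_nlargest(df[1], df[0], n=1, as_filter=True)
print(filtered)
```

Passing as_filter explicitly avoids the warning; leaving it unset keeps the current group-key index and emits a FutureWarning, which is the shape of the transition described above.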

rhshadrach · 2023-07-15T17:13:07Z

Upon resolution of this issue, I think it's likely that #17477 will also be closed.

mvashishtha added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Jun 16, 2023

lithomas1 added the Groupby, API Design, and Filters (e.g. head, tail, nth) labels and removed the Needs Triage label Jun 18, 2023

rhshadrach added the Bug label and removed the Enhancement label Jun 19, 2023

rhshadrach changed the title ~~ENH: make SeriesGroupBy nlargest and nsmallest behave like other filtrations~~ BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations Jun 19, 2023

rhshadrach added the API - Consistency (internal consistency of API/behavior) label and removed the API Design label Jun 19, 2023

This was referenced Jun 19, 2023

BUG: SeriesGroupBy nlargest with as_index=False raises ValueError: Length of values (2) does not match length of index (5) #53706

Open

SeriesGroupBy nlargest and aggregations grouping by list with as_index=False returns dataframe instead of series #53705

Closed
