SeriesGroupBy nlargest and nsmallest filter the data and seem to match the description of filtration in the groupby documentation. However, unlike the filtration methods, they always put the group keys in the result, and they follow as_index to decide whether to put the key in the row index or in the columns.
Feature Description
import pandas as pd

df = pd.DataFrame([['a', 1]], index=['i1'])

# currently this gives a series with a multiindex with (a, 'i1'):
"""
a  i1    1
Name: 1, dtype: int64
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=True)[1].nlargest(1)

# currently this gives a new RangeIndex with 0 and
# puts the group key as a column in the dataframe:
"""
   0  1
0  a  1
"""
# instead it would give just a series with just the original index value 'i1':
"""
i1    1
Name: 1, dtype: int64
"""
df.groupby(0, as_index=False)[1].nlargest(1)
Note that this would then match behavior for other filtrations like head()
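For comparison, here is a minimal sketch of how head() treats the same frame. It reuses the df from the example above; the variable names head_as_index and head_no_index are illustrative only. As a filtration, head() should keep the original row index and leave out the group key whichever way as_index is set.

```python
import pandas as pd

df = pd.DataFrame([['a', 1]], index=['i1'])

# head() is a filtration: the result keeps the original row index,
# and the group key is not added, whatever as_index is set to.
head_as_index = df.groupby(0, as_index=True)[1].head(1)
head_no_index = df.groupby(0, as_index=False)[1].head(1)

print(head_as_index.index.tolist())
print(head_no_index.index.tolist())
```

Both results are Series indexed by 'i1', which is the shape this issue proposes for nlargest and nsmallest.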
+1 as well. I think this could arguably be characterized as a bugfix, but it is also long-standing behavior. We could wait and fix it for 3.0 as a "breaking change", or introduce an argument (as_filter?) whose default would be deprecated, and then deprecate the argument itself in 3.x.
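Until either option lands, the filter-like result can be approximated with existing calls. The sketch below is only an illustration: nlargest_as_filter is a hypothetical helper, not a pandas API and not the proposed as_filter argument, and the three-row frame is made-up sample data. It sorts the values in descending order and then relies on head(), which already behaves as a filtration.

```python
import pandas as pd


def nlargest_as_filter(df, by, column, n=1):
    """Hypothetical helper: n largest values of `column` per group, original index kept."""
    # Sort descending with a stable sort so ties keep their original order,
    # then let head(), a filtration, pick the first n rows of each group.
    ordered = df.sort_values(column, ascending=False, kind="stable")
    return ordered.groupby(by)[column].head(n)


# Made-up sample data: two rows in group 'a', one row in group 'b'.
df = pd.DataFrame([['a', 1], ['a', 3], ['b', 2]], index=['i1', 'i2', 'i3'])
result = nlargest_as_filter(df, by=0, column=1, n=1)
print(result)
```

Note that the rows come back in sorted-value order rather than grouped by key, so this approximates the proposed behavior and is not a drop-in replacement for nlargest.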
rhshadrach changed the title from "ENH: make SeriesGroupBy nlargest and nsmallest behave like other filtrations" to "BUG: make SeriesGroupBy nlargest and nsmallest behave like other filtrations" on Jun 19, 2023
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Alternative Solutions
N/A
Additional Context
In this comment, @TomAugspurger says that nlargest and nsmallest should keep the index because
All those reasons apply to SeriesGroupBy.head() and tail(), both of which drop the group keys in the result but filter the data in a very similar way.