Commit 16c987e

Preserve Index and grouped columns in Groupby.nth (#13442)
In pandas 2.0, the behavior of `groupby.nth` changed: https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#dataframegroupby-nth-and-seriesgroupby-nth-now-behave-as-filtrations This PR preserves the caller's index in the result and returns the grouping columns as part of the result. It fixes all 12 pytests in `python/cudf/cudf/tests/test_groupby.py::test_groupby_nth`.
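To make the "filtration" semantics concrete, here is a minimal pure-Python sketch. It is not pandas or cuDF code: the function name `nth_as_filter` and the `(index_label, key, value)` row layout are assumptions made purely for illustration, and only non-negative `n` is handled. It keeps every row that is the n-th member of its group, in original order, with its original index label and its grouping key.

```python
from collections import defaultdict


def nth_as_filter(rows, n):
    """Keep the rows that are the n-th (0-based) member of their group.

    `rows` is a list of (index_label, key, value) tuples. This layout is a
    hypothetical stand-in for a DataFrame with one grouping column.
    """
    seen = defaultdict(int)  # number of rows seen so far per group key
    kept = []
    for index_label, key, value in rows:
        if seen[key] == n:
            # The row passes through unchanged, so the original index
            # label and the grouping key are both preserved.
            kept.append((index_label, key, value))
        seen[key] += 1
    return kept


rows = [("w", "a", 10), ("x", "b", 20), ("y", "a", 30), ("z", "a", 40)]
first = nth_as_filter(rows, 0)
second = nth_as_filter(rows, 1)
```

Group `"b"` has a single row, so it contributes to `first` but drops out of `second` entirely, which is what distinguishes a filtration from an aggregation that emits one row per group.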
1 parent 2dafcfc commit 16c987e

1 file changed: 14 additions, 3 deletions
python/cudf/cudf/core/groupby/groupby.py

@@ -802,10 +802,21 @@ def nth(self, n):
         """
         Return the nth row from each group.
         """
-        result = self.agg(lambda x: x.nth(n)).sort_index()
-        sizes = self.size().sort_index()
 
-        return result[sizes > n]
+        self.obj["__groupbynth_order__"] = range(0, len(self.obj))
+        # We perform another groupby here to have the grouping columns
+        # be a part of dataframe columns.
+        result = self.obj.groupby(self.grouping.keys).agg(lambda x: x.nth(n))
+        sizes = self.size().reindex(result.index)
+
+        result = result[sizes > n]
+
+        result._index = self.obj.index.take(
+            result._data["__groupbynth_order__"]
+        )
+        del result._data["__groupbynth_order__"]
+        del self.obj._data["__groupbynth_order__"]
+        return result
 
     @_cudf_nvtx_annotate
     def ngroup(self, ascending=True):
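The added lines follow a tag, aggregate, filter, restore pattern. The sketch below mimics those steps with plain Python lists, as a rough illustration rather than the cuDF implementation. The function name `nth_with_order_column` and its parameters are hypothetical, only non-negative `n` is handled, and the output is sorted by key purely to make the result deterministic.

```python
def nth_with_order_column(index, keys, values, n):
    """Mimic the steps in the diff using plain lists.

    All names here are illustrative; none of them are cuDF APIs.
    """
    # Step 1: tag every row with its position. This plays the role of
    # the temporary "__groupbynth_order__" column.
    order = list(range(len(keys)))

    # Step 2: group the tagged rows by key.
    groups = {}
    for pos, key in zip(order, keys):
        groups.setdefault(key, []).append(pos)

    # Steps 3 and 4: take the n-th row of each group, keeping only the
    # groups with more than n rows (the `sizes > n` mask in the diff).
    picked = {
        key: positions[n]
        for key, positions in groups.items()
        if len(positions) > n
    }

    # Step 5: use the stored positions to look up the caller's index
    # labels. The helper positions themselves are not part of the output.
    return [
        (index[pos], key, values[pos])
        for key, pos in sorted(picked.items())
    ]


result = nth_with_order_column(
    index=["w", "x", "y", "z"],
    keys=["b", "a", "b", "a"],
    values=[1, 2, 3, 4],
    n=1,
)
```

Carrying the row position through the aggregation is what lets the final step recover the original index labels. Unlike this sketch, the diff writes the helper column onto `self.obj` and deletes it again before returning.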
