BUG: groupby.nth should be a filter #49262

rhshadrach · 2022-10-23T12:54:15Z

closes groupby.nth() labelling conventions changed from 0.17 -> 0.18 #13666
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

mroeschke · 2022-10-24T18:34:49Z

pandas/core/groupby/groupby.py

-        2  3.0
+           A   B
+        1  1 2.0
+        2  2 3.0

        NaNs denote group exhausted when using dropna


This description probably needs updating

mroeschke · 2022-10-24T18:36:28Z

doc/source/whatsnew/v2.0.0.rst

@@ -338,7 +338,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupBy.sample` raises ``ValueError`` when the object is empty (:issue:`48459`)
 - Bug in :meth:`Series.groupby` raises ``ValueError`` when an entry of the index is equal to the name of the index (:issue:`48567`)
 - Bug in :meth:`DataFrameGroupBy.resample` produces inconsistent results when passing empty DataFrame (:issue:`47705`)
-
+- Bug in :meth:`.DataFrameGroupBy.nth` and :meth:`.SeriesGroupBy.nth` would treat operation as a aggregation whereas it is a filtration; in particular, the result index no longer contains the groupers but rather is filtered from the original index of the input (:issue:`13666`)


Generally like the consistency now that nth is a filter, but I think this should be called out in its own "notable bug fix" section

rhshadrach · 2022-10-25T11:27:44Z

Thanks @mroeschke - ready for another look.

mroeschke · 2022-10-25T18:07:46Z

pandas/core/groupby/groupby.py


-        NaNs denote group exhausted when using dropna
+        When the specified ``n`` is larger than any of the groups, an
+        empty DataFrame is returned

        >>> g.nth(3, dropna='any')


Do you think this example should be shown in the whatsnew? On the surface, this example appears to be quite different from before
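To make the docstring change discussed above concrete, here is a minimal sketch of the new behavior. It assumes pandas 2.0 or later (where this PR landed); the frame mirrors the one used in the `nth` docstring, and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 2.0, where nth acts as a filter.
df = pd.DataFrame({"A": [1, 1, 2, 1, 2], "B": [np.nan, 2, 3, 4, 5]})
g = df.groupby("A")

# With dropna, rows containing NaN are discarded before selecting,
# and the surviving rows keep their original index labels (1 and 2).
first_non_null = g.nth(0, dropna="any")

# No group has a row at position 3, so nothing passes the filter
# and the result is an empty DataFrame rather than rows of NaN.
too_far = g.nth(3)
too_far_dropna = g.nth(3, dropna="any")

print(first_non_null)
print(too_far)
```

Before this change, an exhausted group was reported as a row of NaN values indexed by the group key, which is why the old docstring sentence no longer applies.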

WillAyd · 2022-10-25T19:29:51Z

Hmm I'm not sure I agree with this approach - doesn't this introduce an even larger inconsistency now between nth / first / last? Do we have other groupby functions that work this way?

rhshadrach · 2022-10-25T20:18:53Z

> Hmm I'm not sure I agree with this approach - doesn't this introduce an even larger inconsistency now between nth / first / last? Do we have other groupby functions that work this way?

first and last are reductions: they return 1 value per group. nth, on the other hand, can also return zero or multiple rows per group, which is why I think we should treat nth as a filtration. The other filtrations (head, tail, and filter) all behave as in this PR: they merely filter rows from the input object. In particular, groupby(...).head() now behaves the same as groupby(...).nth([0, 1, 2, 3, 4]).
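The distinction drawn in this comment can be sketched as follows. This is an illustration only, assuming pandas 2.0 or later; the frame and its string index are made up for the example.

```python
import pandas as pd

# Assumption: pandas >= 2.0. Hypothetical data with a string index so that
# the difference in result labels is easy to see.
df = pd.DataFrame(
    {"A": [1, 1, 2, 1, 2], "B": [10, 20, 30, 40, 50]},
    index=list("abcde"),
)
g = df.groupby("A")

# Reduction: exactly one row per group, indexed by the group keys.
first_rows = g.first()

# Filtration: selected rows keep their original index labels.
nth_rows = g.nth(0)

# head() with its default n=5 selects the same rows as nth([0, 1, 2, 3, 4]).
head_rows = g.head()
nth_head = g.nth([0, 1, 2, 3, 4])

print(first_rows.index.tolist())   # group keys
print(nth_rows.index.tolist())     # original labels
print(head_rows.equals(nth_head))
```

Because `nth` may select zero or several rows from a group, the group key alone cannot label its output unambiguously, whereas the original index always can.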

WillAyd · 2022-10-25T20:52:10Z

Yea I definitely see the argument. I think I'm +/-0. IIUC this makes nth closer to filtering on rank, but with a predetermined option for a tiebreaker

rhshadrach · 2022-10-25T20:59:09Z

> IIUC this makes nth closer to filtering on rank

nth goes off of the input order, making it somewhat different from rank.
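A small sketch of the difference between selecting by input order and selecting by rank, again assuming pandas 2.0 or later and using made-up data:

```python
import pandas as pd

# Assumption: pandas >= 2.0. Hypothetical data where the first row of each
# group is not the smallest value in that group.
df = pd.DataFrame({"key": ["x", "x", "y", "y"], "val": [5, 1, 7, 3]})
g = df.groupby("key")

# nth(0) takes the first row of each group in input order: rows 0 and 2.
by_position = g.nth(0)

# Filtering on rank takes the smallest value in each group: rows 1 and 3.
ranks = g["val"].rank(method="first")
by_rank = df[ranks == 1]

print(by_position.index.tolist())
print(by_rank.index.tolist())
```

The two selections coincide only when each group is already sorted by the ranked column.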

mroeschke

I'm okay with this change: nth as a filter makes sense to me, as does aligning the corresponding output. IIRC there was an issue to write groupby().head/tail in terms of nth, which this change would help with, IIUC.

WillAyd · 2022-11-11T01:39:50Z

Awesome work @rhshadrach

rhshadrach added 3 commits October 23, 2022 07:44

BUG: nth should be a filter

15a3aa7

Docs

0e2e5ee

whatsnew

4d89459

rhshadrach added Bug Groupby labels Oct 23, 2022

mroeschke reviewed Oct 24, 2022

rhshadrach added 2 commits October 25, 2022 06:27

Merge branch 'main' of https://github.com/pandas-dev/pandas into nth_filter

59e883f

whatsnew note, docstring fixup

f9f1066

mroeschke reviewed Oct 25, 2022

rhshadrach and others added 2 commits October 25, 2022 17:10

Add example to whatsnew

cd74b98

Merge branch 'main' into nth_filter

4b0a101

mroeschke approved these changes Nov 9, 2022

Merge branch 'main' into nth_filter

5c308e1

WillAyd approved these changes Nov 11, 2022

WillAyd merged commit 9fefc8f into pandas-dev:main Nov 11, 2022

rhshadrach deleted the nth_filter branch November 11, 2022 13:00

MarcoGorelli mentioned this pull request Nov 11, 2022

BUG: groupby.nth() providing incorrect results in development code #49644

Closed

3 tasks

codamuse pushed a commit to codamuse/pandas that referenced this pull request Nov 12, 2022

BUG: groupby.nth should be a filter (pandas-dev#49262)

36936a3

mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022

BUG: groupby.nth should be a filter (pandas-dev#49262)

c7c0b3b
