-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: groupby.nth should be a filter #49262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/groupby/groupby.py
Outdated
2 3.0 | ||
A B | ||
1 1 2.0 | ||
2 2 3.0 | ||
|
||
NaNs denote group exhausted when using dropna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description probably needs updating
doc/source/whatsnew/v2.0.0.rst
Outdated
@@ -338,7 +338,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`DataFrameGroupBy.sample` raises ``ValueError`` when the object is empty (:issue:`48459`) | |||
- Bug in :meth:`Series.groupby` raises ``ValueError`` when an entry of the index is equal to the name of the index (:issue:`48567`) | |||
- Bug in :meth:`DataFrameGroupBy.resample` produces inconsistent results when passing empty DataFrame (:issue:`47705`) | |||
- | |||
- Bug in :meth:`.DataFrameGroupBy.nth` and :meth:`.SeriesGroupBy.nth` would treat operation as a aggregation whereas it is a filtration; in particular, the result index no longer contains the groupers but rather is filtered from the original index of the input (:issue:`13666`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally like consistency now that nth
is a filter, but I think this should be called out in it's own "notable bug fix" section
Thanks @mroeschke - ready for another look. |
|
||
NaNs denote group exhausted when using dropna | ||
When the specified ``n`` is larger than any of the groups, an | ||
empty DataFrame is returned | ||
|
||
>>> g.nth(3, dropna='any') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this example should be shown in the whatsnew? On the surface, this example appears to be quite different from before
Hmm I'm not sure I agree with this approach - doesn't this introduce an even larger inconsistency now between nth / first / last? Do we have other groupby functions that work this way? |
|
Yea I definitely see the argument. I think I'm +/-0 . IIUC this makes nth closer to filtering on rank, but with a predetermined option for a tiebreaker |
nth goes off of the input order, making it somewhat different from rank. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm okay with this change as nth as a filter makes sense to me and to align the corresponding output. IIRC there was an issue to write groupby().head/tail
in terms with nth
which this change would help IIUC.
Awesome work @rhshadrach |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.