-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix groupby nth with axis=1 #43926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix groupby nth with axis=1 #43926
Conversation
Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-10-13 07:59:55 UTC |
out = out.reindex(result_index) | ||
out = self._reindex_output(out) | ||
else: | ||
out.columns = result_index[ids[mask]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we re-use any of the _wrap_foo_result methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls add a whatsnew note for 1.4, groupby bug fix section
pandas/core/groupby/groupby.py
Outdated
@@ -2545,18 +2545,25 @@ def nth( | |||
# Drop NA values in grouping | |||
mask = mask & (ids != -1) | |||
|
|||
out = self._selected_obj[mask] | |||
if self.axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually let's make a method
self._selected_obj_for_axis(mask: np.ndarray[bool] | None)
as i think we use this in several places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually let's make a method
self._selected_obj_for_axis(mask: np.ndarray[bool] | None)
as i think we use this in several places.
@jreback, I had already factored these out in my other branch but I have done it here too, and made them both _mask_selected_obj(mask)
…zangwill/pandas into fix_groupby_nth_column_axis
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit, pls merge master and ping on green
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -503,6 +503,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`) | |||
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`) | |||
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`) | |||
- Bug in :meth:`GroupBy.nth` failing on column groupings (``axis=1``) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say axis=1 (instead)
@jreback all green apart from |
Still all green apart from another quite frequent ci testing failure:
Which I am sure is nothing to do with my code... |
doc/source/whatsnew/v1.4.0.rst
Outdated
@@ -505,6 +505,7 @@ Groupby/resample/rolling | |||
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`) | |||
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`) | |||
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`) | |||
- Bug in :meth:`GroupBy.nth` failing on ``axis=1`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the PR number here (as the issue number)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add the PR number here (as the issue number)
Done
return self._mask_selected_obj(mask) | ||
|
||
@final | ||
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT: | ||
if self.axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
Done
thanks @johnzangwill very nice |
return self._mask_selected_obj(mask) | ||
|
||
@final | ||
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npt.NDArray[bool]
Fixes a bug that no one has reported and probably no one cares about. But overlaps slightly with #42947, so I thought that I would make it a separate PR.
Current behaviour:
df.groupby(df.iloc[1], axis=1).nth(0)
crashes ungracefully.