Skip to content

BUG: Fix groupby nth with axis=1 #43926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Oct 13, 2021

Conversation

johnzangwill
Copy link
Contributor

Fixes a bug that no one has reported and probably no one cares about. But overlaps slightly with #42947, so I thought that I would make it a separate PR.

Current behaviour: df.groupby(df.iloc[1], axis=1).nth(0) crashes ungracefully.

@pep8speaks
Copy link

pep8speaks commented Oct 8, 2021

Hello @johnzangwill! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-10-13 07:59:55 UTC

out = out.reindex(result_index)
out = self._reindex_output(out)
else:
out.columns = result_index[ids[mask]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we re-use any of the _wrap_foo_result methods?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls add a whatsnew note for 1.4, groupby bug fix section

@@ -2545,18 +2545,25 @@ def nth(
# Drop NA values in grouping
mask = mask & (ids != -1)

out = self._selected_obj[mask]
if self.axis == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually let's make a method

self._selected_obj_for_axis(mask: np.ndarray[bool] | None)

as i think we use this in several places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually let's make a method

self._selected_obj_for_axis(mask: np.ndarray[bool] | None)

as i think we use this in several places.

@jreback, I had already factored these out in my other branch but I have done it here too, and made them both _mask_selected_obj(mask)

@johnzangwill
Copy link
Contributor Author

also pls add a whatsnew note for 1.4, groupby bug fix section

Done

@jreback jreback added this to the 1.4 milestone Oct 10, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit, pls merge master and ping on green

@@ -503,6 +503,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`)
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`)
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`)
- Bug in :meth:`GroupBy.nth` failing on column groupings (``axis=1``)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say axis=1 (instead)

@johnzangwill
Copy link
Contributor Author

johnzangwill commented Oct 11, 2021

@jreback all green apart from Python Dev / actions-310-dev (macOS-latest) (pull_request) 60 minute timeout

@johnzangwill
Copy link
Contributor Author

small nit, pls merge master and ping on green

Still all green apart from another quite frequent ci testing failure:

>           raise AssertionError(f"Caused unexpected warning(s): {repr(extra_warnings)}")
E           AssertionError: Caused unexpected warning(s): [('ResourceWarning', ResourceWarning('unclosed file <_io.BufferedRandom name=13>'), '/home/runner/work/pandas/pandas/pandas/io/formats/excel.py', 179)]

Which I am sure is nothing to do with my code...

@@ -505,6 +505,7 @@ Groupby/resample/rolling
- Bug in :meth:`GroupBy.apply` with time-based :class:`Grouper` objects incorrectly raising ``ValueError`` in corner cases where the grouping vector contains a ``NaT`` (:issue:`43500`, :issue:`43515`)
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`)
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`)
- Bug in :meth:`GroupBy.nth` failing on ``axis=1``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the PR number here (as the issue number)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the PR number here (as the issue number)

Done

return self._mask_selected_obj(mask)

@final
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
if self.axis == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

Done

@jreback jreback merged commit b159c75 into pandas-dev:master Oct 13, 2021
@jreback
Copy link
Contributor

jreback commented Oct 13, 2021

thanks @johnzangwill very nice

return self._mask_selected_obj(mask)

@final
def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npt.NDArray[bool]

@johnzangwill johnzangwill deleted the fix_groupby_nth_column_axis branch October 13, 2021 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants