Commit a51cd10

BUG: head and tail not dropping groups with nan
1 parent 2f915b3 commit a51cd10

3 files changed: 30 additions, 0 deletions

doc/source/whatsnew/v1.4.0.rst

+1
```diff
@@ -894,6 +894,7 @@ Groupby/resample/rolling
 - Bug in :meth:`GroupBy.nth` failing on ``axis=1`` (:issue:`43926`)
 - Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not respecting right bound on centered datetime-like windows, if the index contain duplicates (:issue:`3944`)
 - Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
+- Bug in :meth:`GroupBy.head` and :meth:`GroupBy.tail` not dropping groups with ``NaN`` when ``dropna=True`` (:issue:`45089`)
 - Fixed bug in :meth:`GroupBy.__iter__` after selecting a subset of columns in a :class:`GroupBy` object, which returned all columns instead of the chosen subset (:issue:`#44821`)
 - Bug in :meth:`Groupby.rolling` when non-monotonic data passed, fails to correctly raise ``ValueError`` (:issue:`43909`)
 - Fixed bug where grouping by a :class:`Series` that has a categorical data type and length unequal to the axis of grouping raised ``ValueError`` (:issue:`44179`)
```
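The entry turns on how a groupby labels each row. Below is a minimal pure-Python sketch, not pandas code, assuming the convention that the `groupby.py` change relies on: when `dropna=True`, a row whose group key contains a missing value gets the sentinel id `-1`. The helpers `group_ids` and `_is_missing` are hypothetical, and unlike pandas (which sorts group keys by default) the sketch numbers groups in order of first appearance.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def _is_missing(value: object) -> bool:
    # Assumption of this sketch: only None and float NaN count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_ids(keys: Sequence[Tuple[Hashable, ...]], dropna: bool = True) -> List[int]:
    """Hypothetical helper: map each row's key tuple to an integer group id.

    With dropna=True, a key containing a missing value gets the sentinel -1,
    i.e. the row belongs to no group.
    """
    seen: Dict[Tuple[Hashable, ...], int] = {}
    ids: List[int] = []
    for key in keys:
        if dropna and any(_is_missing(part) for part in key):
            ids.append(-1)
            continue
        # NaN != NaN, so normalise missing parts to None before hashing the key.
        norm = tuple(None if _is_missing(part) else part for part in key)
        if norm not in seen:
            seen[norm] = len(seen)  # ids follow first appearance, not sorted order
        ids.append(seen[norm])
    return ids


nan = float("nan")
keys = [("a", "z"), ("b", nan), ("c", nan), ("c", nan)]
print(group_ids(keys, dropna=True))   # [0, -1, -1, -1]
print(group_ids(keys, dropna=False))  # [0, 1, 2, 2]
```

With the keys from `test_head_tail_dropna_true`, every `NaN`-keyed row maps to `-1` under `dropna=True`, while `dropna=False` gives `b` and `c` groups of their own.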

pandas/core/groupby/groupby.py

+3
```diff
@@ -3580,6 +3580,9 @@ def _mask_selected_obj(self, mask: np.ndarray) -> NDFrameT:
         Series or DataFrame
             Filtered _selected_obj.
         """
+        ids = self.grouper.group_info[0]
+        mask = mask & (ids != -1)
+
         if self.axis == 0:
             return self._selected_obj[mask]
         else:
```
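The fix itself is one extra term in the row mask. The following sketch, again plain Python with hypothetical helpers `head_tail_mask` and `drop_unlabelled` rather than pandas internals, builds a positional head/tail mask from group ids and then applies the equivalent of `mask & (ids != -1)`. In this sketch the positional count on its own treats `-1` like an ordinary label, so one `NaN`-keyed row can slip through; the added term removes such rows whatever the positional mask says.

```python
from typing import Dict, List, Sequence


def head_tail_mask(ids: Sequence[int], n: int, tail: bool = False) -> List[bool]:
    """Hypothetical helper: True for the first (or, with tail=True, last) n rows per id."""
    seen: Dict[int, int] = {}
    mask = [False] * len(ids)
    # Walk forwards for head, backwards for tail, counting rows per id.
    order = range(len(ids) - 1, -1, -1) if tail else range(len(ids))
    for i in order:
        seen[ids[i]] = seen.get(ids[i], 0) + 1
        mask[i] = seen[ids[i]] <= n
    return mask


def drop_unlabelled(mask: Sequence[bool], ids: Sequence[int]) -> List[bool]:
    # Plain-Python counterpart of `mask = mask & (ids != -1)` from the patch.
    return [keep and g != -1 for keep, g in zip(mask, ids)]


ids = [0, -1, -1, -1]  # ids for the dropna=True test frame
print(head_tail_mask(ids, 1))                                   # [True, True, False, False]
print(drop_unlabelled(head_tail_mask(ids, 1), ids))             # [True, False, False, False]
print(drop_unlabelled(head_tail_mask(ids, 1, tail=True), ids))  # [True, False, False, False]
```

With `dropna=False` the `NaN`-keyed rows carry real ids (for example `[0, 1, 2]` for the frame in `test_head_tail_dropna_false`), so the extra term changes nothing and all three rows survive.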

pandas/tests/groupby/test_nth.py

+26
```diff
@@ -809,3 +809,29 @@ def test_nth_slices_with_column_axis(
     }[method](start, stop)
     expected = DataFrame([expected_values], columns=expected_columns)
     tm.assert_frame_equal(result, expected)
+
+
+def test_head_tail_dropna_true():
+    # GH#45089
+    df = DataFrame(
+        [["a", "z"], ["b", np.nan], ["c", np.nan], ["c", np.nan]], columns=["X", "Y"]
+    )
+    expected = DataFrame([["a", "z"]], columns=["X", "Y"])
+
+    result = df.groupby(["X", "Y"]).head(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"]).tail(n=1)
+    tm.assert_frame_equal(result, expected)
+
+
+def test_head_tail_dropna_false():
+    # GH#45089
+    df = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])
+    expected = DataFrame([["a", "z"], ["b", np.nan], ["c", np.nan]], columns=["X", "Y"])
+
+    result = df.groupby(["X", "Y"], dropna=False).head(n=1)
+    tm.assert_frame_equal(result, expected)
+
+    result = df.groupby(["X", "Y"], dropna=False).tail(n=1)
+    tm.assert_frame_equal(result, expected)
```
