Commit 55733b8

fixup! BUG: Fix .groupby(categorical, sort=False) failing
1 parent 2aec326 commit 55733b8

2 files changed, +24 -15 lines changed

doc/source/whatsnew/v0.20.0.txt

+1 -1
@@ -125,7 +125,7 @@ Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937
 GroupBy on Categoricals
 ^^^^^^^^^^^^^^^^^^^^^^^
 
-In previous version, ``.groupby(..., sort=False)`` would fail with a ``ValueError`` when grouping on a categorical series with some categories not appearing in the data. (:issue:`13179`)
+In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueError`` when grouping on a categorical series with some categories not appearing in the data. (:issue:`13179`)
 
 Now, it works.
 

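The behaviour described in the whatsnew entry can be exercised with a small example. This is a minimal sketch, not taken from the commit: the column names `key` and `value` are made up for illustration, and `observed=False` is passed explicitly on the assumption that a recent pandas release is installed (that keyword did not exist in 0.20, where unobserved categories were always kept).

```python
import pandas as pd

# A categorical key where category "c" never appears in the data.
df = pd.DataFrame({
    "key": pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"]),
    "value": [1, 2, 3],
})

# sort=False asks for groups in as-encountered order; observed=False
# (an assumption for modern pandas) keeps the unused category "c".
result = df.groupby("key", sort=False, observed=False)["value"].sum()
print(result)
```

The unused category still gets a row in the result, with a sum of 0, rather than raising a `ValueError`.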
pandas/core/categorical.py

+23 -14
@@ -604,7 +604,13 @@ def _get_categories(self):
 
     def _codes_for_groupby(self, sort):
         """
-        Return a Categorical adjusted for groupby
+        If sort=False, return a copy of self, coded with categories as
+        returned by .unique(), followed by any categories not appearing in
+        the data. If sort=True, return self.
+
+        This method is needed solely to ensure the categorical index of the
+        GroupBy result has categories in the order of appearance in the data
+        (GH-8868).
 
         Parameters
         ----------
@@ -614,21 +620,24 @@ def _codes_for_groupby(self, sort):
         Returns
         -------
         Categorical
-            In case of sort=True, self is returned with original categories
-            preserved. In case of sort=False, the new categories are set
-            to the order of appearance in codes (unless ordered=True),
-            followed by any unrepresented categories in original order.
+            If sort=False, the new categories are set to the order of
+            appearance in codes (unless ordered=True, in which case the
+            original order is preserved), followed by any unrepresented
+            categories in the original order.
         """
-        cat = self
+        if sort:
+            # Already sorted according to self.categories; all is fine
+            return self
+
         # sort=False should order groups in as-encountered order (GH-8868)
-        if not sort:
-            cat = self.unique()
-            # But all categories should be present, including those missing
-            # from the data (GH-13179), which .unique() dropped
-            cat.add_categories(self.categories[
-                               ~self.categories.isin(cat.categories)],
-                               inplace=True)
-            cat = self.reorder_categories(cat.categories)
+        cat = self.unique()
+        # But for groupby to work, all categories should be present,
+        # including those missing from the data (GH-13179), which .unique()
+        # above dropped
+        cat.add_categories(
+            self.categories[~self.categories.isin(cat.categories)],
+            inplace=True)
+        cat = self.reorder_categories(cat.categories)
         return cat
 
     _ordered = None

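The category ordering that the new docstring describes can be sketched in plain Python, without pandas. The function name `groupby_category_order` and its list-based signature are hypothetical and are not part of the commit. The `ordered` branch follows one reading of the docstring (observed categories keep their original relative order) and should be treated as an assumption.

```python
def groupby_category_order(values, categories, sort, ordered=False):
    """Return the category order a groupby result would use (illustrative only)."""
    if sort:
        # Already in the order given by `categories`; nothing to do.
        return list(categories)

    present = set(values)
    if ordered:
        # Assumption: ordered categoricals keep their original category order.
        observed = [c for c in categories if c in present]
    else:
        # Order of first appearance in the data, skipping unknown values.
        observed = list(dict.fromkeys(v for v in values if v in categories))

    # Categories missing from the data are appended in their original order.
    unobserved = [c for c in categories if c not in present]
    return observed + unobserved


print(groupby_category_order(["b", "a", "b"], ["a", "b", "c"], sort=False))
```

With `sort=False`, the category `"b"` comes first because it is encountered first, and the unused `"c"` is placed last.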