@@ -10,6 +10,61 @@ including other versions of pandas.
.. ---------------------------------------------------------------------------

+ .. _whatsnew_151.groupby_categorical_regr:
+
+ Behavior of ``groupby`` with categorical groupers (:issue:`48645`)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ In versions of pandas prior to 1.5, ``groupby`` with ``dropna=False`` would still drop
+ NA values when the grouper was a categorical dtype. A fix for this was attempted in
+ 1.5; however, it introduced a regression where passing ``observed=False`` and
+ ``dropna=False`` to ``groupby`` would result in only observed categories. It was found
+ that the patch fixing the ``dropna=False`` bug is incompatible with ``observed=False``,
+ and it was decided that the best resolution is to restore the correct ``observed=False``
+ behavior at the cost of reintroducing the ``dropna=False`` bug.
+
+ .. ipython:: python
+
+     df = pd.DataFrame(
+         {
+             "x": pd.Categorical([1, None], categories=[1, 2, 3]),
+             "y": [3, 4],
+         }
+     )
+     df
+
+ *1.5.0 behavior*:
+
+ .. code-block:: ipython
+
+     In [3]: # Correct behavior, NA values are not dropped
+             df.groupby("x", observed=True, dropna=False).sum()
+     Out[3]:
+          y
+     x
+     1    3
+     NaN  4
+
+
+     In [4]: # Incorrect behavior, only observed categories present
+             df.groupby("x", observed=False, dropna=False).sum()
+     Out[4]:
+          y
+     x
+     1    3
+     NaN  4
+
+
+ *1.5.1 behavior*:
+
+ .. ipython:: python
+
+     # Incorrect behavior, NA values are dropped
+     df.groupby("x", observed=True, dropna=False).sum()
+
+     # Correct behavior, unobserved categories present (NA values still dropped)
+     df.groupby("x", observed=False, dropna=False).sum()
+
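Since the restored behavior drops NA values for categorical groupers again, one possible way to keep those rows is to map NA onto an explicit placeholder category before grouping. The following is a minimal sketch of that idea and is not part of the patch itself; the ``"missing"`` label and the ``key`` and ``placeholder`` names are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# "missing" is a hypothetical placeholder label, not a pandas convention.
placeholder = "missing"

# Register the placeholder as a category, then map NA values onto it.
key = df["x"].cat.add_categories(placeholder).fillna(placeholder)

# Group by the NA-free key; observed=False keeps the unobserved
# categories 2 and 3 in the result alongside the placeholder group.
result = df.groupby(key, observed=False)["y"].sum()
print(result)
```

Because the grouping key no longer contains NA, the result does not depend on how ``dropna`` is handled for categorical groupers, and the unobserved categories still appear with a sum of zero.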
.. _whatsnew_151.regressions:

Fixed regressions