Commit f7419d8

DOC: Whatsnew notable bugfix on groupby behavior with unobserved groups (#57600)
* DOC: Whatsnew notable bugfix on groupby behavior with unobserved groups
* Finish up
* refinements and fixes
1 parent e19dbee commit f7419d8

1 file changed (+57, -7 lines)

doc/source/whatsnew/v3.0.0.rst

@@ -44,10 +44,63 @@ Notable bug fixes

These are bug fixes that might have notable behavior changes.

.. _whatsnew_300.notable_bug_fixes.groupby_unobs_and_na:

Improved behavior in groupby for ``observed=False``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A number of bugs have been fixed thanks to improved handling of unobserved groups (:issue:`55738`). All remarks in this section apply equally to :class:`.SeriesGroupBy`.
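
Here, an unobserved group is a combination of grouping values that appears in the result because of ``observed=False`` but has no rows in the data. As a rough sketch, assuming the example DataFrame used below, the unobserved groups for the grouping ``["key1", "key2"]`` can be listed by pairing every category of ``key1`` with every value that occurs in ``key2`` and discarding the pairs present in the data. The names ``candidates``, ``observed`` and ``unobserved`` are illustrative only and are not pandas internals.

.. code-block:: python

    import itertools

    import pandas as pd

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )

    # Every category of key1 paired with every value that occurs in key2
    # (sketch only: this matches the groups of this particular example).
    candidates = set(
        itertools.product(df["key1"].cat.categories, df["key2"].unique().tolist())
    )
    # Combinations that actually have rows in the data.
    observed = set(zip(df["key1"], df["key2"]))
    # Unobserved groups: present in the result, but backed by no rows.
    unobserved = sorted(candidates - observed)
    print(unobserved)  # [('a', 2), ('c', 1), ('c', 2)]
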

In previous versions of pandas, a single grouping with :meth:`.DataFrameGroupBy.apply` or :meth:`.DataFrameGroupBy.agg` would pass the unobserved groups to the provided function, resulting in ``0`` below.

.. ipython:: python

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )
    df
    gb = df.groupby("key1", observed=False)
    gb[["values"]].apply(lambda x: x.sum())

However, this was not the case when using multiple groupings, resulting in ``NaN`` below.

.. code-block:: ipython

    In [1]: gb = df.groupby(["key1", "key2"], observed=False)
    In [2]: gb[["values"]].apply(lambda x: x.sum())
    Out[2]:
               values
    key1 key2
    a    1        3.0
         2        NaN
    b    1        3.0
         2        4.0
    c    1        NaN
         2        NaN

Now, using multiple groupings also passes the unobserved groups to the provided function.

.. ipython:: python

    gb = df.groupby(["key1", "key2"], observed=False)
    gb[["values"]].apply(lambda x: x.sum())
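
If passing unobserved groups to the function is not wanted, ``observed=True`` restricts the result to the combinations that actually occur in the data. Below is a minimal sketch on the same data, recreated so the snippet runs on its own. It uses :meth:`.DataFrameGroupBy.sum` rather than ``apply``.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )

    # observed=True: only groups with at least one row appear in the result.
    result = df.groupby(["key1", "key2"], observed=True)[["values"]].sum()
    print(result)

The result then has three rows, ``(a, 1)``, ``(b, 1)`` and ``(b, 2)``, with no row for any unobserved group.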

Similarly:

- In previous versions of pandas, the method :meth:`.DataFrameGroupBy.sum` would result in ``0`` for unobserved groups, but :meth:`.DataFrameGroupBy.prod`, :meth:`.DataFrameGroupBy.all`, and :meth:`.DataFrameGroupBy.any` would all result in NA values. Now these methods result in ``1``, ``True``, and ``False`` respectively.
- :meth:`.DataFrameGroupBy.groups` did not include unobserved groups and now does.
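
The new values are what each reduction gives for an empty group, i.e. its identity element. An empty sum is ``0``, an empty product is ``1``, ``all`` over nothing is ``True``, and ``any`` over nothing is ``False``. Python's built-in reductions follow the same convention, as this plain-Python sketch (no pandas involved) shows.

.. code-block:: python

    import math

    empty_group: list[int] = []  # stands in for an unobserved group: no rows

    # Each reduction returns its identity element on empty input.
    results = {
        "sum": sum(empty_group),
        "prod": math.prod(empty_group),
        "all": all(empty_group),
        "any": any(empty_group),
    }
    print(results)  # {'sum': 0, 'prod': 1, 'all': True, 'any': False}
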

These improvements also fixed certain bugs in groupby:

- :meth:`.DataFrameGroupBy.nunique` would fail when there are multiple groupings, unobserved groups, and ``as_index=False`` (:issue:`52848`)
- :meth:`.DataFrameGroupBy.agg` would fail when there are multiple groupings, unobserved groups, and ``as_index=False`` (:issue:`36698`)
- :meth:`.DataFrameGroupBy.sum` would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (:issue:`43891`)
- :meth:`.DataFrameGroupBy.groups` with ``sort=False`` would sort groups; they now occur in the order they are observed (:issue:`56966`)
- :meth:`.DataFrameGroupBy.value_counts` would produce incorrect results when used with some categorical and some non-categorical groupings and ``observed=False`` (:issue:`56016`)
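
The ``sort=False`` fix means the keys of ``groups`` appear in the order in which each group is first encountered in the data. The ordering rule can be sketched in plain Python with a hypothetical helper, ``groups_in_observed_order`` (not part of pandas). It maps each key to the row positions where it occurs, which coincide with the row labels for a default ``RangeIndex``, and it relies on dicts preserving insertion order.

.. code-block:: python

    from collections.abc import Hashable, Iterable


    def groups_in_observed_order(keys: Iterable[Hashable]) -> dict[Hashable, list[int]]:
        """Hypothetical helper: map each key to its row positions, in first-seen order."""
        groups: dict[Hashable, list[int]] = {}
        for position, key in enumerate(keys):
            # A key's slot in the dict is fixed the first time the key appears.
            groups.setdefault(key, []).append(position)
        return groups


    groups = groups_in_observed_order(["b", "a", "b", "c"])
    print(groups)  # {'b': [0, 2], 'a': [1], 'c': [3]}
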

.. _whatsnew_300.notable_bug_fixes.notable_bug_fix2:

@@ -285,12 +338,9 @@ Plotting

Groupby/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^
- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups` that did not respect the groupby argument ``dropna`` (:issue:`55919`)
- Bug in :meth:`.DataFrameGroupBy.quantile` where ``interpolation="nearest"`` was inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
- Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
-
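
Respecting ``dropna`` means that with ``dropna=True`` (the default) rows whose key is missing belong to no group, while with ``dropna=False`` they form a group of their own. The rule can be sketched in plain Python with a hypothetical helper, ``groups_sketch`` (not part of pandas), that collapses ``None`` and ``NaN`` keys into a single ``None`` key. pandas itself labels that group with the missing value rather than ``None``.

.. code-block:: python

    import math
    from collections.abc import Hashable, Iterable


    def _is_missing(key: Hashable) -> bool:
        # Treat None and float NaN as missing keys.
        return key is None or (isinstance(key, float) and math.isnan(key))


    def groups_sketch(keys: Iterable[Hashable], dropna: bool = True) -> dict:
        """Hypothetical helper: map keys to row positions, honoring ``dropna``."""
        groups: dict = {}
        for position, key in enumerate(keys):
            if _is_missing(key):
                if dropna:
                    continue  # the row belongs to no group
                key = None  # collapse every missing value into a single group
            groups.setdefault(key, []).append(position)
        return groups


    keys = ["a", float("nan"), "a", None]
    print(groups_sketch(keys))  # {'a': [0, 2]}
    print(groups_sketch(keys, dropna=False))  # {'a': [0, 2], None: [1, 3]}
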

Reshaping
