
Commit 6acf643

rhshadrach authored and mroeschke committed

DEPR: DataFrameGroupBy.apply operating on the group keys (pandas-dev#54950)

* DEPR: DataFrameGroupBy.apply operating on the group keys
* fixups
* Improvements
* Add DataFrameGroupBy.resample to the whatsnew; mypy fixup
* Ignore wrong parameter order
* Ignore groupby.resample in docstring validation
* Fixup docstring

1 parent a959b19 commit 6acf643

30 files changed (+767 −294 lines)
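Before the per-file hunks, a minimal sketch of the pattern this commit deprecates. The data and names below are illustrative, not from the diff; the point is that selecting the non-grouping columns before ``apply`` keeps the group-key column out of the computation, which is the same effect as the new ``include_groups=False`` keyword (pandas >= 2.2).

```python
# Illustrative sketch (made-up data); assumes a pandas version where
# DataFrameGroupBy.apply warns when the callable sees the grouping column.
import pandas as pd

df = pd.DataFrame({"animal": ["cat", "cat", "dog"], "weight": [3.0, 4.0, 10.0]})

# Selecting the non-grouping columns first keeps the "animal" key column out
# of the sub-frames passed to the callable, so no deprecation warning fires.
result = df.groupby("animal")[["weight"]].apply(lambda g: g["weight"].max())
```

Passing ``include_groups=False`` to ``apply`` reaches the same result without the explicit column selection.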

doc/source/user_guide/cookbook.rst (+2 −2)

@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
     df

     # List the size of the animals with the highest weight.
-    df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
+    df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()], include_groups=False)

 `Using get_group
 <https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__

@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
         return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])

-    expected_df = gb.apply(GrowUp)
+    expected_df = gb.apply(GrowUp, include_groups=False)
     expected_df

 `Expanding apply
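The cookbook recipe above can also be written warning-free by selecting the value columns up front instead of passing ``include_groups=False``. A sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "dog"],
        "size": ["S", "S", "M", "L"],
        "weight": [8, 10, 11, 20],
    }
)

# Size of the heaviest animal in each group; the lambda never sees the
# "animal" grouping column because it is not among the selected columns.
sizes = df.groupby("animal")[["size", "weight"]].apply(
    lambda subf: subf["size"][subf["weight"].idxmax()]
)
```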

doc/source/user_guide/groupby.rst (+10 −4)

@@ -420,6 +420,12 @@ This is mainly syntactic sugar for the alternative, which is much more verbose:
 Additionally, this method avoids recomputing the internal grouping information
 derived from the passed key.

+You can also include the grouping columns if you want to operate on them.
+
+.. ipython:: python
+
+    grouped[["A", "B"]].sum()
+
 .. _groupby.iterating-label:

 Iterating through groups

@@ -1053,7 +1059,7 @@ missing values with the ``ffill()`` method.
     ).set_index("date")
     df_re

-    df_re.groupby("group").resample("1D").ffill()
+    df_re.groupby("group").resample("1D", include_groups=False).ffill()

 .. _groupby.filter:

@@ -1219,13 +1225,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

 .. ipython:: python

-    df.groupby("A", group_keys=True).apply(lambda x: x)
+    df.groupby("A", group_keys=True).apply(lambda x: x, include_groups=False)

 with

 .. ipython:: python

-    df.groupby("A", group_keys=False).apply(lambda x: x)
+    df.groupby("A", group_keys=False).apply(lambda x: x, include_groups=False)


 Numba Accelerated Routines

@@ -1709,7 +1715,7 @@ column index name will be used as the name of the inserted column:
         result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
         return pd.Series(result, name="metrics")

-    result = df.groupby("a").apply(compute_metrics)
+    result = df.groupby("a").apply(compute_metrics, include_groups=False)

     result
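The ``group_keys`` comparison shown in the groupby.rst hunk boils down to whether the group labels are prepended to the result index. A small sketch (illustrative data, with the column selection used to avoid the deprecation):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3]})

# group_keys=True prepends the group labels as an extra index level;
# group_keys=False keeps only the original row labels.
kept = df.groupby("A", group_keys=True)[["B"]].apply(lambda x: x)
dropped = df.groupby("A", group_keys=False)[["B"]].apply(lambda x: x)
```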

doc/source/whatsnew/v0.14.0.rst (+16 −5)

@@ -328,13 +328,24 @@ More consistent behavior for some groupby methods:

 - groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:

-  .. ipython:: python
+  .. code-block:: ipython

-     df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
-     g = df.groupby('A')
-     g.head(1)  # filters DataFrame
+     In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

-     g.apply(lambda x: x.head(1))  # used to simply fall-through
+     In [2]: g = df.groupby('A')
+
+     In [3]: g.head(1)  # filters DataFrame
+     Out[3]:
+        A  B
+     0  1  2
+     2  5  6
+
+     In [4]: g.apply(lambda x: x.head(1))  # used to simply fall-through
+     Out[4]:
+          A  B
+     A
+     1 0  1  2
+     5 2  5  6

 - groupby head and tail respect column selection:
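The filter-like behavior frozen into the v0.14.0 whatsnew above still holds and is easy to verify (a sketch using the same tiny frame as that hunk):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# head() acts as a filter: the original row labels survive and no group
# level is added to the index.
first = g.head(1)
```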

doc/source/whatsnew/v0.18.1.rst (+87 −6)

@@ -77,9 +77,52 @@ Previously you would have to do this to get a rolling window mean per-group:
     df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
     df

-.. ipython:: python
+.. code-block:: ipython

-   df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
+   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
+   Out[1]:
+   A
+   1  0      NaN
+      1      NaN
+      2      NaN
+      3      1.5
+      4      2.5
+      5      3.5
+      6      4.5
+      7      5.5
+      8      6.5
+      9      7.5
+      10     8.5
+      11     9.5
+      12    10.5
+      13    11.5
+      14    12.5
+      15    13.5
+      16    14.5
+      17    15.5
+      18    16.5
+      19    17.5
+   2  20     NaN
+      21     NaN
+      22     NaN
+      23    21.5
+      24    22.5
+      25    23.5
+      26    24.5
+      27    25.5
+      28    26.5
+      29    27.5
+      30    28.5
+      31    29.5
+   3  32     NaN
+      33     NaN
+      34     NaN
+      35    33.5
+      36    34.5
+      37    35.5
+      38    36.5
+      39    37.5
+   Name: B, dtype: float64

 Now you can do:

@@ -101,15 +144,53 @@ For ``.resample(..)`` type of operations, previously you would have to:

     df

-.. ipython:: python
+.. code-block:: ipython

-   df.groupby("group").apply(lambda x: x.resample("1D").ffill())
+   In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
+   Out[1]:
+                     group  val
+   group date
+   1     2016-01-03      1    5
+         2016-01-04      1    5
+         2016-01-05      1    5
+         2016-01-06      1    5
+         2016-01-07      1    5
+         2016-01-08      1    5
+         2016-01-09      1    5
+         2016-01-10      1    6
+   2     2016-01-17      2    7
+         2016-01-18      2    7
+         2016-01-19      2    7
+         2016-01-20      2    7
+         2016-01-21      2    7
+         2016-01-22      2    7
+         2016-01-23      2    7
+         2016-01-24      2    8

 Now you can do:

-.. ipython:: python
+.. code-block:: ipython

-   df.groupby("group").resample("1D").ffill()
+   In [1]: df.groupby("group").resample("1D").ffill()
+   Out[1]:
+                     group  val
+   group date
+   1     2016-01-03      1    5
+         2016-01-04      1    5
+         2016-01-05      1    5
+         2016-01-06      1    5
+         2016-01-07      1    5
+         2016-01-08      1    5
+         2016-01-09      1    5
+         2016-01-10      1    6
+   2     2016-01-17      2    7
+         2016-01-18      2    7
+         2016-01-19      2    7
+         2016-01-20      2    7
+         2016-01-21      2    7
+         2016-01-22      2    7
+         2016-01-23      2    7
+         2016-01-24      2    8

 .. _whatsnew_0181.enhancements.method_chain:
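The per-group resampling shown in the v0.18.1 hunks can be written so that the grouping column never enters the computation, which keeps it valid after this deprecation. A sketch with made-up data (selecting the ``"val"`` column gives a SeriesGroupBy, so no ``include_groups`` is needed):

```python
import pandas as pd

df = pd.DataFrame(
    {"group": [1, 1, 2], "val": [5, 6, 7]},
    index=pd.to_datetime(["2016-01-03", "2016-01-10", "2016-01-17"]),
)

# Per-group daily upsampling with forward fill; the result is a Series
# with a (group, date) MultiIndex.
res = df.groupby("group")["val"].resample("1D").ffill()
```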

doc/source/whatsnew/v2.2.0.rst (+1 −1)

@@ -146,12 +146,12 @@ Deprecations
 - Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except ``path``. (:issue:`54229`)
 - Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except ``buf``. (:issue:`54229`)
 - Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.downcasting", True)`` (:issue:`53656`)
+- Deprecated including the groups in computations when using :meth:`DataFrameGroupBy.apply` and :meth:`DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
 - Deprecated not passing a tuple to :class:`DataFrameGroupBy.get_group` or :class:`SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
 - Deprecated strings ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
 - Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
 - Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
 - Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
--

 .. ---------------------------------------------------------------------------
 .. _whatsnew_220.performance:

pandas/core/frame.py (+13 −13)

@@ -8869,20 +8869,20 @@ def update(
 >>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
 ...                               'Parrot', 'Parrot'],
 ...                    'Max Speed': [380., 370., 24., 26.]})
->>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
-          Animal  Max Speed
+>>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
+           Max Speed
 Animal
-Falcon 0  Falcon      380.0
-       1  Falcon      370.0
-Parrot 2  Parrot       24.0
-       3  Parrot       26.0
+Falcon 0       380.0
+       1       370.0
+Parrot 2        24.0
+       3        26.0

->>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
-  Animal  Max Speed
-0  Falcon      380.0
-1  Falcon      370.0
-2  Parrot       24.0
-3  Parrot       26.0
+>>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
+   Max Speed
+0      380.0
+1      370.0
+2       24.0
+3       26.0
 """
 )
 )
