Commit 700c9b8

Revert "DEPR: DataFrameGroupBy.apply operating on the group keys (pandas-dev#52477)"

This reverts commit 9b20759.
1 parent 937f774 commit 700c9b8

30 files changed: +256 -704 lines
doc/source/user_guide/cookbook.rst (+2 -2)

@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
     df
 
     # List the size of the animals with the highest weight.
-    df.groupby("animal")[["size", "weight"]].apply(lambda subf: subf["size"][subf["weight"].idxmax()])
+    df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
 
 `Using get_group
 <https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__
@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
         return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])
 
-    expected_df = gb[["size", "weight"]].apply(GrowUp)
+    expected_df = gb.apply(GrowUp)
     expected_df
 
 `Expanding apply
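The cookbook idxmax pattern in the hunk above can be exercised directly; with the explicit `[["size", "weight"]]` selection the callable never sees the grouping column, so the result is the same either way. A minimal sketch (the animal/size/weight data below is invented for illustration):

```python
import pandas as pd

# Hypothetical data mirroring the cookbook example.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "dog"],
        "size": ["S", "S", "M", "M"],
        "weight": [8, 10, 11, 9],
    }
)

# For each animal, report the size of its heaviest individual.
# idxmax returns the original row label, which indexes back into "size".
result = df.groupby("animal")[["size", "weight"]].apply(
    lambda subf: subf["size"][subf["weight"].idxmax()]
)
```

The result is a Series indexed by the group keys, one scalar per group.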

doc/source/user_guide/groupby.rst (+4 -10)

@@ -431,12 +431,6 @@ This is mainly syntactic sugar for the alternative, which is much more verbose:
 Additionally, this method avoids recomputing the internal grouping information
 derived from the passed key.
 
-You can also include the grouping columns if you want to operate on them.
-
-.. ipython:: python
-
-   grouped[["A", "B"]].sum()
-
 .. _groupby.iterating-label:
 
 Iterating through groups
@@ -1074,7 +1068,7 @@ missing values with the ``ffill()`` method.
     ).set_index("date")
     df_re
 
-    df_re.groupby("group")[["val"]].resample("1D").ffill()
+    df_re.groupby("group").resample("1D").ffill()
 
 .. _groupby.filter:
@@ -1240,13 +1234,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare
 
 .. ipython:: python
 
-    df.groupby("A", group_keys=True)[["B", "C", "D"]].apply(lambda x: x)
+    df.groupby("A", group_keys=True).apply(lambda x: x)
 
 with
 
 .. ipython:: python
 
-    df.groupby("A", group_keys=False)[["B", "C", "D"]].apply(lambda x: x)
+    df.groupby("A", group_keys=False).apply(lambda x: x)
 
 Numba Accelerated Routines
@@ -1730,7 +1724,7 @@ column index name will be used as the name of the inserted column:
         result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
         return pd.Series(result, name="metrics")
 
-    result = df.groupby("a")[["b", "c"]].apply(compute_metrics)
+    result = df.groupby("a").apply(compute_metrics)
 
     result
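The `compute_metrics` pattern in the last hunk, where apply returns a Series per group so the combined result is one row per group, can be sketched like this. The data frame below is invented for illustration, and the explicit `[["b", "c"]]` selection keeps the grouping column out of the callable on any recent pandas:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [1.0, 2.0, 3.0], "c": [4.0, 5.0, 6.0]})

def compute_metrics(x):
    # Each group yields a Series; pandas stacks them into a DataFrame,
    # one row per group key, columns taken from the Series index.
    result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
    return pd.Series(result, name="metrics")

result = df.groupby("a")[["b", "c"]].apply(compute_metrics)
```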

doc/source/whatsnew/v0.14.0.rst (+5 -17)

@@ -328,25 +328,13 @@ More consistent behavior for some groupby methods:
 
 - groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:
 
-  .. code-block:: ipython
-
-     In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
-
-     In [2]: g = df.groupby('A')
-
-     In [3]: g.head(1)  # filters DataFrame
-     Out[3]:
-        A  B
-     0  1  2
-     2  5  6
-
-     In [4]: g.apply(lambda x: x.head(1))  # used to simply fall-through
-     Out[4]:
-          A  B
-     A
-     1 0  1  2
-     5 2  5  6
+  .. ipython:: python
+
+     df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
+     g = df.groupby('A')
+     g.head(1)  # filters DataFrame
+
+     g.apply(lambda x: x.head(1))  # used to simply fall-through
 
 - groupby head and tail respect column selection:
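The behavior this whatsnew entry describes, `head`/`tail` acting as a filter that keeps the original row index, is easy to verify with the same tiny frame used in the hunk:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# head/tail are filters: they select rows but keep the original index
# and row order, unlike an aggregation which would collapse each group.
first_rows = g.head(1)  # first row of each group
last_rows = g.tail(1)   # last row of each group
```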

doc/source/whatsnew/v0.18.1.rst (+6 -87)

@@ -77,52 +77,9 @@ Previously you would have to do this to get a rolling window mean per-group:
     df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
     df
 
-.. code-block:: ipython
+.. ipython:: python
 
-   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
-   Out[1]:
-   A
-   1  0      NaN
-      1      NaN
-      2      NaN
-      3      1.5
-      4      2.5
-      5      3.5
-      6      4.5
-      7      5.5
-      8      6.5
-      9      7.5
-      10     8.5
-      11     9.5
-      12    10.5
-      13    11.5
-      14    12.5
-      15    13.5
-      16    14.5
-      17    15.5
-      18    16.5
-      19    17.5
-   2  20     NaN
-      21     NaN
-      22     NaN
-      23    21.5
-      24    22.5
-      25    23.5
-      26    24.5
-      27    25.5
-      28    26.5
-      29    27.5
-      30    28.5
-      31    29.5
-   3  32     NaN
-      33     NaN
-      34     NaN
-      35    33.5
-      36    34.5
-      37    35.5
-      38    36.5
-      39    37.5
-   Name: B, dtype: float64
+   df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
 
 Now you can do:
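The per-group rolling mean shown in the removed output can be reproduced at a smaller scale; the 8-row frame and window size below are chosen for brevity and are not from the original docs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1] * 4 + [2] * 4, "B": np.arange(8.0)})

# Rolling mean of window 2, computed independently within each group of A.
# The result is indexed by (group key, original row label); the first row
# of each group has no complete window, hence one NaN per group.
rolled = df.groupby("A")["B"].rolling(2).mean()
```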

@@ -144,53 +101,15 @@ For ``.resample(..)`` type of operations, previously you would have to:
 
     df
 
-.. code-block:: ipython
+.. ipython:: python
 
-   In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
-   Out[1]:
-                     group  val
-   group date
-   1     2016-01-03      1    5
-         2016-01-04      1    5
-         2016-01-05      1    5
-         2016-01-06      1    5
-         2016-01-07      1    5
-         2016-01-08      1    5
-         2016-01-09      1    5
-         2016-01-10      1    6
-   2     2016-01-17      2    7
-         2016-01-18      2    7
-         2016-01-19      2    7
-         2016-01-20      2    7
-         2016-01-21      2    7
-         2016-01-22      2    7
-         2016-01-23      2    7
-         2016-01-24      2    8
+   df.groupby("group").apply(lambda x: x.resample("1D").ffill())
 
 Now you can do:
 
-.. code-block:: ipython
+.. ipython:: python
 
-   In [1]: df.groupby("group").resample("1D").ffill()
-   Out[1]:
-                     group  val
-   group date
-   1     2016-01-03      1    5
-         2016-01-04      1    5
-         2016-01-05      1    5
-         2016-01-06      1    5
-         2016-01-07      1    5
-         2016-01-08      1    5
-         2016-01-09      1    5
-         2016-01-10      1    6
-   2     2016-01-17      2    7
-         2016-01-18      2    7
-         2016-01-19      2    7
-         2016-01-20      2    7
-         2016-01-21      2    7
-         2016-01-22      2    7
-         2016-01-23      2    7
-         2016-01-24      2    8
+   df.groupby("group").resample("1D").ffill()
 
 .. _whatsnew_0181.enhancements.method_chain:
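The groupby-resample chain from this section can be sketched on a small frame. The dates and values below are invented, and the explicit `[["val"]]` selection keeps the `group` column out of the result so the sketch behaves the same on current pandas:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.date_range("2016-01-01", periods=4, freq="2D"),
        "group": [1, 1, 2, 2],
        "val": [5, 6, 7, 8],
    }
).set_index("date")

# Forward-fill each group's values onto a daily grid; the result is
# indexed by (group, date), one resampler run per group.
filled = df.groupby("group")[["val"]].resample("1D").ffill()
```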

pandas/core/frame.py (+13 -13)

@@ -8606,20 +8606,20 @@ def update(
         >>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
         ...                               'Parrot', 'Parrot'],
         ...                    'Max Speed': [380., 370., 24., 26.]})
-        >>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
-                  Max Speed
+        >>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
+                  Animal  Max Speed
         Animal
-        Falcon 0      380.0
-               1      370.0
-        Parrot 2       24.0
-               3       26.0
-
-        >>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
-           Max Speed
-        0      380.0
-        1      370.0
-        2       24.0
-        3       26.0
+        Falcon 0  Falcon      380.0
+               1  Falcon      370.0
+        Parrot 2  Parrot       24.0
+               3  Parrot       26.0
+
+        >>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
+           Animal  Max Speed
+        0  Falcon      380.0
+        1  Falcon      370.0
+        2  Parrot      24.0
+        3  Parrot      26.0
         """
     )
 )
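The docstring hunk above hinges on `group_keys`. With an explicit column selection the same contrast runs cleanly on any recent pandas; a sketch using the docstring's Falcon/Parrot data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)

# group_keys=True prepends the group label as an extra index level...
kept = df.groupby("Animal", group_keys=True)[["Max Speed"]].apply(lambda x: x)
# ...while group_keys=False leaves the original index untouched.
dropped = df.groupby("Animal", group_keys=False)[["Max Speed"]].apply(lambda x: x)
```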

pandas/core/groupby/groupby.py (+30 -50)

@@ -260,7 +260,7 @@ class providing the base-class of operations.
     each group together into a Series, including setting the index as
     appropriate:
 
-    >>> g1[['B', 'C']].apply(lambda x: x.C.max() - x.B.min())
+    >>> g1.apply(lambda x: x.C.max() - x.B.min())
     A
     a    5
     b    2
@@ -1500,16 +1500,6 @@ def f(g):
         with option_context("mode.chained_assignment", None):
             try:
                 result = self._python_apply_general(f, self._selected_obj)
-                if (
-                    not isinstance(self.obj, Series)
-                    and self._selection is None
-                    and self._selected_obj.shape != self._obj_with_exclusions.shape
-                ):
-                    warnings.warn(
-                        message=_apply_groupings_depr.format(type(self).__name__),
-                        category=FutureWarning,
-                        stacklevel=find_stack_level(),
-                    )
             except TypeError:
                 # gh-20949
                 # try again, with .apply acting as a filtering
@@ -2671,55 +2661,55 @@ def resample(self, rule, *args, **kwargs):
         Downsample the DataFrame into 3 minute bins and sum the values of
         the timestamps falling into a bin.
 
-        >>> df.groupby('a')[['b']].resample('3T').sum()
-                                 b
+        >>> df.groupby('a').resample('3T').sum()
+                                 a  b
         a
-        0   2000-01-01 00:00:00  2
-            2000-01-01 00:03:00  1
-        5   2000-01-01 00:00:00  1
+        0   2000-01-01 00:00:00  0  2
+            2000-01-01 00:03:00  0  1
+        5   2000-01-01 00:00:00  5  1
 
         Upsample the series into 30 second bins.
 
-        >>> df.groupby('a')[['b']].resample('30S').sum()
-                                 b
+        >>> df.groupby('a').resample('30S').sum()
+                                 a  b
         a
-        0   2000-01-01 00:00:00  1
-            2000-01-01 00:00:30  0
-            2000-01-01 00:01:00  1
-            2000-01-01 00:01:30  0
-            2000-01-01 00:02:00  0
-            2000-01-01 00:02:30  0
-            2000-01-01 00:03:00  1
-        5   2000-01-01 00:02:00  1
+        0   2000-01-01 00:00:00  0  1
+            2000-01-01 00:00:30  0  0
+            2000-01-01 00:01:00  0  1
+            2000-01-01 00:01:30  0  0
+            2000-01-01 00:02:00  0  0
+            2000-01-01 00:02:30  0  0
+            2000-01-01 00:03:00  0  1
+        5   2000-01-01 00:02:00  5  1
 
         Resample by month. Values are assigned to the month of the period.
 
-        >>> df.groupby('a')[['b']].resample('M').sum()
-                    b
+        >>> df.groupby('a').resample('M').sum()
+                    a  b
         a
-        0   2000-01-31  3
-        5   2000-01-31  1
+        0   2000-01-31  0  3
+        5   2000-01-31  5  1
 
         Downsample the series into 3 minute bins as above, but close the right
         side of the bin interval.
 
-        >>> df.groupby('a')[['b']].resample('3T', closed='right').sum()
-                                 b
+        >>> df.groupby('a').resample('3T', closed='right').sum()
+                                 a  b
         a
-        0   1999-12-31 23:57:00  1
-            2000-01-01 00:00:00  2
-        5   2000-01-01 00:00:00  1
+        0   1999-12-31 23:57:00  0  1
+            2000-01-01 00:00:00  0  2
+        5   2000-01-01 00:00:00  5  1
 
         Downsample the series into 3 minute bins and close the right side of
         the bin interval, but label each bin using the right edge instead of
         the left.
 
-        >>> df.groupby('a')[['b']].resample('3T', closed='right', label='right').sum()
-                                 b
+        >>> df.groupby('a').resample('3T', closed='right', label='right').sum()
+                                 a  b
         a
-        0   2000-01-01 00:00:00  1
-            2000-01-01 00:03:00  2
-        5   2000-01-01 00:03:00  1
+        0   2000-01-01 00:00:00  0  1
+            2000-01-01 00:03:00  0  2
+        5   2000-01-01 00:03:00  5  1
         """
         from pandas.core.resample import get_resampler_for_grouping
@@ -4343,13 +4333,3 @@ def _insert_quantile_level(idx: Index, qs: npt.NDArray[np.float64]) -> MultiIndex:
     else:
         mi = MultiIndex.from_product([idx, qs])
     return mi
-
-
-# GH#7155
-_apply_groupings_depr = (
-    "{}.apply operated on the grouping columns. This behavior is deprecated, "
-    "and in a future version of pandas the grouping columns will be excluded "
-    "from the operation. Select the columns to operate on after groupby to"
-    "either explicitly include or exclude the groupings and silence "
-    "this warning."
-)
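The first hunk in this file shows `g1.apply(lambda x: x.C.max() - x.B.min())` producing `a 5` / `b 2`. With explicit column selection the same numbers come out on current pandas; the frame below is reconstructed to match that docstring output:

```python
import pandas as pd

# Data chosen so the per-group result matches the docstring (a -> 5, b -> 2).
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g1 = df.groupby("A")

# Per group: max of C minus min of B, combined into a Series indexed by A.
out = g1[["B", "C"]].apply(lambda x: x.C.max() - x.B.min())
```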
