
Commit b14ccea

rhshadrach authored and im-vinicius committed
DEPR: Revert DataFrameGroupBy.apply operating on the group keys (pandas-dev#52921)
Revert "DEPR: DataFrameGroupBy.apply operating on the group keys (pandas-dev#52477)". This reverts commit 9b20759.
1 parent d9041af commit b14ccea

30 files changed: +256 -704 lines

doc/source/user_guide/cookbook.rst (+2 -2)

@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
     df

     # List the size of the animals with the highest weight.
-    df.groupby("animal")[["size", "weight"]].apply(lambda subf: subf["size"][subf["weight"].idxmax()])
+    df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])

 `Using get_group
 <https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__

@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
     return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])

-    expected_df = gb[["size", "weight"]].apply(GrowUp)
+    expected_df = gb.apply(GrowUp)
     expected_df

 `Expanding apply

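The cookbook change above restores passing the grouping column through to ``apply``'s callable. A minimal sketch of the restored pattern, using hypothetical data (the cookbook's actual ``df`` is not shown in this diff):

```python
import pandas as pd

# Hypothetical animals frame in the spirit of the cookbook example.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "dog"],
        "size": ["S", "S", "M", "L"],
        "weight": [8, 10, 11, 20],
    }
)

# List the size of the animal with the highest weight in each group.
# The callable only touches "size" and "weight", so the result is the
# same whether or not the grouping column is passed through.
result = df.groupby("animal").apply(
    lambda subf: subf["size"][subf["weight"].idxmax()]
)
print(result)
```

Depending on the pandas version this may emit a warning about operating on the grouping columns, but the values are unaffected because the callable never reads ``animal``.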
doc/source/user_guide/groupby.rst (+4 -10)

@@ -429,12 +429,6 @@ This is mainly syntactic sugar for the alternative, which is much more verbose:
 Additionally, this method avoids recomputing the internal grouping information
 derived from the passed key.

-You can also include the grouping columns if you want to operate on them.
-
-.. ipython:: python
-
-    grouped[["A", "B"]].sum()
-
 .. _groupby.iterating-label:

 Iterating through groups

@@ -1072,7 +1066,7 @@ missing values with the ``ffill()`` method.
     ).set_index("date")
     df_re

-    df_re.groupby("group")[["val"]].resample("1D").ffill()
+    df_re.groupby("group").resample("1D").ffill()

 .. _groupby.filter:

@@ -1238,13 +1232,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

 .. ipython:: python

-    df.groupby("A", group_keys=True)[["B", "C", "D"]].apply(lambda x: x)
+    df.groupby("A", group_keys=True).apply(lambda x: x)

 with

 .. ipython:: python

-    df.groupby("A", group_keys=False)[["B", "C", "D"]].apply(lambda x: x)
+    df.groupby("A", group_keys=False).apply(lambda x: x)

 Numba Accelerated Routines

@@ -1728,7 +1722,7 @@ column index name will be used as the name of the inserted column:
     result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
     return pd.Series(result, name="metrics")

-    result = df.groupby("a")[["b", "c"]].apply(compute_metrics)
+    result = df.groupby("a").apply(compute_metrics)

     result

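The ``group_keys`` comparison in the groupby.rst hunk above is easiest to see on a tiny frame. A sketch with made-up data, selecting columns explicitly so it behaves identically on either side of this revert:

```python
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3]})

# group_keys=True prepends the group label as an extra index level;
# group_keys=False keeps the original index untouched.
with_keys = df.groupby("A", group_keys=True)[["B"]].apply(lambda g: g)
without_keys = df.groupby("A", group_keys=False)[["B"]].apply(lambda g: g)

print(with_keys.index.tolist())     # [('x', 0), ('x', 1), ('y', 2)]
print(without_keys.index.tolist())  # [0, 1, 2]
```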
doc/source/whatsnew/v0.14.0.rst (+5 -17)

@@ -328,25 +328,13 @@ More consistent behavior for some groupby methods:

 - groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:

-  .. code-block:: ipython
-
-     In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
-
-     In [2]: g = df.groupby('A')
-
-     In [3]: g.head(1)  # filters DataFrame
-     Out[3]:
-        A  B
-     0  1  2
-     2  5  6
-
-     In [4]: g.apply(lambda x: x.head(1))  # used to simply fall-through
-     Out[4]:
-          A  B
-     A
-     1 0  1  2
-     5 2  5  6
+  .. ipython:: python
+
+     df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
+     g = df.groupby('A')
+     g.head(1)  # filters DataFrame
+     g.apply(lambda x: x.head(1))  # used to simply fall-through

 - groupby head and tail respect column selection:

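The 0.14.0 note above (``head``/``tail`` acting like ``filter``) can be reproduced directly; this is the same example the whatsnew entry now runs live:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# head(1) filters: it returns the first row of each group while
# preserving the original index (0 and 2 here), not an aggregate.
first_rows = g.head(1)
print(first_rows)
```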
doc/source/whatsnew/v0.18.1.rst (+6 -87)

@@ -77,52 +77,9 @@ Previously you would have to do this to get a rolling window mean per-group:
     df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
     df

-.. code-block:: ipython
-
-   In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
-   Out[1]:
-   A
-   1  0      NaN
-      1      NaN
-      2      NaN
-      3      1.5
-      4      2.5
-      5      3.5
-      6      4.5
-      7      5.5
-      8      6.5
-      9      7.5
-      10     8.5
-      11     9.5
-      12    10.5
-      13    11.5
-      14    12.5
-      15    13.5
-      16    14.5
-      17    15.5
-      18    16.5
-      19    17.5
-   2  20     NaN
-      21     NaN
-      22     NaN
-      23    21.5
-      24    22.5
-      25    23.5
-      26    24.5
-      27    25.5
-      28    26.5
-      29    27.5
-      30    28.5
-      31    29.5
-   3  32     NaN
-      33     NaN
-      34     NaN
-      35    33.5
-      36    34.5
-      37    35.5
-      38    36.5
-      39    37.5
-   Name: B, dtype: float64
+.. ipython:: python
+
+   df.groupby("A").apply(lambda x: x.rolling(4).B.mean())

 Now you can do:

@@ -144,53 +101,15 @@ For ``.resample(..)`` type of operations, previously you would have to:

     df

-.. code-block:: ipython
-
-   In [1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
-   Out[1]:
-                     group  val
-   group date
-   1     2016-01-03      1    5
-         2016-01-04      1    5
-         2016-01-05      1    5
-         2016-01-06      1    5
-         2016-01-07      1    5
-         2016-01-08      1    5
-         2016-01-09      1    5
-         2016-01-10      1    6
-   2     2016-01-17      2    7
-         2016-01-18      2    7
-         2016-01-19      2    7
-         2016-01-20      2    7
-         2016-01-21      2    7
-         2016-01-22      2    7
-         2016-01-23      2    7
-         2016-01-24      2    8
+.. ipython:: python
+
+   df.groupby("group").apply(lambda x: x.resample("1D").ffill())

 Now you can do:

-.. code-block:: ipython
-
-   In [1]: df.groupby("group").resample("1D").ffill()
-   Out[1]:
-                     group  val
-   group date
-   1     2016-01-03      1    5
-         2016-01-04      1    5
-         2016-01-05      1    5
-         2016-01-06      1    5
-         2016-01-07      1    5
-         2016-01-08      1    5
-         2016-01-09      1    5
-         2016-01-10      1    6
-   2     2016-01-17      2    7
-         2016-01-18      2    7
-         2016-01-19      2    7
-         2016-01-20      2    7
-         2016-01-21      2    7
-         2016-01-22      2    7
-         2016-01-23      2    7
-         2016-01-24      2    8
+.. ipython:: python
+
+   df.groupby("group").resample("1D").ffill()

 .. _whatsnew_0181.enhancements.method_chain:

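The 0.18.1 entries above introduce deferred groupby-resampling. A sketch reconstructed from the whatsnew output shown in the removed lines (dates and values taken from that output; the original frame construction is outside this diff):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2016-01-03", "2016-01-10", "2016-01-17", "2016-01-24"]
        ),
        "group": [1, 1, 2, 2],
        "val": [5, 6, 7, 8],
    }
).set_index("date")

# Resample to daily frequency within each group and forward-fill;
# selecting [["val"]] keeps the grouping column out of the result.
result = df.groupby("group")[["val"]].resample("1D").ffill()
print(result.head())
```

Each group spans eight days, so the result has 16 rows indexed by a (group, date) MultiIndex.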
pandas/core/frame.py (+13 -13)

@@ -8583,20 +8583,20 @@ def update(
     >>> df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
     ...                               'Parrot', 'Parrot'],
     ...                    'Max Speed': [380., 370., 24., 26.]})
-    >>> df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
-              Max Speed
+    >>> df.groupby("Animal", group_keys=True).apply(lambda x: x)
+              Animal  Max Speed
     Animal
-    Falcon 0      380.0
-           1      370.0
-    Parrot 2       24.0
-           3       26.0
+    Falcon 0  Falcon      380.0
+           1  Falcon      370.0
+    Parrot 2  Parrot       24.0
+           3  Parrot       26.0

-    >>> df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
-       Max Speed
-    0      380.0
-    1      370.0
-    2       24.0
-    3       26.0
+    >>> df.groupby("Animal", group_keys=False).apply(lambda x: x)
+       Animal  Max Speed
+    0  Falcon      380.0
+    1  Falcon      370.0
+    2  Parrot       24.0
+    3  Parrot       26.0
     """
     )
 )

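The docstring change above re-includes ``Animal`` in what ``apply`` sees. Selecting columns after ``groupby`` still excludes it explicitly and behaves the same before and after this revert; a runnable sketch:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)

# Explicit column selection: the callable sees only "Max Speed", and
# group_keys=False keeps the original RangeIndex.
out = df.groupby("Animal", group_keys=False)[["Max Speed"]].apply(lambda x: x)
print(out)
```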
pandas/core/groupby/groupby.py (+30 -50)

@@ -260,7 +260,7 @@ class providing the base-class of operations.
 each group together into a Series, including setting the index as
 appropriate:

-    >>> g1[['B', 'C']].apply(lambda x: x.C.max() - x.B.min())
+    >>> g1.apply(lambda x: x.C.max() - x.B.min())
     A
     a    5
     b    2

@@ -1488,16 +1488,6 @@ def f(g):
     with option_context("mode.chained_assignment", None):
         try:
             result = self._python_apply_general(f, self._selected_obj)
-            if (
-                not isinstance(self.obj, Series)
-                and self._selection is None
-                and self._selected_obj.shape != self._obj_with_exclusions.shape
-            ):
-                warnings.warn(
-                    message=_apply_groupings_depr.format(type(self).__name__),
-                    category=FutureWarning,
-                    stacklevel=find_stack_level(),
-                )
         except TypeError:
             # gh-20949
             # try again, with .apply acting as a filtering

@@ -2659,55 +2649,55 @@ def resample(self, rule, *args, **kwargs):
 Downsample the DataFrame into 3 minute bins and sum the values of
 the timestamps falling into a bin.

-    >>> df.groupby('a')[['b']].resample('3T').sum()
-                             b
+    >>> df.groupby('a').resample('3T').sum()
+                             a  b
     a
-    0   2000-01-01 00:00:00  2
-        2000-01-01 00:03:00  1
-    5   2000-01-01 00:00:00  1
+    0   2000-01-01 00:00:00  0  2
+        2000-01-01 00:03:00  0  1
+    5   2000-01-01 00:00:00  5  1

 Upsample the series into 30 second bins.

-    >>> df.groupby('a')[['b']].resample('30S').sum()
-                             b
+    >>> df.groupby('a').resample('30S').sum()
+                             a  b
     a
-    0   2000-01-01 00:00:00  1
-        2000-01-01 00:00:30  0
-        2000-01-01 00:01:00  1
-        2000-01-01 00:01:30  0
-        2000-01-01 00:02:00  0
-        2000-01-01 00:02:30  0
-        2000-01-01 00:03:00  1
-    5   2000-01-01 00:02:00  1
+    0   2000-01-01 00:00:00  0  1
+        2000-01-01 00:00:30  0  0
+        2000-01-01 00:01:00  0  1
+        2000-01-01 00:01:30  0  0
+        2000-01-01 00:02:00  0  0
+        2000-01-01 00:02:30  0  0
+        2000-01-01 00:03:00  0  1
+    5   2000-01-01 00:02:00  5  1

 Resample by month. Values are assigned to the month of the period.

-    >>> df.groupby('a')[['b']].resample('M').sum()
-                b
+    >>> df.groupby('a').resample('M').sum()
+                a  b
     a
-    0   2000-01-31  3
-    5   2000-01-31  1
+    0   2000-01-31  0  3
+    5   2000-01-31  5  1

 Downsample the series into 3 minute bins as above, but close the right
 side of the bin interval.

-    >>> df.groupby('a')[['b']].resample('3T', closed='right').sum()
-                             b
+    >>> df.groupby('a').resample('3T', closed='right').sum()
+                             a  b
     a
-    0   1999-12-31 23:57:00  1
-        2000-01-01 00:00:00  2
-    5   2000-01-01 00:00:00  1
+    0   1999-12-31 23:57:00  0  1
+        2000-01-01 00:00:00  0  2
+    5   2000-01-01 00:00:00  5  1

 Downsample the series into 3 minute bins and close the right side of
 the bin interval, but label each bin using the right edge instead of
 the left.

-    >>> df.groupby('a')[['b']].resample('3T', closed='right', label='right').sum()
-                             b
+    >>> df.groupby('a').resample('3T', closed='right', label='right').sum()
+                             a  b
     a
-    0   2000-01-01 00:00:00  1
-        2000-01-01 00:03:00  2
-    5   2000-01-01 00:03:00  1
+    0   2000-01-01 00:00:00  0  1
+        2000-01-01 00:03:00  0  2
+    5   2000-01-01 00:03:00  5  1
     """
     from pandas.core.resample import get_resampler_for_grouping

@@ -4329,13 +4319,3 @@ def _insert_quantile_level(idx: Index, qs: npt.NDArray[np.float64]) -> MultiIndex
     else:
         mi = MultiIndex.from_product([idx, qs])
     return mi
-
-
-# GH#7155
-_apply_groupings_depr = (
-    "{}.apply operated on the grouping columns. This behavior is deprecated, "
-    "and in a future version of pandas the grouping columns will be excluded "
-    "from the operation. Select the columns to operate on after groupby to "
-    "either explicitly include or exclude the groupings and silence "
-    "this warning."
-)

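The ``g1`` docstring example above reduces each group to a scalar. A sketch with hypothetical data chosen so the documented output (a -> 5, b -> 2) is reproduced; the real ``df`` behind ``g1`` is outside this diff:

```python
import pandas as pd

# Hypothetical data: group "a" gives C.max() - B.min() = 6 - 1 = 5,
# group "b" gives 5 - 3 = 2, matching the docstring output.
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g1 = df.groupby("A")

# Returning a scalar per group collapses the result into a Series
# indexed by the group labels; explicit column selection works the
# same on either side of this revert.
result = g1[["B", "C"]].apply(lambda x: x["C"].max() - x["B"].min())
print(result)
```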