DEPR: DataFrameGroupBy.apply operating on the group keys #52477

Merged
merged 10 commits into from
Apr 12, 2023
4 changes: 2 additions & 2 deletions doc/source/user_guide/cookbook.rst
@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
df
# List the size of the animals with the highest weight.
- df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
+ df.groupby("animal")[["size", "weight"]].apply(lambda subf: subf["size"][subf["weight"].idxmax()])
`Using get_group
<https://stackoverflow.com/questions/14734533/how-to-access-pandas-groupby-dataframe-by-key>`__
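
The change in this hunk can be sketched as follows. This is a minimal illustration, not the cookbook's exact setup: the DataFrame contents are an assumption, since the diff context does not show the data. Selecting the non-key columns before ``apply`` means the callable only ever receives ``size`` and ``weight``, never the grouping column ``animal``.

```python
import pandas as pd

# Illustrative data (an assumption; the diff does not show the cookbook's frame).
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "fish", "dog", "cat", "cat"],
        "size": list("SSMMMLL"),
        "weight": [8, 10, 11, 1, 20, 12, 12],
    }
)

# Select the non-key columns explicitly so that the callable passed to
# apply sees only "size" and "weight", not the grouping column "animal".
heaviest_size = df.groupby("animal")[["size", "weight"]].apply(
    lambda subf: subf["size"][subf["weight"].idxmax()]
)
print(heaviest_size)
```

The result is indexed by ``animal``, so the group key is still available in the output even though the callable never operated on it.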
@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])
- expected_df = gb.apply(GrowUp)
+ expected_df = gb[["size", "weight"]].apply(GrowUp)
expected_df
`Expanding apply
8 changes: 4 additions & 4 deletions doc/source/user_guide/groupby.rst
@@ -1067,7 +1067,7 @@ missing values with the ``ffill()`` method.
).set_index("date")
df_re

- df_re.groupby("group").resample("1D").ffill()
+ df_re.groupby("group")[["val"]].resample("1D").ffill()
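
A hedged sketch of the updated resample call, assuming a small frame shaped like the ``df_re`` in the surrounding context (the exact construction below is an assumption, chosen to match the dates visible elsewhere in this diff). With ``[["val"]]`` selected, the resampled result carries ``group`` only in its index, not as a duplicated column.

```python
import pandas as pd

# Assumed stand-in for df_re: weekly observations in two groups.
df_re = pd.DataFrame(
    {
        "date": pd.date_range(start="2016-01-03", periods=4, freq="7D"),
        "group": [1, 1, 2, 2],
        "val": [5, 6, 7, 8],
    }
).set_index("date")

# Upsample each group to daily frequency and forward-fill the gaps,
# operating on "val" only rather than on the grouping column as well.
filled = df_re.groupby("group")[["val"]].resample("1D").ffill()
print(filled)
```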

.. _groupby.filter:

@@ -1233,13 +1233,13 @@ the argument ``group_keys`` which defaults to ``True``. Compare

.. ipython:: python

- df.groupby("A", group_keys=True).apply(lambda x: x)
+ df.groupby("A", group_keys=True)[["B", "C", "D"]].apply(lambda x: x)

with

.. ipython:: python

- df.groupby("A", group_keys=False).apply(lambda x: x)
+ df.groupby("A", group_keys=False)[["B", "C", "D"]].apply(lambda x: x)
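
To make the ``group_keys`` comparison concrete, here is a minimal sketch on an assumed three-row frame (the data and the ``first_row`` helper are hypothetical). It uses ``head(1)`` rather than the identity function from the diff purely as an illustrative choice, so that the effect on the index is easy to see.

```python
import pandas as pd

# Hypothetical data; column names follow the documentation example.
df = pd.DataFrame({"A": ["x", "x", "y"], "B": [1, 2, 3], "C": [4.0, 5.0, 6.0]})


def first_row(group):
    """Return the first row of each group (hypothetical helper)."""
    return group.head(1)


# group_keys=True prepends the group label as an extra index level.
with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(first_row)

# group_keys=False keeps only the original row labels.
without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(first_row)

print(with_keys)
print(without_keys)
```

In both calls the column selection keeps ``A`` out of the frames handed to the callable; only the shape of the resulting index differs.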


Numba Accelerated Routines
@@ -1722,7 +1722,7 @@ column index name will be used as the name of the inserted column:
result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
return pd.Series(result, name="metrics")

- result = df.groupby("a").apply(compute_metrics)
+ result = df.groupby("a")[["b", "c"]].apply(compute_metrics)

result
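
A self-contained sketch of this hunk's pattern, using a small assumed frame (the data is hypothetical; ``compute_metrics`` is copied from the context above). Because the callable returns a ``Series`` per group, the results are combined into a ``DataFrame`` with one row per group, and the explicit ``[["b", "c"]]`` selection keeps the key column ``a`` out of the callable's input.

```python
import pandas as pd

# Hypothetical data with a grouping column "a" and two value columns.
df = pd.DataFrame({"a": [0, 0, 1, 1], "b": [0, 1, 1, 1], "c": [1, 0, 1, 1]})


def compute_metrics(x):
    # x contains only the selected columns "b" and "c" for one group.
    result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
    return pd.Series(result, name="metrics")


result = df.groupby("a")[["b", "c"]].apply(compute_metrics)
print(result)
```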

22 changes: 17 additions & 5 deletions doc/source/whatsnew/v0.14.0.rst
@@ -328,13 +328,25 @@ More consistent behavior for some groupby methods:

- groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation:

- .. ipython:: python
+ .. code-block:: ipython
- df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
- g = df.groupby('A')
- g.head(1) # filters DataFrame
+ In [1]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
+ In [2]: g = df.groupby('A')
+ In [3]: g.head(1) # filters DataFrame
+ Out[3]:
+ A B
+ 0 1 2
+ 2 5 6
+ In [4]: g.apply(lambda x: x.head(1)) # used to simply fall-through
+ Out[4]:
+ A B
+ A
+ 1 0 1 2
+ 5 2 5 6
- g.apply(lambda x: x.head(1)) # used to simply fall-through
- groupby head and tail respect column selection:

93 changes: 87 additions & 6 deletions doc/source/whatsnew/v0.18.1.rst
@@ -77,9 +77,52 @@ Previously you would have to do this to get a rolling window mean per-group:
df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)})
df
- .. ipython:: python
+ .. code-block:: ipython
- df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
+ In [1]: df.groupby("A").apply(lambda x: x.rolling(4).B.mean())
+ Out[1]:
+ A
+ 1 0 NaN
+ 1 NaN
+ 2 NaN
+ 3 1.5
+ 4 2.5
+ 5 3.5
+ 6 4.5
+ 7 5.5
+ 8 6.5
+ 9 7.5
+ 10 8.5
+ 11 9.5
+ 12 10.5
+ 13 11.5
+ 14 12.5
+ 15 13.5
+ 16 14.5
+ 17 15.5
+ 18 16.5
+ 19 17.5
+ 2 20 NaN
+ 21 NaN
+ 22 NaN
+ 23 21.5
+ 24 22.5
+ 25 23.5
+ 26 24.5
+ 27 25.5
+ 28 26.5
+ 29 27.5
+ 30 28.5
+ 31 29.5
+ 3 32 NaN
+ 33 NaN
+ 34 NaN
+ 35 33.5
+ 36 34.5
+ 37 35.5
+ 38 36.5
+ 39 37.5
+ Name: B, dtype: float64
Now you can do:

@@ -101,15 +144,53 @@ For ``.resample(..)`` type of operations, previously you would have to:
df
- .. ipython:: python
+ .. code-block:: ipython
- df.groupby("group").apply(lambda x: x.resample("1D").ffill())
+ In[1]: df.groupby("group").apply(lambda x: x.resample("1D").ffill())
+ Out[1]:
+ group val
+ group date
+ 1 2016-01-03 1 5
+ 2016-01-04 1 5
+ 2016-01-05 1 5
+ 2016-01-06 1 5
+ 2016-01-07 1 5
+ 2016-01-08 1 5
+ 2016-01-09 1 5
+ 2016-01-10 1 6
+ 2 2016-01-17 2 7
+ 2016-01-18 2 7
+ 2016-01-19 2 7
+ 2016-01-20 2 7
+ 2016-01-21 2 7
+ 2016-01-22 2 7
+ 2016-01-23 2 7
+ 2016-01-24 2 8
Now you can do:

- .. ipython:: python
+ .. code-block:: ipython
- df.groupby("group").resample("1D").ffill()
+ In[1]: df.groupby("group").resample("1D").ffill()
+ Out[1]:
+ group val
+ group date
+ 1 2016-01-03 1 5
+ 2016-01-04 1 5
+ 2016-01-05 1 5
+ 2016-01-06 1 5
+ 2016-01-07 1 5
+ 2016-01-08 1 5
+ 2016-01-09 1 5
+ 2016-01-10 1 6
+ 2 2016-01-17 2 7
+ 2016-01-18 2 7
+ 2016-01-19 2 7
+ 2016-01-20 2 7
+ 2016-01-21 2 7
+ 2016-01-22 2 7
+ 2016-01-23 2 7
+ 2016-01-24 2 8
.. _whatsnew_0181.enhancements.method_chain:
