Commit a2f5815

DOC: Add to docs on group_keys in groupby.apply (#47185)
* DOC: Add to docs on group_keys in groupby.apply
* Add link to user guide
1 parent 7c5c81e commit a2f5815

2 files changed

+59
-13
lines changed

pandas/core/groupby/groupby.py

+53-10
Original file line number | Diff line number | Diff line change
@@ -188,21 +188,33 @@ class providing the base-class of operations.
188188
>>> df = pd.DataFrame({'A': 'a a b'.split(),
189189
... 'B': [1,2,3],
190190
... 'C': [4,6,5]})
191-
>>> g = df.groupby('A')
191+
>>> g1 = df.groupby('A', group_keys=False)
192+
>>> g2 = df.groupby('A', group_keys=True)
192193
193-
Notice that ``g`` has two groups, ``a`` and ``b``.
194-
Calling `apply` in various ways, we can get different grouping results:
194+
Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
195+
differ in their ``group_keys`` argument. Calling `apply` in various ways,
196+
we can get different grouping results:
195197
196198
Example 1: below the function passed to `apply` takes a DataFrame as
197199
its argument and returns a DataFrame. `apply` combines the result for
198200
each group together into a new DataFrame:
199201
200-
>>> g[['B', 'C']].apply(lambda x: x / x.sum())
202+
>>> g1[['B', 'C']].apply(lambda x: x / x.sum())
201203
B C
202204
0 0.333333 0.4
203205
1 0.666667 0.6
204206
2 1.000000 1.0
205207
208+
In the above, the groups are not part of the index. We can have them included
209+
by using ``g2`` where ``group_keys=True``:
210+
211+
>>> g2[['B', 'C']].apply(lambda x: x / x.sum())
212+
B C
213+
A
214+
a 0 0.333333 0.4
215+
1 0.666667 0.6
216+
b 2 1.000000 1.0
217+
206218
Example 2: The function passed to `apply` takes a DataFrame as
207219
its argument and returns a Series. `apply` combines the result for
208220
each group together into a new DataFrame.
@@ -211,28 +223,41 @@ class providing the base-class of operations.
211223
212224
The resulting dtype will reflect the return value of the passed ``func``.
213225
214-
>>> g[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
226+
>>> g1[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
227+
B C
228+
A
229+
a 1.0 2.0
230+
b 0.0 0.0
231+
232+
>>> g2[['B', 'C']].apply(lambda x: x.astype(float).max() - x.min())
215233
B C
216234
A
217235
a 1.0 2.0
218236
b 0.0 0.0
219237
238+
The ``group_keys`` argument has no effect here because the result is not
239+
like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
240+
to the input.
241+
220242
Example 3: The function passed to `apply` takes a DataFrame as
221243
its argument and returns a scalar. `apply` combines the result for
222244
each group together into a Series, including setting the index as
223245
appropriate:
224246
225-
>>> g.apply(lambda x: x.C.max() - x.B.min())
247+
>>> g1.apply(lambda x: x.C.max() - x.B.min())
226248
A
227249
a 5
228250
b 2
229251
dtype: int64""",
230252
"series_examples": """
231253
>>> s = pd.Series([0, 1, 2], index='a a b'.split())
232-
>>> g = s.groupby(s.index)
254+
>>> g1 = s.groupby(s.index, group_keys=False)
255+
>>> g2 = s.groupby(s.index, group_keys=True)
233256
234257
From ``s`` above we can see that ``g`` has two groups, ``a`` and ``b``.
235-
Calling `apply` in various ways, we can get different grouping results:
258+
Notice that ``g1`` and ``g2`` have two groups, ``a`` and ``b``, and only
259+
differ in their ``group_keys`` argument. Calling `apply` in various ways,
260+
we can get different grouping results:
236261
237262
Example 1: The function passed to `apply` takes a Series as
238263
its argument and returns a Series. `apply` combines the result for
@@ -242,18 +267,36 @@ class providing the base-class of operations.
242267
243268
The resulting dtype will reflect the return value of the passed ``func``.
244269
245-
>>> g.apply(lambda x: x*2 if x.name == 'a' else x/2)
270+
>>> g1.apply(lambda x: x*2 if x.name == 'a' else x/2)
246271
a 0.0
247272
a 2.0
248273
b 1.0
249274
dtype: float64
250275
276+
In the above, the groups are not part of the index. We can have them included
277+
by using ``g2`` where ``group_keys=True``:
278+
279+
>>> g2.apply(lambda x: x*2 if x.name == 'a' else x/2)
280+
a a 0.0
281+
a 2.0
282+
b b 1.0
283+
dtype: float64
284+
251285
Example 2: The function passed to `apply` takes a Series as
252286
its argument and returns a scalar. `apply` combines the result for
253287
each group together into a Series, including setting the index as
254288
appropriate:
255289
256-
>>> g.apply(lambda x: x.max() - x.min())
290+
>>> g1.apply(lambda x: x.max() - x.min())
291+
a 1
292+
b 0
293+
dtype: int64
294+
295+
The ``group_keys`` argument has no effect here because the result is not
296+
like-indexed (i.e. :ref:`a transform <groupby.transform>`) when compared
297+
to the input.
298+
299+
>>> g2.apply(lambda x: x.max() - x.min())
257300
a 1
258301
b 0
259302
dtype: int64""",
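
The like-indexed cases from the docstring examples above can be reproduced as a small standalone script. This is a minimal sketch, not part of the commit: the helper name `normalize` and the result variable names are illustrative, and it assumes a pandas version (1.5 or later) in which an explicitly passed `group_keys` is honored for like-indexed results.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
s = pd.Series([0, 1, 2], index=["a", "a", "b"])


def normalize(group):
    # Hypothetical helper: returns a result with the same index as the
    # group it receives, i.e. a like-indexed (transform-style) result.
    return group / group.sum()


# DataFrame case: select the value columns, then apply per group.
df_without_keys = df.groupby("A", group_keys=False)[["B", "C"]].apply(normalize)
df_with_keys = df.groupby("A", group_keys=True)[["B", "C"]].apply(normalize)

# Series case: group by the Series' own index labels.
s_without_keys = s.groupby(s.index, group_keys=False).apply(lambda x: x * 2)
s_with_keys = s.groupby(s.index, group_keys=True).apply(lambda x: x * 2)

# With group_keys=True the group label becomes an extra outer index level;
# the values themselves are unchanged.
print(df_without_keys.index.nlevels, df_with_keys.index.nlevels)
print(s_without_keys.index.nlevels, s_with_keys.index.nlevels)
```

Comparing `index.nlevels` rather than the printed frames keeps the check independent of display formatting.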

pandas/core/shared_docs.py

+6-3
Original file line number | Diff line number | Diff line change
@@ -115,9 +115,12 @@
115115
Note this does not influence the order of observations within each
116116
group. Groupby preserves the order of rows within each group.
117117
group_keys : bool, optional
118-
When calling apply, add group keys to index to identify pieces.
119-
By default group keys are not included when the result's index
120-
(and column) labels match the inputs, and are included otherwise.
118+
When calling apply and the ``by`` argument produces a like-indexed
119+
(i.e. :ref:`a transform <groupby.transform>`) result, add group keys to
120+
index to identify pieces. By default group keys are not included
121+
when the result's index (and column) labels match the inputs, and
122+
are included otherwise. This argument has no effect if the result produced
123+
is not like-indexed with respect to the input.
121124
122125
.. versionchanged:: 1.5.0
123126
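
The last sentence of the updated parameter description (no effect when the result is not like-indexed) can be checked with a reducing function. The following is a sketch under the same assumptions as the docstring examples; the helper name `spread` and the result variable names are illustrative and do not appear in the commit.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
s = pd.Series([0, 1, 2], index=["a", "a", "b"])


def spread(group):
    # Hypothetical helper: reduces each group to max - min, so the result
    # is not like-indexed with respect to the input.
    return group.max() - group.min()


# DataFrame input: each group reduces to one row of per-column values.
df_reduced_false = df.groupby("A", group_keys=False)[["B", "C"]].apply(spread)
df_reduced_true = df.groupby("A", group_keys=True)[["B", "C"]].apply(spread)

# Series input: each group reduces to a single scalar.
s_reduced_false = s.groupby(s.index, group_keys=False).apply(spread)
s_reduced_true = s.groupby(s.index, group_keys=True).apply(spread)

# Both settings produce identical results, indexed by the group labels.
print(df_reduced_false.equals(df_reduced_true))
print(s_reduced_false.equals(s_reduced_true))
```

In both cases the group labels already form the result's index, so there is nothing further for `group_keys` to add.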
