
DOC: Add groupby-specific docstrings for take #50526


Merged: 4 commits merged on Jan 9, 2023
Changes from 2 commits
pandas/core/groupby/generic.py: 160 changes (158 additions, 2 deletions)
@@ -892,13 +892,84 @@ def fillna(
)
return result

-@doc(Series.take.__doc__)
def take(
self,
indices: TakeIndexer,
axis: Axis = 0,
**kwargs,
) -> Series:
"""
Return the elements in the given *positional* indices in each group.

This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element within each group.

If a requested index does not exist for some group, this method will raise.
To get similar behavior that ignores indices that don't exist, see
:meth:`.SeriesGroupBy.nth`.

Parameters
----------
indices : array-like
An array of ints indicating which positions to take in each group.
axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. ``0`` means that we are
selecting rows, ``1`` means that we are selecting columns.
For `SeriesGroupBy` this parameter is unused and defaults to 0.
**kwargs
For compatibility with :meth:`numpy.take`. Has no effect on the
output.

Returns
-------
Series
A Series containing the elements taken from each group.

See Also
--------
Series.loc : Select a subset of a Series by labels.
Series.iloc : Select a subset of a Series by positions.
Review comment (Member): Probably good to reference Series.take (and same for DataFrame below)?
Reply (Member Author): Yea - good idea.
numpy.take : Take elements from an array along an axis.
SeriesGroupBy.nth : Similar to take, won't raise if indices don't exist.

Examples
--------
>>> df = DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan),
... ('rabbit', 'mammal', 15.0)],
... columns=['name', 'class', 'max_speed'],
... index=[4, 3, 2, 1, 0])
>>> df
name class max_speed
4 falcon bird 389.0
3 parrot bird 24.0
2 lion mammal 80.5
1 monkey mammal NaN
0 rabbit mammal 15.0
>>> gb = df["name"].groupby([1, 1, 2, 2, 2])

Take elements at positions 0 and 1 along axis 0 (the default) in each group.

>>> gb.take([0, 1])
1 4 falcon
3 parrot
2 2 lion
1 monkey
Name: name, dtype: object

We may take elements using negative integers instead of positive indices,
counting from the end of each group, just like with Python lists.

>>> gb.take([-1, -2])
1 3 parrot
4 falcon
2 0 rabbit
1 monkey
Name: name, dtype: object
"""
result = self._op_via_apply("take", indices=indices, axis=axis, **kwargs)
return result
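The docstring says `take` raises when a requested position is missing from some group, and points to `nth` for behavior that ignores missing positions. A minimal sketch of that contrast, assuming pandas 2.x and reusing the docstring's example data (the variable names `take_raised` and `nth_names` are illustrative only):

```python
import numpy as np
import pandas as pd

# The frame from the docstring examples: grouping yields a 2-row group (key 1)
# and a 3-row group (key 2).
df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0),
        ("parrot", "bird", 24.0),
        ("lion", "mammal", 80.5),
        ("monkey", "mammal", np.nan),
        ("rabbit", "mammal", 15.0),
    ],
    columns=["name", "class", "max_speed"],
    index=[4, 3, 2, 1, 0],
)
gb = df["name"].groupby([1, 1, 2, 2, 2])

# Position 2 exists in group 2 but not in group 1, so take() raises.
try:
    gb.take([0, 2])
    take_raised = False
except IndexError:
    take_raised = True

# nth() skips the missing position and keeps only the rows that exist.
nth_names = gb.nth([0, 2]).tolist()

print(take_raised)  # True
print(nth_names)    # ['falcon', 'lion', 'rabbit']
```

Because position 2 is out of bounds for the first group, `take` fails as a whole instead of returning a partial result.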

@@ -2271,13 +2342,98 @@ def fillna(
)
return result

-@doc(DataFrame.take.__doc__)
def take(
self,
indices: TakeIndexer,
axis: Axis | None = 0,
**kwargs,
) -> DataFrame:
"""
Return the elements in the given *positional* indices in each group.

This means that we are not indexing according to actual values in
the index attribute of the object. We are indexing according to the
actual position of the element within each group.

If a requested index does not exist for some group, this method will raise.
To get similar behavior that ignores indices that don't exist, see
:meth:`.DataFrameGroupBy.nth`.

Parameters
----------
indices : array-like
An array of ints indicating which positions to take in each group.
axis : {0 or 'index', 1 or 'columns', None}, default 0
The axis on which to select elements. ``0`` means that we are
selecting rows, ``1`` means that we are selecting columns.
**kwargs
For compatibility with :meth:`numpy.take`. Has no effect on the
output.

Returns
-------
DataFrame
A DataFrame containing the elements taken from each group.

See Also
--------
DataFrame.loc : Select a subset of a DataFrame by labels.
DataFrame.iloc : Select a subset of a DataFrame by positions.
numpy.take : Take elements from an array along an axis.

Examples
--------
>>> df = DataFrame([('falcon', 'bird', 389.0),
... ('parrot', 'bird', 24.0),
... ('lion', 'mammal', 80.5),
... ('monkey', 'mammal', np.nan),
... ('rabbit', 'mammal', 15.0)],
... columns=['name', 'class', 'max_speed'],
... index=[4, 3, 2, 1, 0])
>>> df
name class max_speed
4 falcon bird 389.0
3 parrot bird 24.0
2 lion mammal 80.5
1 monkey mammal NaN
0 rabbit mammal 15.0
>>> gb = df.groupby([1, 1, 2, 2, 2])

Take elements at positions 0 and 1 along axis 0 (the default) in each group.

Note how the indices selected in the result do not correspond to
our input indices 0 and 1. That's because we are selecting the 0th
and 1st rows of each group, not rows whose index labels equal 0 and 1.

>>> gb.take([0, 1])
name class max_speed
1 4 falcon bird 389.0
3 parrot bird 24.0
2 2 lion mammal 80.5
1 monkey mammal NaN

The order of the specified indices influences the order in the result.
Here, the order is swapped from the previous example.

>>> gb.take([1, 0])
name class max_speed
1 3 parrot bird 24.0
4 falcon bird 389.0
2 1 monkey mammal NaN
2 lion mammal 80.5

We may take elements using negative integers instead of positive indices,
counting from the end of each group, just like with Python lists.

>>> gb.take([-1, -2])
name class max_speed
1 3 parrot bird 24.0
4 falcon bird 389.0
2 0 rabbit mammal 15.0
1 monkey mammal NaN
"""
result = self._op_via_apply("take", indices=indices, axis=axis, **kwargs)
return result
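To check the ordering example for `DataFrameGroupBy.take`, the sketch below builds the docstring's frame and compares `take([0, 1])` with `take([1, 0])`. It assumes pandas 2.x, where the result carries the group key as the outer index level and the original row label as the inner level; `forward` and `swapped` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0),
        ("parrot", "bird", 24.0),
        ("lion", "mammal", 80.5),
        ("monkey", "mammal", np.nan),
        ("rabbit", "mammal", 15.0),
    ],
    columns=["name", "class", "max_speed"],
    index=[4, 3, 2, 1, 0],
)
gb = df.groupby([1, 1, 2, 2, 2])

# Within each group, rows come back in the order the positions were given.
forward = gb.take([0, 1])
swapped = gb.take([1, 0])

print(forward["name"].tolist())  # ['falcon', 'parrot', 'lion', 'monkey']
print(swapped["name"].tolist())  # ['parrot', 'falcon', 'monkey', 'lion']
```

Only the order inside each group changes; the groups themselves still appear in key order.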

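Both new `take` bodies hand the work to the private `_op_via_apply` helper. Conceptually, that amounts to running the positional `take` on each group and concatenating the pieces under their group keys. The sketch below spells this out with a hypothetical `take_per_group` helper (not part of pandas) and compares it against the built-in method; it illustrates the semantics under that assumption and is not the actual implementation.

```python
import pandas as pd


def take_per_group(obj, keys, indices):
    """Hypothetical helper: a rough stand-in for ``obj.groupby(keys).take(indices)``.

    Applies the positional ``take`` to every group separately, then concatenates
    the pieces with the group key as the outer index level.
    """
    pieces = {key: group.take(indices) for key, group in obj.groupby(keys)}
    return pd.concat(pieces)


names = pd.Series(
    ["falcon", "parrot", "lion", "monkey", "rabbit"],
    index=[4, 3, 2, 1, 0],
    name="name",
)
keys = [1, 1, 2, 2, 2]

manual = take_per_group(names, keys, [-1, -2])
builtin = names.groupby(keys).take([-1, -2])

print(manual.tolist())                                 # ['parrot', 'falcon', 'rabbit', 'monkey']
print(manual.tolist() == builtin.tolist())             # True
print(manual.index.tolist() == builtin.index.tolist())  # True
```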