DOC GH22893 Fix docstring of groupby in pandas/core/generic.py #22920

Merged
merged 9 commits into from
Oct 3, 2018
96 changes: 67 additions & 29 deletions pandas/core/generic.py
@@ -7063,8 +7063,12 @@ def clip_lower(self, threshold, axis=None, inplace=False):
def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
group_keys=True, squeeze=False, observed=False, **kwargs):
"""
Group series using mapper (dict or key function, apply given function
to group, return result as series) or by a series of columns.
Group series using a mapper or by a series of columns.

A groupby operation involves some combination of splitting the
object, applying a function, and combining the results. This can be
used to group large amounts of data and compute operations on these
groups.

Parameters
----------
@@ -7077,54 +7081,88 @@ def groupby(self, by=None, axis=0, level=None, as_index=True, sort=True,
values are used as-is to determine the groups. A label or list of
labels may be passed to group by the columns in ``self``. Notice
that a tuple is interpreted as a (single) key.
axis : int, default 0
axis : {0 or 'index', 1 or 'columns'}
Member

Can you add the default?

Split along rows (0) or columns (1).
level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular
level or levels
as_index : boolean, default True
level or levels.
as_index : bool, default True
For aggregated output, return object with group labels as the
index. Only relevant for DataFrame input. as_index=False is
effectively "SQL-style" grouped output
sort : boolean, default True
effectively "SQL-style" grouped output.
sort : bool, default True
Sort group keys. Get better performance by turning this off.
Note this does not influence the order of observations within each
group. groupby preserves the order of rows within each group.
group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces
squeeze : boolean, default False
reduce the dimensionality of the return type if possible,
otherwise return a consistent type
observed : boolean, default False
This only applies if any of the groupers are Categoricals
group. Groupby preserves the order of rows within each group.
group_keys : bool, default True
When calling apply, add group keys to index to identify pieces.
squeeze : bool, default False
Reduce the dimensionality of the return type if possible,
otherwise return a consistent type.
observed : bool, default False
This only applies if any of the groupers are Categoricals.
If True: only show observed values for categorical groupers.
If False: show all values for categorical groupers.

.. versionadded:: 0.23.0

**kwargs
Optional, only accepts keyword argument 'mutated' and is passed
to groupby.

Returns
-------
GroupBy object
DataFrameGroupBy object
Member

This docstring is also used by Series.groupby(). Can you check whether that also returns `DataFrameGroupBy`? Also, I'm not sure that is what gets returned when a single label (rather than a list containing one label) is used.

The word object is unnecessary, we try to keep just the type.
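
The question above can be checked directly. The following is a minimal sketch, assuming a pandas version in which `DataFrameGroupBy` and `SeriesGroupBy` are importable from `pandas.core.groupby`; the frame and its column names are hypothetical and not taken from the PR.

```python
import pandas as pd
from pandas.core.groupby import DataFrameGroupBy, SeriesGroupBy

# Hypothetical data, used only to inspect the return types.
df = pd.DataFrame({
    "city": ["Lima", "Lima", "Oslo", "Oslo"],
    "temperature": [19.0, 21.0, 4.0, 6.0],
})

by_label = df.groupby("city")                          # single label
by_list = df.groupby(["city"])                         # list holding one label
from_series = df["temperature"].groupby(df["city"])    # Series.groupby
selected = df.groupby("city")["temperature"]           # one column selected

print(type(by_label).__name__)     # DataFrameGroupBy
print(type(by_list).__name__)      # DataFrameGroupBy
print(type(from_series).__name__)  # SeriesGroupBy
print(type(selected).__name__)     # SeriesGroupBy
```

Both spellings of the key on a DataFrame give a `DataFrameGroupBy`, while `Series.groupby()` gives a `SeriesGroupBy`, so a shared docstring cannot name a single return type for both callers.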

An object that contains information about the groups.

Examples
See Also
--------
DataFrame results

>>> data.groupby(func, axis=0).mean()
>>> data.groupby(['col1', 'col2'])['col3'].mean()

DataFrame with hierarchical index
resample : Convenience method for frequency conversion and resampling
of time series.

>>> data.groupby(['col1', 'col2']).mean()
Examples
--------
>>> df = pd.DataFrame({'col1' : ['A', 'A', 'B', 'B'],
... 'col2' : [1, 2, 3, 4]})
Member

We try to avoid examples with a, b, foo, bar... because arbitrary data does not illustrate the behavior as well as meaningful examples do.

Can you use something like: https://github.com/pandas-dev/pandas/blob/master/pandas/core/generic.py#L2514
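
As an illustration of what a more meaningful example might look like, here is a minimal sketch. The column names and values are assumptions chosen for readability; they are not copied from the linked line or from the merged docstring.

```python
import pandas as pd

# Hypothetical data: labels that mean something make the grouping obvious.
df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})

# Split by animal, apply mean() to each group, combine into one Series.
mean_speed = df.groupby("Animal")["Max Speed"].mean()
print(mean_speed)
```

With labels like these, a reader can confirm the result by eye: the two Falcon rows average to 375.0 and the two Parrot rows to 25.0.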

>>> df
col1 col2
0 A 1
1 A 2
2 B 3
3 B 4
>>> df.groupby(['col1']).mean()
col2
col1
A 1.5
B 3.5

**Hierarchical Indexes**

We can group by different levels of a hierarchical index
using the `level` parameter:

>>> arrays = [np.array(['A', 'A', 'B', 'B']),
... np.array(['foo', 'bar', 'foo', 'bar'])]
>>> df = pd.DataFrame(np.array([1, 2, 3, 4]), index=arrays)
>>> df
0
A foo 1
bar 2
B foo 3
bar 4
>>> df.groupby(level=0).mean()
0
A 1.5
B 3.5
>>> df.groupby(level=1).mean()
0
bar 3
foo 2

Notes
-----
See the `user guide
<http://pandas.pydata.org/pandas-docs/stable/groupby.html>`_ for more.
Member

Can you move Notes before the examples too?


See also
--------
resample : Convenience method for frequency conversion and resampling
of time series.
"""
from pandas.core.groupby.groupby import groupby

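
The `as_index`, `sort`, and `observed` descriptions in the docstring can be sketched as follows. This is a minimal illustration under stated assumptions: the `sales` and `orders` frames and their column names are hypothetical, and `observed` requires pandas 0.23.0 or later, as the `versionadded` note says.

```python
import pandas as pd

# Hypothetical data for as_index and sort.
sales = pd.DataFrame({
    "store": ["north", "south", "north", "east"],
    "revenue": [10.0, 20.0, 30.0, 40.0],
})

# as_index=True (default): the group labels become the index of the result.
indexed = sales.groupby("store").sum()

# as_index=False: the group labels stay in a regular column ("SQL-style").
flat = sales.groupby("store", as_index=False).sum()

# sort=False: groups appear in order of first appearance, not sorted by key.
unsorted = sales.groupby("store", sort=False).sum()

# Hypothetical data for observed: "medium" is a declared category with no rows.
sizes = pd.Categorical(
    ["small", "large", "small"],
    categories=["small", "medium", "large"],
)
orders = pd.DataFrame({"shirt_size": sizes, "price": [3.0, 10.0, 5.0]})

# observed=False: every declared category gets a row, including unused ones.
all_levels = orders.groupby("shirt_size", observed=False)["price"].count()

# observed=True: only categories that occur in the data get a row.
seen_only = orders.groupby("shirt_size", observed=True)["price"].count()

print(indexed, flat, unsorted, all_levels, seen_only, sep="\n\n")
```

The categorical case shows why `observed` only matters for Categorical groupers: an ordinary string column has no notion of a declared but unused value.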