DOC: update the GroupBy.apply docstring #20098

Merged
merged 11 commits into from
Jul 21, 2018

Contributor

@ajdyka commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant
  • closes Docs: broken groupby.GroupBy.apply example #19337

Please include the output of the validation script below between the "```" ticks:

################################################################################
################ Docstring (pandas.core.groupby.GroupBy.apply)  ################
################################################################################

Apply function ``func``  group-wise and combine the results together.

The function passed to ``apply`` must take a dataframe as its first
argument and return a dataframe, a series or a scalar. ``apply`` will
then take care of combining the results back together into a single
dataframe or series. ``apply`` is therefore a highly flexible
grouping method.

While ``apply`` is a very flexible method, its downside is that
using it can be quite a bit slower than using more specific methods.
Pandas offers a wide range of methods that will be much faster
than using ``apply`` for their specific purposes, so try to use them
before reaching for ``apply``.

Parameters
----------
func : function
    A callable that takes a dataframe as its first argument, and
    returns a dataframe, a series or a scalar. In addition the
    callable may take positional and keyword arguments.
args : tuple
    Optional positional and keyword arguments to pass to ``func``.
kwargs : dict
    Optional positional and keyword arguments to pass to ``func``.

Returns
-------
applied : Series or DataFrame

Notes
-----
In the current implementation ``apply`` calls func twice on the
first group to decide whether it can take a fast or slow code
path. This can lead to unexpected behavior if func has
side-effects, as they will take effect twice for the first
group.

Examples
--------

>>> df = pd.DataFrame({'A': 'a a b'.split(), 'B': [1,2,3], 'C': [4,6, 5]})
>>> g = df.groupby('A')

From ``df`` above we can see that ``g`` has two groups, ``a``, ``b``.
Calling ``apply`` in various ways, we can get different grouping results:

Example 1: below the function passed to ``apply`` takes a dataframe as
its argument and returns a dataframe. ``apply`` combines the result for
each group together into a new dataframe:

>>> g[['B','C']].apply(lambda x: x / x.sum())
          B    C
0  0.333333  0.4
1  0.666667  0.6
2  1.000000  1.0

Example 2: The function passed to ``apply`` takes a dataframe as
its argument and returns a series.  ``apply`` combines the result for
each group together into a new dataframe:

>>> g[['B','C']].apply(lambda x: x.max() - x.min())
   B  C
A
a  1  2
b  0  0

Example 3: The function passed to ``apply`` takes a dataframe as
its argument and returns a scalar. ``apply`` combines the result for
each group together into a series, including setting the index as
appropriate:

>>> g.apply(lambda x: x.C.max() - x.B.min())
A
a    5
b    2
dtype: int64

See also
--------
pipe : Apply function to the full GroupBy object instead of to each
    group.
aggregate : Apply aggregate function to the GroupBy object.
transform : Apply function column-by-column to the GroupBy object.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.core.groupby.GroupBy.apply" correct. :)
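
The docstring's advice to prefer specialized methods over ``apply`` can be illustrated with a minimal sketch. It reuses the `df` from the Examples section; the names `via_apply` and `via_builtin` are illustrative only, and no timing figures are claimed here.

```python
import pandas as pd

# Same frame as in the docstring's Examples section.
df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g = df.groupby("A")

# Generic route: a Python function is called on each group's sub-frame.
via_apply = g[["B", "C"]].apply(lambda x: x.max() - x.min())

# Specialized route: built-in groupby reductions, no per-group Python call.
via_builtin = g[["B", "C"]].max() - g[["B", "C"]].min()

# Compare the values produced by the two routes.
print((via_apply.to_numpy() == via_builtin.to_numpy()).all())
```

Both routes compute the per-group range of each column; the second avoids invoking a Python function once per group, which is where the slowdown described in the docstring comes from.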

Member

@jorisvandenbossche left a comment

Thanks for the docstring!

Can you change the occurrences of ``func`` to `func`? (we use single backtick quotes for parameter names)

args, kwargs : tuple and dict
Optional positional and keyword arguments to pass to ``func``
callable may take positional and keyword arguments.
args : tuple
Member

Can you make this *args ? (and you can leave out the 'tuple')

callable may take positional and keyword arguments.
args : tuple
Optional positional and keyword arguments to pass to ``func``.
kwargs : dict
Member

Same here: **kwargs (and without dict)

Optional positional and keyword arguments to pass to ``func``
callable may take positional and keyword arguments.
args : tuple
Optional positional and keyword arguments to pass to ``func``.
Member

These are only the positional ones, and the keyword ones go below.

I think it would also be a good idea to add an example of this in the Examples section
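
Such an example might look like the following sketch. The helper `scaled_range` and its parameters `factor` and `offset` are hypothetical, introduced only to show that extra positional and keyword arguments given to `apply` are forwarded to `func`.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g = df.groupby("A")


def scaled_range(frame, factor, offset=0):
    """Hypothetical helper: per-column (max - min), scaled and shifted."""
    return (frame.max() - frame.min()) * factor + offset


# 10 is forwarded positionally as `factor`, offset=1 as a keyword argument.
result = g[["B", "C"]].apply(scaled_range, 10, offset=1)
print(result)
```

Group ``a`` has ranges 1 and 2 for columns ``B`` and ``C``, so with `factor=10` and `offset=1` its row holds 11 and 21; group ``b`` has a single row, so both ranges are 0 and its row holds 1 and 1.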

codecov bot commented Mar 10, 2018

Codecov Report

Merging #20098 into master will decrease coverage by 0.26%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20098      +/-   ##
==========================================
- Coverage   91.99%   91.73%   -0.27%     
==========================================
  Files         167      150      -17     
  Lines       50578    49168    -1410     
==========================================
- Hits        46530    45102    -1428     
- Misses       4048     4066      +18

Flag        Coverage Δ
#multiple   90.11% <ø> (-0.29%) ⬇️
#single     41.86% <ø> (-0.31%) ⬇️

Impacted Files                   Coverage Δ
pandas/core/groupby.py           92.14% <ø> (ø)
pandas/plotting/_compat.py       62% <0%> (-28.91%) ⬇️
pandas/io/s3.py                  72.72% <0%> (-13.64%) ⬇️
pandas/core/arrays/base.py       74.35% <0%> (-13.5%) ⬇️
pandas/util/_doctools.py         0% <0%> (-12.88%) ⬇️
pandas/io/formats/terminal.py    16.43% <0%> (-4.55%) ⬇️
pandas/io/formats/printing.py    89.38% <0%> (-3.71%) ⬇️
pandas/io/html.py                86.6% <0%> (-2.57%) ⬇️
pandas/core/dtypes/missing.py    91.07% <0%> (-2.5%) ⬇️
pandas/io/formats/format.py      96.26% <0%> (-1.99%) ⬇️
... and 98 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 322dbf4...cf3b3e9.

@@ -118,12 +120,12 @@
Examples
--------
{examples}

See also
--------
Contributor

Can you reference Series/DataFrame.apply as well?

Contributor

Could you add these to the See Also?
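
To show why these methods belong together in a See Also section, here is a minimal sketch contrasting the three `apply` variants on the docstring's example frame. The variable names `elementwise`, `columnwise` and `per_group` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})

# Series.apply: the function receives one value at a time.
elementwise = df["B"].apply(lambda v: v * 2)

# DataFrame.apply: the function receives one column (a Series) at a time.
columnwise = df[["B", "C"]].apply(lambda col: col.max() - col.min())

# GroupBy.apply: the function receives one group's sub-frame at a time.
per_group = df.groupby("A")[["B", "C"]].apply(lambda x: x.max() - x.min())

print(elementwise.tolist())
print(columnwise.to_dict())
print(per_group)
```

The unit handed to the function differs in each case (a scalar, a whole column, a group's sub-frame), which is the distinction a reader following the cross-references would be looking for.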

@@ -216,17 +216,17 @@
----------
func : callable or tuple of (callable, string)
Function to apply to this %(klass)s object or, alternatively,
- a ``(callable, data_keyword)`` tuple where ``data_keyword`` is a
- string indicating the keyword of ``callable`` that expects the
+ a `(callable, data_keyword)` tuple where `data_keyword` is a
Contributor

I think this tuple should be double-backticks, since it's a code snippet.

@@ -394,7 +394,7 @@ class Grouper(object):
Examples
--------

- Syntactic sugar for ``df.groupby('A')``
+ Syntactic sugar for `df.groupby('A')`
Contributor

double backticks

@@ -1641,7 +1641,7 @@ def nth(self, n, dropna=None):
1 NaN
2 NaN

- Specifying ``as_index=False`` in ``groupby`` keeps the original index.
+ Specifying `as_index=False` in `groupby` keeps the original index.
Contributor

double backtick for as_index=False

@@ -2067,7 +2067,7 @@ def head(self, n=5):
"""
Returns first n rows of each group.

- Essentially equivalent to ``.apply(lambda x: x.head(n))``,
+ Essentially equivalent to `.apply(lambda x: x.head(n))`,
Contributor

double backticks
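
The equivalence mentioned in the `head` docstring can be checked with a small sketch. It assumes the example frame from the `GroupBy.apply` docstring and compares values only, because the two results may carry different indexes depending on the pandas version and `group_keys` setting.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b"], "B": [1, 2, 3], "C": [4, 6, 5]})
g = df.groupby("A")

# Built-in method: first row of each group.
direct = g[["B", "C"]].head(1)

# The spelled-out form quoted in the docstring.
via_apply = g[["B", "C"]].apply(lambda x: x.head(1))

# Compare the row values, ignoring index differences.
print(direct.to_numpy().tolist() == via_apply.to_numpy().tolist())
```

The docstring says "essentially" equivalent, so the comparison deliberately ignores the index: the selected rows match, while the index of the `apply` result may include the group key.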

@@ -2094,7 +2094,7 @@ def tail(self, n=5):
"""
Returns last n rows of each group

- Essentially equivalent to ``.apply(lambda x: x.tail(n))``,
+ Essentially equivalent to `.apply(lambda x: x.tail(n))`,
Contributor

double backticks

args, kwargs : tuple and dict
Optional positional and keyword arguments to pass to ``func``
callable may take positional and keyword arguments.
*args : Optional positional and keyword arguments to pass to `func`.
Contributor

This should be formatted as

*args
    Additional positional arguments are passed through to `func`.
**kwargs
    Additional keyword arguments are passed through to `func`.
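
In context, that layout could look like the sketch below. The function `call_with_extras` is hypothetical and exists only to carry the docstring; it also follows the backtick convention from the earlier review comments (single backticks for parameter names, double backticks for code snippets).

```python
def call_with_extras(func, *args, **kwargs):
    """
    Call `func` with the given extra arguments (hypothetical helper).

    Equivalent to ``func(*args, **kwargs)``.

    Parameters
    ----------
    func : callable
        The function to call.
    *args
        Additional positional arguments are passed through to `func`.
    **kwargs
        Additional keyword arguments are passed through to `func`.

    Returns
    -------
    object
        Whatever `func` returns.
    """
    return func(*args, **kwargs)


# Positional and keyword extras both reach the wrapped function.
print(call_with_extras(lambda x, y=0: x + y, 1, y=2))
```

Listing ``*args`` and ``**kwargs`` without a type keeps the Parameters section aligned with the function signature, which is what the requested format achieves.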

* Undo some single backticking
* PEP8 on examples
* Reword

[ci skip]
@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 15, 2018
Contributor

@jreback commented Apr 14, 2018

can you rebase this

@jreback jreback removed this from the 0.23.0 milestone Apr 14, 2018
@WillAyd WillAyd merged commit 8c4ec65 into pandas-dev:master Jul 21, 2018
Member

@WillAyd commented Jul 21, 2018

Thanks @ajdyka !

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Successfully merging this pull request may close these issues.

Docs: broken groupby.GroupBy.apply example
5 participants