DOC: Update Series min and max docstring. GH22459 #22554

Roald87 · 2018-08-31T16:36:30Z

tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Doesn't pass all the tests, but can figure out where they come from.

Errors found:
Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
Use only one blank line to separate sections or paragraphs

Also I find the way the axis are defined quite confusing. My initial thought was that if you say axis='index' it would take apply your function on a row wise basis.

datapythonista

I'd prefer to have just a set of tests for min and max that illustrates both, to reduce te size of the code.

Also, can you explain better why implementing _make_min_max_function was needed?

Roald87 · 2018-09-01T10:08:03Z

Thanks for the comments @datapythonista.

I'd prefer to have just a set of tests for min and max that illustrates both, to reduce te size of the code.

What tests do you mean? Or do you mean the examples? I wasn't sure how to do that, since the answer of an example is hard coded. Or is there a way to generate the answers when the docs are compiled? As in

By default NA's are ignored.

>>> s = pd.Series([1, np.nan, 4, 3])
>>> s.%(outname)s()
%(ans)s

Also, can you explain better why implementing _make_min_max_function was needed?

Because I thought I couldn't use the other functions.

_make_min_count_stat_function has an extra variable min_count which is not used by min and max.
With _make_stat_function you can't pass an example. And since this method is used in several places I wasn't sure what would happen if added examples to the @Substitution of this method. Also I wasn't sure how to pass a See also section using this function.
The other _make_... are for very different functions or have extra input parameters which are not used by min and max.

datapythonista · 2018-09-03T10:56:06Z

Ok, I understand. When I said having just a set of examples (that's what I meant by tests), I mean showing both methods in both docstrings. Instead of showing only examples for min in the min docstring, showing both min and max (and same for max).

Then, if I'm not missing anything, we can simply set _min_max_doc as the docstring without a function. Am I missing something?

Roald87 · 2018-09-04T17:58:00Z

@datapythonista ok. I'll start working on it.

Then, if I'm not missing anything, we can simply set _min_max_doc as the docstring without a function. Am I missing something?

Yeah I think so.

Roald87 · 2018-09-04T20:19:21Z

@datapythonista so you mean change

cls.max = _make_min_max_function(
            cls, 'max', name, name2, axis_descr,
            'maximum', nanops.nanmax, _max_examples)

into

cls.max = _min_max_doc

And same for cls.min?

If we do this, then it is not possible to add what the method does to the docstring summary. As in, now it says Return the %(desc)s of the values in the object.. If that changes to a general docstring for both min and max, you get a summary which reads Return the minimum or maximum of the values in the object., or something along those lines. And a similar statement on how to retrieve the index of the min and max. You are OK with that?

datapythonista · 2018-09-04T22:09:42Z

I took a closer look, and I understand what you did now. This code (the original) is a bit tricky. Let me take a closer look and see if we can make it simpler (I'd like to avoid creating the new function _make_min_max_function to not complicate it even more).

Roald87 · 2018-09-16T05:47:12Z

@datapythonista What about having a single general docstring for all methods, something like this:

_num_doc = """
%(short_summary)s

%(extended_summary)s

Parameters
----------
axis : %(axis_descr)s, default 0
    Indicate which axis should be reduced. Not implemented for Series.

    * 0 / ‘index’ : reduce the index, return a Series whose index is the
      original column labels.
    * 1 / ‘columns’ : reduce the columns, return a Series whose index is the
      original index.
    %(bool_axis)s

    For a DataFrame the value 0 applies %(fname)s on each column, and 1
    applies it on each row.
skipna : bool, default True
%(ddof)s
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a %(axis_name1)s.
%(min_count)s
numeric_only : bool, default None
    Include only float, int, bool columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
    Exclude NA/null values when computing the result.
**kwargs : any, default None
    Additional keyword arguments.

Returns
-------
%(fname)s : %(axis_name1)s or %(axis_name2)s (if level specified)

%(see_also)s
%(examples)s
"""

This can than be used for all functions, also the ones which have a ddof or min_count argument and even the bool methods any and all. Thus you get rid of _num_ddof_doc, _bool_doc and of course _min_max_doc.

I took the liberty to rename some variables in the doc string such that it easier to see what is what. name -> fname, name1 -> axis_name1 and name2 -> axis_name2. Finally I've added two more variables short_summary and long_summary.

In order to use this new docstring I've updated _make_stat_function as follows:

def _make_stat_function(cls, long_name, axis_name1, axis_name2, axis_descr,
                           short_summary, extended_summary, f,
                           examples, see_also):
    @Substitution(fname=cls.__name__, short_summary=short_summary, 
                  extended_summary=extended_summary, long_name=long_name,
                  axis_name1=axis_name1, axis_name2=axis_name2,
                  axis_descr=axis_descr, bool_axis='', ddof='',
                  min_count='', examples=examples, see_also=see_also)
    @Appender(_num_doc)
    def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None,
                  **kwargs):
        nv.validate_stat_func(tuple(), kwargs, fname=cls.__name__)
        if skipna is None:
            skipna = True
        if axis is None:
            axis = self._stat_axis_number
        if level is not None:
            return self._agg_by_level(cls.__name__, axis=axis, level=level,
                                      skipna=skipna)
        return self._reduce(f, cls.__name__, axis=axis, skipna=skipna,
                            numeric_only=numeric_only)

    return set_function_name(stat_func, cls.__name__, cls)

In the updated method I use the new variables and I've omitted the name variable and replaced it with cls.__name__. Furthermore I've added a long_name variable, which can be used as follows:

cls.max = _make_stat_function(
    cls, 'maximum', name, name2, axis_descr,
    _stat_short_summary, _min_max_extended_summary, nanops.nanmax,
    _max_examples, _min_max_see_also)

where _stat_short_summary is:

_stat_short_summary = """\
Return the %(long_name)s of the values in the object.
"""

and makes use of long_name. stat_short_summary can then also be used by methods std, mean, var, mad, etc.

Shall I start implementing this for all methods which use _make_stat_function?

Roald87 · 2018-09-26T20:14:04Z

@datapythonista ?

- Changed `_make_min_max_function` such that in the future it can be used to replace `_bool_doc`, `_num_doc` and `_num_ddof_doc`.

pep8speaks · 2018-10-13T20:28:02Z

Hello @Roald87! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/generic.py !

codecov · 2018-10-13T20:28:09Z

Codecov Report

Merging #22554 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22554      +/-   ##
==========================================
+ Coverage   92.29%   92.31%   +0.01%     
==========================================
  Files         161      161              
  Lines       51501    51473      -28     
==========================================
- Hits        47534    47517      -17     
+ Misses       3967     3956      -11

Flag	Coverage Δ
#multiple	`90.7% <100%> (+0.01%)`	⬆️
#single	`42.43% <84.61%> (+0.01%)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.84% <100%> (ø)`	⬆️
pandas/plotting/_misc.py	`38.68% <0%> (-0.31%)`	⬇️
pandas/core/arrays/datetimes.py	`98.37% <0%> (-0.14%)`	⬇️
pandas/core/ops.py	`94.14% <0%> (-0.14%)`	⬇️
pandas/io/formats/html.py	`97.63% <0%> (-0.05%)`	⬇️
pandas/core/dtypes/dtypes.py	`95.59% <0%> (-0.03%)`	⬇️
pandas/core/arrays/sparse.py	`91.93% <0%> (-0.02%)`	⬇️
pandas/io/json/normalize.py	`96.87% <0%> (ø)`	⬆️
pandas/io/formats/format.py	`97.76% <0%> (ø)`	⬆️
pandas/tseries/offsets.py	`96.98% <0%> (ø)`	⬆️
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7294dd...e3e0e1b. Read the comment docs.

…x-doc-max

Roald87 · 2018-10-18T13:36:16Z

@datapythonista I made the following changes:

Now there is a single example for .min and .max
I made some changes to how the docstring of _make_min_max_function gets assembled. I made these changes such that _stats_parameters_and_returns can replace _num_doc, _num_ddof_doc and _bool_doc. If you approve of this, I can rename _make_min_max_function to _make_stat_function and remove the original _make_stat_function. And then also change all the functions which make use of the original _make_stat_function.

Also I pulled the latest changes from the original master branch, however this seems to have bricked pandas. I now get the following when I do import pandas:

Traceback (most recent call last):
  File "/home/roald/virtualenvs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-7dd3504c366f>", line 1, in <module>
    import pandas as pd
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.groupby import GroupBy  # flake8: noqa
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/groupby/groupby.py", line 38, in <module>
    from pandas.core.index import Index, MultiIndex
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/index.py", line 2, in <module>
    from pandas.core.indexes.api import *
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/api.py", line 11, in <module>
    from pandas.core.indexes.interval import IntervalIndex  # noqa
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/interval.py", line 34, in <module>
    from pandas.core.indexes.datetimes import date_range, DatetimeIndex
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/datetimes.py", line 39, in <module>
    from pandas.core.indexes.datetimelike import (
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/datetimelike.py", line 14, in <module>
    from pandas._libs.tslibs.timestamps import round_nsint64, RoundTo
ImportError: cannot import name 'round_nsint64'

Is it generally a good idea to get the latest version while working on a PR or not?

datapythonista · 2018-11-03T16:21:51Z

Sorry for the delay. Did you try to recompile the C parts of pandas python setup.py build_ext --inplace?

I guess that should fix the error. As far as you start the PR with a fresh version of pandas, and there are no conflicts (like now), I don't think fetching the new changes adds a lot of value. It surely don't harm, and it's good to be updated, but in most cases won't make a difference.

If you can try to recompile pandas, and also fetch from master to fix the conflict, we can move forward with this.

datapythonista · 2018-11-03T17:02:08Z

I'd prefer to touch as less code as possible for these changes. I think with the current make_stats_func you can already provide the summary and examples as summary, and the rest is provided. What else is missing? If all the changes are for the see also section, let's leave this for another PR. Can you check how sum is adding the examples to the docstring, do it in the same way, and undo all the changes to the code in the PR.

Sorry, but I think pandas code is complex enough, let's try to make the changes without adding anything else.

datapythonista · 2018-11-04T10:04:33Z

I think this contains conflicting changes: #23338

Can you take a look, and proceed considering that this PR (#23338) will be merged soon.

WillAyd · 2018-11-23T03:35:14Z

Closing as stale. Also conflicts with some other development. Ping if you'd like to pick it back up

datapythonista · 2018-11-26T10:17:26Z

@Roald87 would be great to get your work in this PR merged. Do you have time to merge master into your branch, and address the last comments? Please reopen the PR if that's the case.

Roald87 · 2018-11-27T07:29:44Z

@datapythonista Yeah I'll try to do it this week.

# Conflicts: # pandas/core/generic.py

Roald87 · 2018-11-27T20:33:13Z

@WillAyd Could you open it again?

@datapythonista I've made the changes. I see the a _max_examples was added. I used that one now for both the min and max functions. Or should I add some of my examples from _min_max_examples into _max_examples?

Also can't generate the html docs anymore. I get the following:

>>> python make.py --single DataFrame.max
[autosummary] generating autosummary for: source/index.rst
[autosummary] writing to source/generated_single
Running Sphinx v1.7.7

Extension error:
Could not import extension contributors (exception: No module named 'git')
gio: file:///home/roald/Documents/Projects/pandas/doc/build/html/generated_single/pandas.DataFrame.max.html: Error when getting information for file “/home/roald/Documents/Projects/pandas/doc/build/html/generated_single/pandas.DataFrame.max.html”: No such file or directory

datapythonista · 2018-11-27T23:22:31Z

conda env update should fix it, there is a new library required in requiremets.yml, I think it's named gitpy

datapythonista · 2018-11-27T23:23:27Z

I think reusing the examples for min and max is better

Roald87 · 2018-11-28T19:03:18Z

@datapythonista OK done.

It was GitPython which was missing.

datapythonista · 2018-12-09T20:46:37Z

If I understood it correctly, we're finally not merging this, as it's superseded by #23338. Feel free to reopen if I misunderstood.

Roald87 added 3 commits August 28, 2018 22:39

DOC: Update Series min and max docstring. GH22459

d7c8cf3

Clarified use of index and column in axis.

03338db

Fixed formatting, passes PEP8.

d5145cd

datapythonista added the Docs label Sep 1, 2018

datapythonista requested changes Sep 1, 2018

View reviewed changes

- .min and .max methods now have the same examples.

3f681fa

- Changed `_make_min_max_function` such that in the future it can be used to replace `_bool_doc`, `_num_doc` and `_num_ddof_doc`.

Roald87 added 2 commits October 14, 2018 13:33

Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…

a897f5d

…x-doc-max

Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…

21cfaad

…x-doc-max

WillAyd closed this Nov 23, 2018

Roald87 added 5 commits November 27, 2018 20:26

Added examples and see also for .min and .max methods.

0c96e4d

Merge branch 'master' into fix-doc-max

da044a9

# Conflicts: # pandas/core/generic.py

Passed validation checks.

7c3623a

Added min_count stub back in.

a2081f3

Added DataFrame see also's. Passes PEP8 check.

b1e0b59

WillAyd reopened this Nov 27, 2018

Removed obsolete _min_max_examples.

e3e0e1b

datapythonista mentioned this pull request Dec 2, 2018

DOC: Added MultiIndex Example for Series Min #23338

Merged

2 tasks

datapythonista closed this Dec 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DOC: Update Series min and max docstring. GH22459 #22554

DOC: Update Series min and max docstring. GH22459 #22554

Roald87 commented Aug 31, 2018

datapythonista left a comment

Roald87 commented Sep 1, 2018

datapythonista commented Sep 3, 2018

Roald87 commented Sep 4, 2018

Roald87 commented Sep 4, 2018

datapythonista commented Sep 4, 2018

Roald87 commented Sep 16, 2018 •

edited

Loading

Roald87 commented Sep 26, 2018

pep8speaks commented Oct 13, 2018

codecov bot commented Oct 13, 2018 •

edited

Loading

Roald87 commented Oct 18, 2018

datapythonista commented Nov 3, 2018

datapythonista commented Nov 3, 2018

datapythonista commented Nov 4, 2018

WillAyd commented Nov 23, 2018

datapythonista commented Nov 26, 2018

Roald87 commented Nov 27, 2018

Roald87 commented Nov 27, 2018

datapythonista commented Nov 27, 2018

datapythonista commented Nov 27, 2018

Roald87 commented Nov 28, 2018

datapythonista commented Dec 9, 2018

DOC: Update Series min and max docstring. GH22459 #22554

DOC: Update Series min and max docstring. GH22459 #22554

Conversation

Roald87 commented Aug 31, 2018

datapythonista left a comment

Choose a reason for hiding this comment

Roald87 commented Sep 1, 2018

datapythonista commented Sep 3, 2018

Roald87 commented Sep 4, 2018

Roald87 commented Sep 4, 2018

datapythonista commented Sep 4, 2018

Roald87 commented Sep 16, 2018 • edited Loading

Roald87 commented Sep 26, 2018

pep8speaks commented Oct 13, 2018

codecov bot commented Oct 13, 2018 • edited Loading

Codecov Report

Roald87 commented Oct 18, 2018

datapythonista commented Nov 3, 2018

datapythonista commented Nov 3, 2018

datapythonista commented Nov 4, 2018

WillAyd commented Nov 23, 2018

datapythonista commented Nov 26, 2018

Roald87 commented Nov 27, 2018

Roald87 commented Nov 27, 2018

datapythonista commented Nov 27, 2018

datapythonista commented Nov 27, 2018

Roald87 commented Nov 28, 2018

datapythonista commented Dec 9, 2018

Roald87 commented Sep 16, 2018 •

edited

Loading

codecov bot commented Oct 13, 2018 •

edited

Loading