Skip to content

DOC: Update Series min and max docstring. GH22459 #22554

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

Roald87
Copy link

@Roald87 Roald87 commented Aug 31, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Doesn't pass all the tests, but can figure out where they come from.

Errors found:
Closing quotes should be placed in the line after the last text in the docstring (do not close the quotes in the same line as the text, or leave a blank line between the last text and the quotes)
Use only one blank line to separate sections or paragraphs

Also I find the way the axis are defined quite confusing. My initial thought was that if you say axis='index' it would take apply your function on a row wise basis.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to have just a set of tests for min and max that illustrates both, to reduce te size of the code.

Also, can you explain better why implementing _make_min_max_function was needed?

@Roald87
Copy link
Author

Roald87 commented Sep 1, 2018

Thanks for the comments @datapythonista.

I'd prefer to have just a set of tests for min and max that illustrates both, to reduce te size of the code.

What tests do you mean? Or do you mean the examples? I wasn't sure how to do that, since the answer of an example is hard coded. Or is there a way to generate the answers when the docs are compiled? As in

By default NA's are ignored.

>>> s = pd.Series([1, np.nan, 4, 3])
>>> s.%(outname)s()
%(ans)s

Also, can you explain better why implementing _make_min_max_function was needed?

Because I thought I couldn't use the other functions.

  • _make_min_count_stat_function has an extra variable min_count which is not used by min and max.
  • With _make_stat_function you can't pass an example. And since this method is used in several places I wasn't sure what would happen if added examples to the @Substitution of this method. Also I wasn't sure how to pass a See also section using this function.
  • The other _make_... are for very different functions or have extra input parameters which are not used by min and max.

@datapythonista
Copy link
Member

Ok, I understand. When I said having just a set of examples (that's what I meant by tests), I mean showing both methods in both docstrings. Instead of showing only examples for min in the min docstring, showing both min and max (and same for max).

Then, if I'm not missing anything, we can simply set _min_max_doc as the docstring without a function. Am I missing something?

@Roald87
Copy link
Author

Roald87 commented Sep 4, 2018

@datapythonista ok. I'll start working on it.

Then, if I'm not missing anything, we can simply set _min_max_doc as the docstring without a function. Am I missing something?

Yeah I think so.

@Roald87
Copy link
Author

Roald87 commented Sep 4, 2018

@datapythonista so you mean change

cls.max = _make_min_max_function(
            cls, 'max', name, name2, axis_descr,
            'maximum', nanops.nanmax, _max_examples)

into

cls.max = _min_max_doc

And same for cls.min?

If we do this, then it is not possible to add what the method does to the docstring summary. As in, now it says Return the %(desc)s of the values in the object.. If that changes to a general docstring for both min and max, you get a summary which reads Return the minimum or maximum of the values in the object., or something along those lines. And a similar statement on how to retrieve the index of the min and max. You are OK with that?

@datapythonista
Copy link
Member

I took a closer look, and I understand what you did now. This code (the original) is a bit tricky. Let me take a closer look and see if we can make it simpler (I'd like to avoid creating the new function _make_min_max_function to not complicate it even more).

@Roald87
Copy link
Author

Roald87 commented Sep 16, 2018

@datapythonista What about having a single general docstring for all methods, something like this:

_num_doc = """
%(short_summary)s

%(extended_summary)s

Parameters
----------
axis : %(axis_descr)s, default 0
    Indicate which axis should be reduced. Not implemented for Series.

    * 0 / ‘index’ : reduce the index, return a Series whose index is the
      original column labels.
    * 1 / ‘columns’ : reduce the columns, return a Series whose index is the
      original index.
    %(bool_axis)s

    For a DataFrame the value 0 applies %(fname)s on each column, and 1
    applies it on each row.
skipna : bool, default True
%(ddof)s
level : int or level name, default None
    If the axis is a MultiIndex (hierarchical), count along a
    particular level, collapsing into a %(axis_name1)s.
%(min_count)s
numeric_only : bool, default None
    Include only float, int, bool columns. If None, will attempt to use
    everything, then use only numeric data. Not implemented for Series.
    Exclude NA/null values when computing the result.
**kwargs : any, default None
    Additional keyword arguments.

Returns
-------
%(fname)s : %(axis_name1)s or %(axis_name2)s (if level specified)

%(see_also)s
%(examples)s
"""

This can than be used for all functions, also the ones which have a ddof or min_count argument and even the bool methods any and all. Thus you get rid of _num_ddof_doc, _bool_doc and of course _min_max_doc.

I took the liberty to rename some variables in the doc string such that it easier to see what is what. name -> fname, name1 -> axis_name1 and name2 -> axis_name2. Finally I've added two more variables short_summary and long_summary.

In order to use this new docstring I've updated _make_stat_function as follows:

def _make_stat_function(cls, long_name, axis_name1, axis_name2, axis_descr,
                           short_summary, extended_summary, f,
                           examples, see_also):
    @Substitution(fname=cls.__name__, short_summary=short_summary, 
                  extended_summary=extended_summary, long_name=long_name,
                  axis_name1=axis_name1, axis_name2=axis_name2,
                  axis_descr=axis_descr, bool_axis='', ddof='',
                  min_count='', examples=examples, see_also=see_also)
    @Appender(_num_doc)
    def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None,
                  **kwargs):
        nv.validate_stat_func(tuple(), kwargs, fname=cls.__name__)
        if skipna is None:
            skipna = True
        if axis is None:
            axis = self._stat_axis_number
        if level is not None:
            return self._agg_by_level(cls.__name__, axis=axis, level=level,
                                      skipna=skipna)
        return self._reduce(f, cls.__name__, axis=axis, skipna=skipna,
                            numeric_only=numeric_only)

    return set_function_name(stat_func, cls.__name__, cls)

In the updated method I use the new variables and I've omitted the name variable and replaced it with cls.__name__. Furthermore I've added a long_name variable, which can be used as follows:

cls.max = _make_stat_function(
    cls, 'maximum', name, name2, axis_descr,
    _stat_short_summary, _min_max_extended_summary, nanops.nanmax,
    _max_examples, _min_max_see_also)

where _stat_short_summary is:

_stat_short_summary = """\
Return the %(long_name)s of the values in the object.
"""

and makes use of long_name. stat_short_summary can then also be used by methods std, mean, var, mad, etc.

Shall I start implementing this for all methods which use _make_stat_function?

@Roald87
Copy link
Author

Roald87 commented Sep 26, 2018

@datapythonista ?

- Changed `_make_min_max_function` such that in the future it can be used
to replace `_bool_doc`, `_num_doc` and `_num_ddof_doc`.
@pep8speaks
Copy link

Hello @Roald87! Thanks for updating the PR.

@codecov
Copy link

codecov bot commented Oct 13, 2018

Codecov Report

Merging #22554 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22554      +/-   ##
==========================================
+ Coverage   92.29%   92.31%   +0.01%     
==========================================
  Files         161      161              
  Lines       51501    51473      -28     
==========================================
- Hits        47534    47517      -17     
+ Misses       3967     3956      -11
Flag Coverage Δ
#multiple 90.7% <100%> (+0.01%) ⬆️
#single 42.43% <84.61%> (+0.01%) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 96.84% <100%> (ø) ⬆️
pandas/plotting/_misc.py 38.68% <0%> (-0.31%) ⬇️
pandas/core/arrays/datetimes.py 98.37% <0%> (-0.14%) ⬇️
pandas/core/ops.py 94.14% <0%> (-0.14%) ⬇️
pandas/io/formats/html.py 97.63% <0%> (-0.05%) ⬇️
pandas/core/dtypes/dtypes.py 95.59% <0%> (-0.03%) ⬇️
pandas/core/arrays/sparse.py 91.93% <0%> (-0.02%) ⬇️
pandas/io/json/normalize.py 96.87% <0%> (ø) ⬆️
pandas/io/formats/format.py 97.76% <0%> (ø) ⬆️
pandas/tseries/offsets.py 96.98% <0%> (ø) ⬆️
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7294dd...e3e0e1b. Read the comment docs.

@Roald87
Copy link
Author

Roald87 commented Oct 18, 2018

@datapythonista I made the following changes:

  • Now there is a single example for .min and .max
  • I made some changes to how the docstring of _make_min_max_function gets assembled. I made these changes such that _stats_parameters_and_returns can replace _num_doc, _num_ddof_doc and _bool_doc. If you approve of this, I can rename _make_min_max_function to _make_stat_function and remove the original _make_stat_function. And then also change all the functions which make use of the original _make_stat_function.

Also I pulled the latest changes from the original master branch, however this seems to have bricked pandas. I now get the following when I do import pandas:

Traceback (most recent call last):
  File "/home/roald/virtualenvs/pandas-dev/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2961, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-7dd3504c366f>", line 1, in <module>
    import pandas as pd
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.groupby import GroupBy  # flake8: noqa
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/groupby/groupby.py", line 38, in <module>
    from pandas.core.index import Index, MultiIndex
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/index.py", line 2, in <module>
    from pandas.core.indexes.api import *
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/api.py", line 11, in <module>
    from pandas.core.indexes.interval import IntervalIndex  # noqa
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/interval.py", line 34, in <module>
    from pandas.core.indexes.datetimes import date_range, DatetimeIndex
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/datetimes.py", line 39, in <module>
    from pandas.core.indexes.datetimelike import (
  File "/snap/pycharm-community/85/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "/home/roald/Documents/Projects/pandas/pandas/core/indexes/datetimelike.py", line 14, in <module>
    from pandas._libs.tslibs.timestamps import round_nsint64, RoundTo
ImportError: cannot import name 'round_nsint64'

Is it generally a good idea to get the latest version while working on a PR or not?

@datapythonista
Copy link
Member

Sorry for the delay. Did you try to recompile the C parts of pandas python setup.py build_ext --inplace?

I guess that should fix the error. As far as you start the PR with a fresh version of pandas, and there are no conflicts (like now), I don't think fetching the new changes adds a lot of value. It surely don't harm, and it's good to be updated, but in most cases won't make a difference.

If you can try to recompile pandas, and also fetch from master to fix the conflict, we can move forward with this.

@datapythonista
Copy link
Member

I'd prefer to touch as less code as possible for these changes. I think with the current make_stats_func you can already provide the summary and examples as summary, and the rest is provided. What else is missing? If all the changes are for the see also section, let's leave this for another PR. Can you check how sum is adding the examples to the docstring, do it in the same way, and undo all the changes to the code in the PR.

Sorry, but I think pandas code is complex enough, let's try to make the changes without adding anything else.

@datapythonista
Copy link
Member

I think this contains conflicting changes: #23338

Can you take a look, and proceed considering that this PR (#23338) will be merged soon.

@WillAyd
Copy link
Member

WillAyd commented Nov 23, 2018

Closing as stale. Also conflicts with some other development. Ping if you'd like to pick it back up

@WillAyd WillAyd closed this Nov 23, 2018
@datapythonista
Copy link
Member

@Roald87 would be great to get your work in this PR merged. Do you have time to merge master into your branch, and address the last comments? Please reopen the PR if that's the case.

@Roald87
Copy link
Author

Roald87 commented Nov 27, 2018

@datapythonista Yeah I'll try to do it this week.

@Roald87
Copy link
Author

Roald87 commented Nov 27, 2018

@WillAyd Could you open it again?

@datapythonista I've made the changes. I see the a _max_examples was added. I used that one now for both the min and max functions. Or should I add some of my examples from _min_max_examples into _max_examples?

Also can't generate the html docs anymore. I get the following:

>>> python make.py --single DataFrame.max
[autosummary] generating autosummary for: source/index.rst
[autosummary] writing to source/generated_single
Running Sphinx v1.7.7

Extension error:
Could not import extension contributors (exception: No module named 'git')
gio: file:///home/roald/Documents/Projects/pandas/doc/build/html/generated_single/pandas.DataFrame.max.html: Error when getting information for file “/home/roald/Documents/Projects/pandas/doc/build/html/generated_single/pandas.DataFrame.max.html”: No such file or directory

@WillAyd WillAyd reopened this Nov 27, 2018
@datapythonista
Copy link
Member

conda env update should fix it, there is a new library required in requiremets.yml, I think it's named gitpy

@datapythonista
Copy link
Member

I think reusing the examples for min and max is better

@Roald87
Copy link
Author

Roald87 commented Nov 28, 2018

@datapythonista OK done.

It was GitPython which was missing.

@datapythonista
Copy link
Member

If I understood it correctly, we're finally not merging this, as it's superseded by #23338. Feel free to reopen if I misunderstood.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants