DOC: add checks on the returns section in the docstrings (#23138) #23432

igorfassen · 2018-10-31T18:20:51Z

Closes DOC: Validate the Returns section in the docstrings #23138

Updated returns checks in validate_docstrings.py:

* check that no name is given when only one value is returned
* check capitalization and punctuation in descriptions

Updated test_validate_docstrings.py:

added some tests to validate these new checks

…23138)

pep8speaks · 2018-10-31T18:20:56Z

Hello @igorfassen! Thanks for updating the PR.

There are no PEP8 issues in the file scripts/tests/test_validate_docstrings.py !
There are no PEP8 issues in the file scripts/validate_docstrings.py !

Comment last updated on November 06, 2018 at 10:07 Hours UTC

gfyoung · 2018-10-31T20:11:40Z

scripts/validate_docstrings.py

-            errs.append('No Returns section found')
+        if not doc.returns:
+            if "return" in doc.method_source:
+                errs.append('No Returns section found')


@datapythonista @WillAyd : How does this validation handle situations where we use an empty return to break out early from a function? In such a case, we have a return statement, but the documentation wouldn't need to specify a Returns section per se.

(BTW, I realize that this is already in the codebase, but it only struck me just now)

Afaik, in Python all the functions return None by default, and a bare return also returns None. I think in those cases the numpydoc standard and ours is to not include the Returns section at all.

I don't think the validation checks in the code whether there is a return or not. This change should be to avoid giving a name to what is being returned if it's just a single value. In my opinion, saying that the function transform returns a variable transformed does not add any value.

We want to name the values returned if the return is a tuple, for example code and name. This PR should validate this.
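As an illustration of the convention described above, here is a minimal sketch with two hypothetical functions (transform and code_and_name are made-up examples, not pandas code). The first returns a single value, so its Returns section gives only the type; the second returns a tuple, so each element is named.

```python
def transform(values):
    """
    Double every value.

    Returns
    -------
    list of int
        The doubled values.
    """
    return [2 * v for v in values]


def code_and_name(item):
    """
    Split an item of the form "code:name".

    Returns
    -------
    code : str
        The part before the colon.
    name : str
        The part after the colon.
    """
    code, _, name = item.partition(":")
    return code, name
```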

Does this make sense to you @gfyoung?

Update: Just saw the code (sorry, reviewing from the phone, which is tricky). I forgot we were checking the code for a return statement. I think we should change this and use a regex that doesn't match a bare return.

So would replacing
if "return" in doc.method_source:
with something like
if re.search(r"return\b[ \t]*[^;#\n\r]", doc.method_source):
deal with these cases?

(maybe I should have another look at what's in doc.method_source)

doc.method_source should be the code as a string. I'm not sure about the exact regex; looking for a return followed by a letter, number or opening bracket might be simpler.

I created #23488 to take care of the return problem. Let's keep the focus of this PR on the original topic and not mix things.
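A rough sketch of the regex idea, not the code that ended up in #23488. The pattern below is a variation on the one suggested above: as an assumption of this sketch, the negated class also excludes whitespace, so that trailing spaces or a comment after a bare return are not mistaken for a returned value. The names VALUE_RETURN and returns_value are hypothetical.

```python
import re

# Hypothetical pattern: `return`, optional spaces or tabs, then any character
# that is not a semicolon, a comment marker or whitespace.
VALUE_RETURN = re.compile(r"\breturn\b[ \t]*[^;#\s]")


def returns_value(source):
    """Return True if `source` contains a `return` followed by an expression."""
    return VALUE_RETURN.search(source) is not None


bare = "def f(x):\n    if x:\n        return\n    print(x)\n"
valued = "def f(x):\n    return x + 1\n"
print(returns_value(bare), returns_value(valued))
```

This is still purely textual: an explicit `return None` counts as returning a value, and the word "return" followed by text inside a docstring or string literal would also match.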

Do we really need a regex for this? Wouldn't it just be better documented as an optional return value if such a branch is possible to hit?

Not sure what you mean, but it's better if you mention this approach in #23488, where this will be implemented.

codecov · 2018-10-31T20:30:30Z

Codecov Report

Merging #23432 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #23432      +/-   ##
==========================================
- Coverage   92.31%    92.3%   -0.01%     
==========================================
  Files         166      166              
  Lines       52391    52391              
==========================================
- Hits        48363    48362       -1     
- Misses       4028     4029       +1

Flag	Coverage Δ
#multiple	`90.73% <ø> (ø)`	⬆️
#single	`43.05% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/util/testing.py	`87.68% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4ac0d6...5ea7881. Read the comment docs.

datapythonista

looks good, added a few comments, but almost ready to be merged.

datapythonista · 2018-11-04T10:56:51Z

scripts/tests/test_validate_docstrings.py

@@ -696,10 +720,18 @@ def test_bad_generic_functions(self, func):
        ('BadReturns', 'yield_not_documented', ('No Yields section found',)),
        pytest.param('BadReturns', 'no_type', ('foo',),
                     marks=pytest.mark.xfail),


We can test this now. I guess we can't know whether we're missing the type or the description. We should return a message that considers both, so the user always knows what to do. And test this case.

Indeed, with the current modifications, the error message in this case is Return value has no description.
Should I replace it with something like Return value has no type or description?
(Well, in the case where a tuple is returned, that would mean that both type and name are missing.)

While writing that, I realize that in the case of multiple return values I do not check whether all of them have both a name and a type.

datapythonista · 2018-11-04T11:00:10Z

scripts/validate_docstrings.py

+            if returns_errs:
+                errs.append('Errors in returns section')
+                for returns_err in returns_errs:
+                    errs.append('\t{}'.format(returns_err))


errs += ['\t{}'.format(e) for e in returns_errs] is probably better

I find this better as is. Appending seems more idiomatic than adding an intermediary list comp
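For reference, the two variants under discussion build the same list. The snippet below is a small sketch with a made-up returns_errs value, only meant to show that the choice is stylistic.

```python
returns_errs = ["Return value has no description"]

# Variant 1: explicit loop, as in the diff under review.
errs_loop = ["Errors in returns section"]
for returns_err in returns_errs:
    errs_loop.append("\t{}".format(returns_err))

# Variant 2: the suggested list comprehension.
errs_comp = ["Errors in returns section"]
errs_comp += ["\t{}".format(e) for e in returns_errs]

print(errs_loop == errs_comp)
```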

WillAyd · 2018-11-04T22:52:59Z

scripts/validate_docstrings.py

+                desc = ''.join(desc)
+                name = '"' + name + '" ' if type_ else ''
+                if not desc:
+                    returns_errs.append('Return value {}has no '


Hmm, so is the assumption here that name always has trailing whitespace? If so, that seems rather fragile and should probably be fixed directly.

There should not be any trailing space in name.
I recognize that what's done here is not very clean:

* if type_ is empty, which mainly occurs when only one value is returned (in which case name actually contains the type), then I don't want to print it in the error message.
* if type_ is not empty, which means both name and type were provided, then I add a trailing space to name so that the error message is printed correctly.

Actually, the main problem I see here is that in the case of multiple returned values, if one of them has no name or type, the error message will not specify where the problem is. Something like

    Returns
    -------
    foo : int
        The first value
    bar
        The second value

will cause the following errors:

    Return value "foo" description should finish with "."
    Return value description should finish with "."

I would be fine with a more generic error message that doesn't use the name, if it simplifies the code. The instances where multiple values are returned are few and far between to begin with, so it shouldn't be hard for an end user to recognize where the issue appears if it comes up.

In b09c322, I've removed value names from the error messages (and prevented duplicate error messages from being printed)... but I'm not sure whether this simplifies the code...

Co-Authored-By: igorfassen <[email protected]>

* removed value name from error messages
* updated the associated test case

WillAyd · 2018-11-07T06:13:11Z

scripts/validate_docstrings.py

+                returns_errs.append('The first line of the Returns section '
+                                    'should contain only the type, unless '
+                                    'multiple values are being returned.')
+            missing_desc, missing_cap, missing_period = False, False, False


missing_desc = missing_cap = missing_period = False

With that said, is this even required? It seems like you could simplify things without the or statements below.

The or statements are needed to print at most one message of each kind.
As the error messages do not specify the variable names, I assumed it would be clearer to avoid printing the same message twice (or even more).
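The flag-based de-duplication described here could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the PR: check_descriptions is a hypothetical helper, descriptions are assumed to be already joined into a single string, and the capital-letter message wording is illustrative.

```python
def check_descriptions(returns):
    """Report each kind of description problem at most once.

    `returns` is assumed to be a list of (name, type, description) tuples
    whose descriptions are already joined into a single string.
    """
    missing_desc = missing_cap = missing_period = False
    for _name, _type, desc in returns:
        if not desc:
            missing_desc = True
            continue
        # `or` keeps a flag set once any entry has triggered it.
        missing_cap = missing_cap or not desc[0].isupper()
        missing_period = missing_period or not desc.endswith(".")

    errs = []
    if missing_desc:
        errs.append("Return value has no description")
    if missing_cap:
        errs.append("Return value description should start with a capital letter")
    if missing_period:
        errs.append('Return value description should finish with "."')
    return errs
```

The trade-off is the one raised in the review: the output stays short, but the messages cannot say which return value is at fault.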

WillAyd · 2018-11-07T06:18:16Z

scripts/validate_docstrings.py

+                returns_errs.append('Return value description should finish '
+                                    'with ".".')
+            if returns_errs:
+                errs.append('Errors in Returns section')


I assume you are emulating what you saw with the parameters section, but do we need this? Doesn't look like we are testing for it and I don't necessarily see the utility of it alongside the actual errors

Indeed, it's not as useful here as in the parameters section, as functions returning multiple values are quite rare.
I've removed this message and appended the errors directly to errs.

* removed the message `Errors in Returns section`
* simpler variable initialization

datapythonista · 2018-11-07T09:09:26Z

@igorfassen you will have to resolve the merge conflicts with the changes in #23514, in which we codified the error messages. Let me know if you need help.

datapythonista

Looks good, and thanks for the merge.

Just one comment. For the case when a tuple is being returned, I think it'd be better to report one error for every missing capital or period. I think we do that in See Also already.

If we do that, we should add tests for it, and also say in the error messages which field is wrong.

igorfassen · 2018-11-08T11:49:48Z

Actually, that was the first behavior that was implemented, but I eventually simplified this part (see this part of the discussion: #23432 (review)).

datapythonista · 2018-11-08T15:53:42Z

Sorry, I didn't see that before. I think @WillAyd's comment was more about the implementation, which was somewhat tricky.

I think after the refactoring, it should be a bit easier to have a simpler implementation. You can have separate error messages for the case when there is just one output, and for when there are more. It's a bit more repetitive, but I think the implementation should be very straightforward, same as See Also.

Sorry for asking for so many changes. Does it sound good?

WillAyd · 2018-11-08T18:26:31Z

scripts/validate_docstrings.py

+        else:
+            if len(doc.returns) == 1 and doc.returns[0][1]:
+                errs.append(error('RT02'))
+            missing_desc = missing_cap = missing_period = False


Do we even need this line? Couldn't we just use the assignments within the loop over doc.returns?

Or rather not even assign to variables but simply check if not desc: ...

OK, I'll try something simpler.
My idea was to avoid duplicate error messages, so these variables allowed me to detect the first occurrence of each type of error, and then short-circuit the subsequent tests. But I must admit this was a bit less readable...

datapythonista

Looks good. Only thing is the case with multiple results. Let's see if we can find an agreement, so we can close.

We'll have to add tests, to test specifically cases where only one parameter is missing punctuation or the whole description...

datapythonista · 2018-11-09T10:04:08Z

scripts/validate_docstrings.py

+        else:
+            if len(doc.returns) == 1 and doc.returns[0][1]:
+                errs.append(error('RT02'))
+            for name, type_, desc in doc.returns:


What is the value of name in the case we've just got a single return?

I'd still like to have the names of the return fields that are failing in the returned error messages, and not give the user something like:

    Return value has no description
    Return value has no description
    Return value has no description

@WillAyd can you propose your preferred solution for this?

For me the best would be the original one:

errs.append(error('RT03', return_name='{} '.format(name) if name else ''))

I think with the way it is implemented now it should be perfectly readable.

When there is a single return and the first line is correctly written (i.e. no name is given), name contains the type, and type_ is empty.

I'd prefer to rename name to name_or_type then, and possibly add a comment with what you just said.

I'm totally indifferent as I really think this is an edge case. Whatever can be done simply and accurately works for me!
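To make the proposal concrete, here is a hedged sketch of per-field error reporting with the name_or_type rename. Everything in it is an assumption for illustration: the ERROR_MSGS table, the RT04 and RT05 codes, the message wording, the (code, message) return shape of error, and the validate_returns helper are not taken from the actual script.

```python
# Hypothetical message table; the codes mirror the RT02/RT03 style used in
# the diff, but the exact wording here is illustrative.
ERROR_MSGS = {
    "RT03": "Return value {return_name}has no description",
    "RT04": "Return value {return_name}description should start with a capital letter",
    "RT05": 'Return value {return_name}description should finish with "."',
}


def error(code, **kwargs):
    """Return a (code, message) pair for the given error code."""
    return code, ERROR_MSGS[code].format(**kwargs)


def validate_returns(returns):
    """Validate a list of (name_or_type, type_, desc) tuples."""
    errs = []
    for name_or_type, type_, desc in returns:
        # For a single unnamed return, name_or_type holds the type and type_
        # is empty, so there is no name worth quoting in the message.
        return_name = '"{}" '.format(name_or_type) if type_ else ""
        if not desc:
            errs.append(error("RT03", return_name=return_name))
            continue
        if not desc[0].isupper():
            errs.append(error("RT04", return_name=return_name))
        if not desc.endswith("."):
            errs.append(error("RT05", return_name=return_name))
    return errs
```

With this shape, a tuple return with one undocumented field yields a single message naming that field, while a single unnamed return yields the generic wording.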

datapythonista · 2018-11-25T23:07:42Z

@igorfassen can you make that last change to the name of the name variable, and the error messages?

datapythonista · 2018-12-02T00:44:50Z

@igorfassen do you have time to make the last changes, and merge master on this, so we can merge?

jreback · 2018-12-27T22:28:36Z

@datapythonista can you update

jreback · 2018-12-30T21:33:34Z

lgtm. ping on green.

datapythonista · 2018-12-30T22:36:17Z

all green @jreback

jreback · 2018-12-30T22:46:22Z

thanks @igorfassen and @datapythonista

* upstream/master:
  REF/TST: replace capture_stdout with pytest capsys fixture (pandas-dev#24501)
  BUG: fix .iat assignment creates a new column (pandas-dev#24495)
  DOC: add checks on the returns section in the docstrings (pandas-dev#23138) (pandas-dev#23432)
  ENH: Add strings_as_fixed_length parameter for df.to_records() (pandas-dev#18146) (pandas-dev#22229)
  TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492)
  Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502)
  DOC: Fix building of a single API document (pandas-dev#24506)

…23138) (pandas-dev#23432)

DOC: add checks on the returns section in the docstrings (pandas-dev#…

e0f9689

…23138)

igorfassen mentioned this pull request Oct 31, 2018

DOC: Validate the Returns section in the docstrings #23138

Closed

gfyoung added Docs CI Continuous Integration labels Oct 31, 2018

gfyoung reviewed Oct 31, 2018

gfyoung requested review from WillAyd and datapythonista and removed request for WillAyd October 31, 2018 20:13

datapythonista reviewed Nov 4, 2018

WillAyd requested changes Nov 4, 2018

datapythonista and others added 6 commits November 5, 2018 13:57

Update scripts/validate_docstrings.py

20b32e7

Co-Authored-By: igorfassen <[email protected]>

update validate_docstrings.py: clearer error message

f2d6449

update test_validate_docstrings.py: fix expected error message

0d34a88

update validate_docstrings.py: Returns section validation

b09c322

* removed value name from error messages
* updated the associated test case

Merge remote-tracking branch 'upstream/master' into validate_returns

a556925

update validate_docstrings.py: split line to comply with pep 8

a1384d4

WillAyd requested changes Nov 7, 2018

update validate_docstrings.py: small fixes in Returns validation

2f3f5bf

* removed the message `Errors in Returns section`
* simpler variable initialization

Merge remote-tracking branch 'upstream/master' into validate_returns

e62de14

datapythonista reviewed Nov 8, 2018

WillAyd reviewed Nov 8, 2018

update validate_docstrings.py: simplify returns section validation

fdad765

datapythonista reviewed Nov 9, 2018

update validate_docstrings.py: replace " by ' for homogenization

58a0a91

datapythonista mentioned this pull request Nov 11, 2018

BUG/REF: TimedeltaIndex.__new__ #23539

Merged

datapythonista self-assigned this Dec 7, 2018

datapythonista added 2 commits December 30, 2018 20:19

Merge remote-tracking branch 'upstream/master' into validate_returns

a116a49

Minor fixes

5ea7881

jreback added this to the 0.24.0 milestone Dec 30, 2018

jreback merged commit 6df8567 into pandas-dev:master Dec 30, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: add checks on the returns section in the docstrings (pandas-dev#…

3e74d18

…23138) (pandas-dev#23432)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

DOC: add checks on the returns section in the docstrings (pandas-dev#…

757bb7a

…23138) (pandas-dev#23432)
