Skip to content

DOC: add checks on the returns section in the docstrings (#23138) #23432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 30, 2018
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 34 additions & 4 deletions scripts/tests/test_validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -558,6 +558,30 @@ def no_punctuation(self):
"""
return "Hello world!"

def named_single_return(self):
"""
Provides name but returns only one value.

Returns
-------
s : str
A nice greeting.
"""
return "Hello world!"

def no_capitalization(self):
"""
Forgets capitalization in return values descriptions.

Returns
-------
foo : str
The first returned string.
bar : str
the second returned string.
"""
return "Hello", "World!"


class BadSeeAlso(object):

Expand Down Expand Up @@ -747,10 +771,16 @@ def test_bad_generic_functions(self, func):
('BadReturns', 'yield_not_documented', ('No Yields section found',)),
pytest.param('BadReturns', 'no_type', ('foo',),
marks=pytest.mark.xfail),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can test this now. I guess we can't know whether we're missing the type or the description. We should return a message that considers both, so the user always know what to do. And test this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, with the current modifications, in this case the error message is Return value has no description
should I replace it by something like Return value has no type or description ?
(well, in the case where a tuple is returned, that would mean that both type and name are missing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while writing that, I realize that I do not check, in the case of multiple return values, if all of them have both a name and a type

pytest.param('BadReturns', 'no_description', ('foo',),
marks=pytest.mark.xfail),
pytest.param('BadReturns', 'no_punctuation', ('foo',),
marks=pytest.mark.xfail),
('BadReturns', 'no_description',
('Return value has no description',)),
('BadReturns', 'no_punctuation',
('Return value description should finish with "."',)),
('BadReturns', 'named_single_return',
('The first line of the Returns section should contain only the '
'type, unless multiple values are being returned.',)),
('BadReturns', 'no_capitalization',
('Return value description should start with a capital '
'letter',)),
# Examples tests
('BadGenericDocStrings', 'method',
('numpy does not need to be imported in the examples',)),
Expand Down
31 changes: 29 additions & 2 deletions scripts/validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -527,8 +527,35 @@ def validate_one(func_name):
err.message))

if doc.is_function_or_method:
if not doc.returns and "return" in doc.method_source:
errs.append('No Returns section found')
if not doc.returns:
if "return" in doc.method_source:
errs.append('No Returns section found')
Copy link
Member

@gfyoung gfyoung Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista @WillAyd : How does this validation handle situations where we use an empty return to break out early from a function? In such a case, we have a return statement, but the documentation wouldn't need to specifies a Returns section per se.

(BTW, I realize that this already in the codebase, but it only struck me just now)

Copy link
Member

@datapythonista datapythonista Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, in Python all the functions return None by default, and a bare return also returns None. I think in those cases the numpydoc standard and ours is to not include the Returns section at all.

I don't think the validation checks in the code whether there is a return or not. This change should be to avoid giving a name to what is being returned if it's just a single value. In my opinion, saying that the function transform returns a variable transformed does not add any value.

We want to name the values returned if the return is a tuple, for example code and name. This PR should validate this.

Does this make sense to you @gfyoung?

Update: Just saw the code (sorry, reviewing from the phone, which is tricky). I forgot we were checking the code for a return statement. I think we should change and use a regex that doesn't capture the bare return then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so replacing
if "return" in doc.method_source:
by something like
if re.search(r"return\b[ \t]*[^;#\n\r]", doc.method_source)
would deal with these cases ?

(maybe I should have another look at what's in doc.method_source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc.method_source should be the code as a string. Not sure about the exact regex, not sure if looking for a return followed by letter, number or opening brackets would be simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #23488 to take care of the return problem. Let's keep the focus of this PR in the original topic and not mix things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a regex for this? Wouldn't it just be better documented as an optional return value if such a branch is possible to hit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean, but it's better if you mention this approach in #23488, where this will be implemented.

else:
returns_errs = []
if len(doc.returns) == 1 and doc.returns[0][1]:
returns_errs.append('The first line of the Returns section '
'should contain only the type, unless '
'multiple values are being returned.')
missing_desc, missing_cap, missing_period = False, False, False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_desc = missing_cap = missing_period = False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that said is this even required? It seems like you could simplify things without the or statements below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the or statement are needed to print at most one message of each kind.
as the error messages do not specify the variable names, I assumed it would be clearer to avoid printing the same message twice (or even more).

for name, type_, desc in doc.returns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value of name in the case we've just got a single return?

I'd still like to have the names of the return fields that are failing in the returned error messages, and not give the user something like:

Return value has no description
Return value has no description
Return value has no description

@WillAyd can you propose your preferred solution for this?

For me the best would be the original one:

errs.append(error('RT03', return_name='{} '.format(name) if name else ''))

I think with the way it is implemented now it should be perfectly readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when there is a single return and the first line is correctly written (ie no name is given), name contains the type, and type_ is empty

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to rename name to name_or_type then, and possibly add a with what you just said.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally indifferent as I really think this is an edge case. Whatever can be done simply and accurately works for me!

desc = ''.join(desc)
missing_desc = missing_desc or not desc
missing_cap = missing_cap or desc and not desc[0].isupper()
missing_period = (missing_period
or desc and not desc.endswith('.'))
if missing_desc:
returns_errs.append('Return value has no description.')
if missing_cap:
returns_errs.append('Return value description should start '
'with a capital letter.')
if missing_period:
returns_errs.append('Return value description should finish '
'with ".".')
if returns_errs:
errs.append('Errors in Returns section')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are emulating what you saw with the parameters section, but do we need this? Doesn't look like we are testing for it and I don't necessarily see the utility of it alongside the actual errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, it's not as useful here as in the parameters section, as functions with multiple values are quite rare
I've removed this message and appended errors directly to errs

for returns_err in returns_errs:
errs.append('\t{}'.format(returns_err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errs += ['\t{}'.format(e) for e in returns_errs] is probably better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this better as is. Appending seems more idiomatic than adding an intermediary list comp


if not doc.yields and "yield" in doc.method_source:
errs.append('No Yields section found')

Expand Down