Skip to content

DOC: add checks on the returns section in the docstrings (#23138) #23432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 30, 2018
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 36 additions & 4 deletions scripts/tests/test_validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -545,6 +545,30 @@ def no_punctuation(self):
"""
return "Hello world!"

def named_single_return(self):
"""
Provides name but returns only one value.

Returns
-------
s : str
A nice greeting.
"""
return "Hello world!"

def no_capitalization(self):
"""
Forgets capitalization in return values descriptions.

Returns
-------
foo : str
the first returned string.
bar : str
the second returned string.
"""
return "Hello", "World!"


class BadSeeAlso(object):

Expand Down Expand Up @@ -696,10 +720,18 @@ def test_bad_generic_functions(self, func):
('BadReturns', 'yield_not_documented', ('No Yields section found',)),
pytest.param('BadReturns', 'no_type', ('foo',),
marks=pytest.mark.xfail),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can test this now. I guess we can't know whether we're missing the type or the description. We should return a message that considers both, so the user always know what to do. And test this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, with the current modifications, in this case the error message is Return value has no description
should I replace it by something like Return value has no type or description ?
(well, in the case where a tuple is returned, that would mean that both type and name are missing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while writing that, I realize that I do not check, in the case of multiple return values, if all of them have both a name and a type

pytest.param('BadReturns', 'no_description', ('foo',),
marks=pytest.mark.xfail),
pytest.param('BadReturns', 'no_punctuation', ('foo',),
marks=pytest.mark.xfail),
('BadReturns', 'no_description',
('Return value has no description',)),
('BadReturns', 'no_punctuation',
('Return value description should finish with "."',)),
('BadReturns', 'named_single_return',
('No name is to be provided when returning a single value',)),
('BadReturns', 'no_capitalization',
('Return value "foo" description should start with a capital '
'letter',)),
('BadReturns', 'no_capitalization',
('Return value "bar" description should start with a capital '
'letter',)),
# See Also tests
('BadSeeAlso', 'prefix_pandas',
('pandas.Series.rename in `See Also` section '
Expand Down
30 changes: 28 additions & 2 deletions scripts/validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -491,8 +491,34 @@ def validate_one(func_name):
errs.append('\t{}'.format(param_err))

if doc.is_function_or_method:
if not doc.returns and "return" in doc.method_source:
errs.append('No Returns section found')
if not doc.returns:
if "return" in doc.method_source:
errs.append('No Returns section found')
Copy link
Member

@gfyoung gfyoung Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista @WillAyd : How does this validation handle situations where we use an empty return to break out early from a function? In such a case, we have a return statement, but the documentation wouldn't need to specifies a Returns section per se.

(BTW, I realize that this already in the codebase, but it only struck me just now)

Copy link
Member

@datapythonista datapythonista Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, in Python all the functions return None by default, and a bare return also returns None. I think in those cases the numpydoc standard and ours is to not include the Returns section at all.

I don't think the validation checks in the code whether there is a return or not. This change should be to avoid giving a name to what is being returned if it's just a single value. In my opinion, saying that the function transform returns a variable transformed does not add any value.

We want to name the values returned if the return is a tuple, for example code and name. This PR should validate this.

Does this make sense to you @gfyoung?

Update: Just saw the code (sorry, reviewing from the phone, which is tricky). I forgot we were checking the code for a return statement. I think we should change and use a regex that doesn't capture the bare return then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so replacing
if "return" in doc.method_source:
by something like
if re.search(r"return\b[ \t]*[^;#\n\r]", doc.method_source)
would deal with these cases ?

(maybe I should have another look at what's in doc.method_source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc.method_source should be the code as a string. Not sure about the exact regex, not sure if looking for a return followed by letter, number or opening brackets would be simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #23488 to take care of the return problem. Let's keep the focus of this PR in the original topic and not mix things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a regex for this? Wouldn't it just be better documented as an optional return value if such a branch is possible to hit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean, but it's better if you mention this approach in #23488, where this will be implemented.

else:
returns_errs = []
if len(doc.returns) == 1 and doc.returns[0][1]:
returns_errs.append('No name is to be provided when '
'returning a single value.')
for name, type_, desc in doc.returns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value of name in the case we've just got a single return?

I'd still like to have the names of the return fields that are failing in the returned error messages, and not give the user something like:

Return value has no description
Return value has no description
Return value has no description

@WillAyd can you propose your preferred solution for this?

For me the best would be the original one:

errs.append(error('RT03', return_name='{} '.format(name) if name else ''))

I think with the way it is implemented now it should be perfectly readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when there is a single return and the first line is correctly written (ie no name is given), name contains the type, and type_ is empty

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to rename name to name_or_type then, and possibly add a with what you just said.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally indifferent as I really think this is an edge case. Whatever can be done simply and accurately works for me!

desc = ''.join(desc)
name = '"' + name + '" ' if type_ else ''
if not desc:
returns_errs.append('Return value {}has no '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so is the assumption here that name always has a trailing whitespace? If so that seems rather fragile and should probably be fixed directly to remove that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should not be any trailing space in name.
I recognize that what's done here is not very clean :

  • if type_ is empty, which mainly occurs in fact when only one value is returned, which means in fact that name contains the type, then I don't want to print it in the error message.
  • if type_ is not empty, which means both name and type were provided, then I add a trailing space to name, for the error message to be correctly printed.

actually, the main problem I see here is that in the case of multiple returned values, if one of them has no name or type, the error message will not specify where the problem is :
with something like

Returns
-------
foo : int  
   The first value  
bar  
   The second value

will cause the following errors :

   Return value "foo" description should finish with "."
   Return value description should finish with "."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with a more generic error message then that doesn't use the name if it simplifies the code. The instances where multiple values are returned is few and far between to begin with so it shouldn't be hard as an end user to recognize where the issue appears if it comes up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in b09c322, I've removed value names from the error messages (and prevented from printing duplicate error messages)... but I'm not sure wether this simplifies the code...

'description'.format(name))
else:
if not desc[0].isupper():
returns_errs.append('Return value {}description '
'should start with a capital '
'letter'.format(name))
if desc[-1] != '.':
returns_errs.append('Return value {}description '
'should finish with '
'"."'.format(name))
if returns_errs:
errs.append('Errors in returns section')
for returns_err in returns_errs:
errs.append('\t{}'.format(returns_err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errs += ['\t{}'.format(e) for e in returns_errs] is probably better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this better as is. Appending seems more idiomatic than adding an intermediary list comp


if not doc.yields and "yield" in doc.method_source:
errs.append('No Yields section found')

Expand Down