-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: add checks on the returns section in the docstrings (#23138) #23432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @igorfassen! Thanks for updating the PR.
Comment last updated on November 06, 2018 at 10:07 Hours UTC |
scripts/validate_docstrings.py
Outdated
errs.append('No Returns section found') | ||
if not doc.returns: | ||
if "return" in doc.method_source: | ||
errs.append('No Returns section found') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista @WillAyd : How does this validation handle situations where we use an empty return
to break out early from a function? In such a case, we have a return
statement, but the documentation wouldn't need to specifies a Returns
section per se.
(BTW, I realize that this already in the codebase, but it only struck me just now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik, in Python all the functions return None
by default, and a bare return
also returns None
. I think in those cases the numpydoc standard and ours is to not include the Returns
section at all.
I don't think the validation checks in the code whether there is a return or not. This change should be to avoid giving a name to what is being returned if it's just a single value. In my opinion, saying that the function transform
returns a variable transformed
does not add any value.
We want to name the values returned if the return is a tuple, for example code
and name
. This PR should validate this.
Does this make sense to you @gfyoung?
Update: Just saw the code (sorry, reviewing from the phone, which is tricky). I forgot we were checking the code for a return statement. I think we should change and use a regex that doesn't capture the bare return
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so replacing
if "return" in doc.method_source:
by something like
if re.search(r"return\b[ \t]*[^;#\n\r]", doc.method_source)
would deal with these cases ?
(maybe I should have another look at what's in doc.method_source)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc.method_source
should be the code as a string. Not sure about the exact regex, not sure if looking for a return followed by letter, number or opening brackets would be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #23488 to take care of the return
problem. Let's keep the focus of this PR in the original topic and not mix things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a regex for this? Wouldn't it just be better documented as an optional return value if such a branch is possible to hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean, but it's better if you mention this approach in #23488, where this will be implemented.
Codecov Report
@@ Coverage Diff @@
## master #23432 +/- ##
==========================================
- Coverage 92.31% 92.3% -0.01%
==========================================
Files 166 166
Lines 52391 52391
==========================================
- Hits 48363 48362 -1
- Misses 4028 4029 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, added a few comments, but almost ready to be merged.
scripts/validate_docstrings.py
Outdated
errs.append('No Returns section found') | ||
if not doc.returns: | ||
if "return" in doc.method_source: | ||
errs.append('No Returns section found') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #23488 to take care of the return
problem. Let's keep the focus of this PR in the original topic and not mix things.
@@ -696,10 +720,18 @@ def test_bad_generic_functions(self, func): | |||
('BadReturns', 'yield_not_documented', ('No Yields section found',)), | |||
pytest.param('BadReturns', 'no_type', ('foo',), | |||
marks=pytest.mark.xfail), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can test this now. I guess we can't know whether we're missing the type or the description. We should return a message that considers both, so the user always know what to do. And test this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, with the current modifications, in this case the error message is Return value has no description
should I replace it by something like Return value has no type or description ?
(well, in the case where a tuple is returned, that would mean that both type and name are missing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while writing that, I realize that I do not check, in the case of multiple return values, if all of them have both a name and a type
scripts/validate_docstrings.py
Outdated
if returns_errs: | ||
errs.append('Errors in returns section') | ||
for returns_err in returns_errs: | ||
errs.append('\t{}'.format(returns_err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errs += ['\t{}'.format(e) for e in returns_errs]
is probably better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this better as is. Appending seems more idiomatic than adding an intermediary list comp
scripts/validate_docstrings.py
Outdated
desc = ''.join(desc) | ||
name = '"' + name + '" ' if type_ else '' | ||
if not desc: | ||
returns_errs.append('Return value {}has no ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so is the assumption here that name
always has a trailing whitespace? If so that seems rather fragile and should probably be fixed directly to remove that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should not be any trailing space in name
.
I recognize that what's done here is not very clean :
- if
type_
is empty, which mainly occurs in fact when only one value is returned, which means in fact thatname
contains the type, then I don't want to print it in the error message. - if
type_
is not empty, which means both name and type were provided, then I add a trailing space toname
, for the error message to be correctly printed.
actually, the main problem I see here is that in the case of multiple returned values, if one of them has no name or type, the error message will not specify where the problem is :
with something like
Returns
-------
foo : int
The first value
bar
The second value
will cause the following errors :
Return value "foo" description should finish with "."
Return value description should finish with "."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be fine with a more generic error message then that doesn't use the name if it simplifies the code. The instances where multiple values are returned is few and far between to begin with so it shouldn't be hard as an end user to recognize where the issue appears if it comes up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in b09c322, I've removed value names from the error messages (and prevented from printing duplicate error messages)... but I'm not sure wether this simplifies the code...
scripts/validate_docstrings.py
Outdated
if returns_errs: | ||
errs.append('Errors in returns section') | ||
for returns_err in returns_errs: | ||
errs.append('\t{}'.format(returns_err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this better as is. Appending seems more idiomatic than adding an intermediary list comp
Co-Authored-By: igorfassen <[email protected]>
* removed value name from error messages * updated the associated test case
scripts/validate_docstrings.py
Outdated
returns_errs.append('The first line of the Returns section ' | ||
'should contain only the type, unless ' | ||
'multiple values are being returned.') | ||
missing_desc, missing_cap, missing_period = False, False, False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing_desc = missing_cap = missing_period = False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that said is this even required? It seems like you could simplify things without the or
statements below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the or
statement are needed to print at most one message of each kind.
as the error messages do not specify the variable names, I assumed it would be clearer to avoid printing the same message twice (or even more).
scripts/validate_docstrings.py
Outdated
returns_errs.append('Return value description should finish ' | ||
'with ".".') | ||
if returns_errs: | ||
errs.append('Errors in Returns section') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you are emulating what you saw with the parameters section, but do we need this? Doesn't look like we are testing for it and I don't necessarily see the utility of it alongside the actual errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, it's not as useful here as in the parameters section, as functions with multiple values are quite rare
I've removed this message and appended errors directly to errs
* removed the message `Errors in Returns section` * simpler variable initialization
@igorfassen you will have to merge conflicts with the changes in #23514, in which we codified the error messages. Let me know if you need help. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, and thanks for the merge.
Just one comment. For the case when a tuple is being returned, I think it'd be better to report one error for every missing capital or period. I think we do that in See Also
already.
If we do that, we should add tests for it, and also in the error messages add which is the field that is wrong.
Actually, that was the first behavior that was implemented, but I eventually simplified this part (see this part of the discussion: #23432 (review)). |
Sorry, I didn't see that before. I think @WillAyd comment was more about the implementation, that was somehow tricky. I think after the refactoring, it should be a bit easier to have a simpler implementation. You can have separate error messages for the case when there is just one output, and for when there are more. It's a bit more repetitive, but I think the implementation should be very straightforward, same as Sorry about asking too many changes. Does it sound good? |
scripts/validate_docstrings.py
Outdated
else: | ||
if len(doc.returns) == 1 and doc.returns[0][1]: | ||
errs.append(error('RT02')) | ||
missing_desc = missing_cap = missing_period = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need this line? Couldn't we just use the assignments within the loop over doc.returns
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather not even assign to variables but simply check if not desc:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll try something simpler.
my idea was to avoid duplicate error messages, so these variables allowed me to detect the first occurrence of each type of error, and then short-circuit the subsequent tests. but I must admit this was a bit less readable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Only thing is the case with multiple results. Let's see if we can find an agreement, so we can close.
We'll have to add tests, to test specifically cases where only one parameter is missing punctuation or the whole description...
scripts/validate_docstrings.py
Outdated
else: | ||
if len(doc.returns) == 1 and doc.returns[0][1]: | ||
errs.append(error('RT02')) | ||
for name, type_, desc in doc.returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of name
in the case we've just got a single return?
I'd still like to have the names of the return fields that are failing in the returned error messages, and not give the user something like:
Return value has no description
Return value has no description
Return value has no description
@WillAyd can you propose your preferred solution for this?
For me the best would be the original one:
errs.append(error('RT03', return_name='{} '.format(name) if name else ''))
I think with the way it is implemented now it should be perfectly readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when there is a single return and the first line is correctly written (ie no name is given), name
contains the type, and type_
is empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to rename name
to name_or_type
then, and possibly add a with what you just said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm totally indifferent as I really think this is an edge case. Whatever can be done simply and accurately works for me!
@igorfassen can you make that last change to the name of the |
@igorfassen do you have time to make the last changes, and merge master on this, so we can merge? |
@datapythonista can you update |
lgtm. ping on green. |
all green @jreback |
thanks @igorfassen and @datapythonista |
* upstream/master: REF/TST: replace capture_stdout with pytest capsys fixture (pandas-dev#24501) BUG: fix .iat assignment creates a new column (pandas-dev#24495) DOC: add checks on the returns section in the docstrings (pandas-dev#23138) (pandas-dev#23432) ENH: Add strings_as_fixed_length parameter for df.to_records() (pandas-dev#18146) (pandas-dev#22229) TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492) Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502) DOC: Fix building of a single API document (pandas-dev#24506)
Updated returns checks in
validate_docstrings.py
:Updated
test_validate_docstrings.py
: