Skip to content

DOC: add checks on the returns section in the docstrings (#23138) #23432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Dec 30, 2018

Conversation

igorfassen
Copy link
Contributor

@igorfassen igorfassen commented Oct 31, 2018

Updated returns checks in validate_docstrings.py:

  • check that no name is given when only one value is returned
  • check capitalization and punctuation in descriptions

Updated test_validate_docstrings.py:

  • added some tests to validate these new checks

@pep8speaks
Copy link

pep8speaks commented Oct 31, 2018

Hello @igorfassen! Thanks for updating the PR.

Comment last updated on November 06, 2018 at 10:07 Hours UTC

errs.append('No Returns section found')
if not doc.returns:
if "return" in doc.method_source:
errs.append('No Returns section found')
Copy link
Member

@gfyoung gfyoung Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista @WillAyd : How does this validation handle situations where we use an empty return to break out early from a function? In such a case, we have a return statement, but the documentation wouldn't need to specifies a Returns section per se.

(BTW, I realize that this already in the codebase, but it only struck me just now)

Copy link
Member

@datapythonista datapythonista Nov 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, in Python all the functions return None by default, and a bare return also returns None. I think in those cases the numpydoc standard and ours is to not include the Returns section at all.

I don't think the validation checks in the code whether there is a return or not. This change should be to avoid giving a name to what is being returned if it's just a single value. In my opinion, saying that the function transform returns a variable transformed does not add any value.

We want to name the values returned if the return is a tuple, for example code and name. This PR should validate this.

Does this make sense to you @gfyoung?

Update: Just saw the code (sorry, reviewing from the phone, which is tricky). I forgot we were checking the code for a return statement. I think we should change and use a regex that doesn't capture the bare return then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so replacing
if "return" in doc.method_source:
by something like
if re.search(r"return\b[ \t]*[^;#\n\r]", doc.method_source)
would deal with these cases ?

(maybe I should have another look at what's in doc.method_source)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc.method_source should be the code as a string. Not sure about the exact regex, not sure if looking for a return followed by letter, number or opening brackets would be simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #23488 to take care of the return problem. Let's keep the focus of this PR in the original topic and not mix things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need a regex for this? Wouldn't it just be better documented as an optional return value if such a branch is possible to hit?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean, but it's better if you mention this approach in #23488, where this will be implemented.

@gfyoung gfyoung requested review from WillAyd and datapythonista and removed request for WillAyd October 31, 2018 20:13
@codecov
Copy link

codecov bot commented Oct 31, 2018

Codecov Report

Merging #23432 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23432      +/-   ##
==========================================
- Coverage   92.31%    92.3%   -0.01%     
==========================================
  Files         166      166              
  Lines       52391    52391              
==========================================
- Hits        48363    48362       -1     
- Misses       4028     4029       +1
Flag Coverage Δ
#multiple 90.73% <ø> (ø) ⬆️
#single 43.05% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/util/testing.py 87.68% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4ac0d6...5ea7881. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, added a few comments, but almost ready to be merged.

errs.append('No Returns section found')
if not doc.returns:
if "return" in doc.method_source:
errs.append('No Returns section found')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #23488 to take care of the return problem. Let's keep the focus of this PR in the original topic and not mix things.

@@ -696,10 +720,18 @@ def test_bad_generic_functions(self, func):
('BadReturns', 'yield_not_documented', ('No Yields section found',)),
pytest.param('BadReturns', 'no_type', ('foo',),
marks=pytest.mark.xfail),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can test this now. I guess we can't know whether we're missing the type or the description. We should return a message that considers both, so the user always know what to do. And test this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, with the current modifications, in this case the error message is Return value has no description
should I replace it by something like Return value has no type or description ?
(well, in the case where a tuple is returned, that would mean that both type and name are missing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while writing that, I realize that I do not check, in the case of multiple return values, if all of them have both a name and a type

if returns_errs:
errs.append('Errors in returns section')
for returns_err in returns_errs:
errs.append('\t{}'.format(returns_err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errs += ['\t{}'.format(e) for e in returns_errs] is probably better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this better as is. Appending seems more idiomatic than adding an intermediary list comp

desc = ''.join(desc)
name = '"' + name + '" ' if type_ else ''
if not desc:
returns_errs.append('Return value {}has no '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm so is the assumption here that name always has a trailing whitespace? If so that seems rather fragile and should probably be fixed directly to remove that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should not be any trailing space in name.
I recognize that what's done here is not very clean :

  • if type_ is empty, which mainly occurs in fact when only one value is returned, which means in fact that name contains the type, then I don't want to print it in the error message.
  • if type_ is not empty, which means both name and type were provided, then I add a trailing space to name, for the error message to be correctly printed.

actually, the main problem I see here is that in the case of multiple returned values, if one of them has no name or type, the error message will not specify where the problem is :
with something like

Returns
-------
foo : int  
   The first value  
bar  
   The second value

will cause the following errors :

   Return value "foo" description should finish with "."
   Return value description should finish with "."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be fine with a more generic error message then that doesn't use the name if it simplifies the code. The instances where multiple values are returned is few and far between to begin with so it shouldn't be hard as an end user to recognize where the issue appears if it comes up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in b09c322, I've removed value names from the error messages (and prevented from printing duplicate error messages)... but I'm not sure wether this simplifies the code...

if returns_errs:
errs.append('Errors in returns section')
for returns_err in returns_errs:
errs.append('\t{}'.format(returns_err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this better as is. Appending seems more idiomatic than adding an intermediary list comp

returns_errs.append('The first line of the Returns section '
'should contain only the type, unless '
'multiple values are being returned.')
missing_desc, missing_cap, missing_period = False, False, False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing_desc = missing_cap = missing_period = False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that said is this even required? It seems like you could simplify things without the or statements below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the or statement are needed to print at most one message of each kind.
as the error messages do not specify the variable names, I assumed it would be clearer to avoid printing the same message twice (or even more).

returns_errs.append('Return value description should finish '
'with ".".')
if returns_errs:
errs.append('Errors in Returns section')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are emulating what you saw with the parameters section, but do we need this? Doesn't look like we are testing for it and I don't necessarily see the utility of it alongside the actual errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, it's not as useful here as in the parameters section, as functions with multiple values are quite rare
I've removed this message and appended errors directly to errs

* removed the message `Errors in Returns section`
* simpler variable initialization
@datapythonista
Copy link
Member

@igorfassen you will have to merge conflicts with the changes in #23514, in which we codified the error messages. Let me know if you need help.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, and thanks for the merge.

Just one comment. For the case when a tuple is being returned, I think it'd be better to report one error for every missing capital or period. I think we do that in See Also already.

If we do that, we should add tests for it, and also in the error messages add which is the field that is wrong.

@igorfassen
Copy link
Contributor Author

Actually, that was the first behavior that was implemented, but I eventually simplified this part (see this part of the discussion: #23432 (review)).

@datapythonista
Copy link
Member

Sorry, I didn't see that before. I think @WillAyd comment was more about the implementation, that was somehow tricky.

I think after the refactoring, it should be a bit easier to have a simpler implementation. You can have separate error messages for the case when there is just one output, and for when there are more. It's a bit more repetitive, but I think the implementation should be very straightforward, same as See Also.

Sorry about asking too many changes. Does it sound good?

else:
if len(doc.returns) == 1 and doc.returns[0][1]:
errs.append(error('RT02'))
missing_desc = missing_cap = missing_period = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this line? Couldn't we just use the assignments within the loop over doc.returns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather not even assign to variables but simply check if not desc: ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll try something simpler.
my idea was to avoid duplicate error messages, so these variables allowed me to detect the first occurrence of each type of error, and then short-circuit the subsequent tests. but I must admit this was a bit less readable...

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only thing is the case with multiple results. Let's see if we can find an agreement, so we can close.

We'll have to add tests, to test specifically cases where only one parameter is missing punctuation or the whole description...

else:
if len(doc.returns) == 1 and doc.returns[0][1]:
errs.append(error('RT02'))
for name, type_, desc in doc.returns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the value of name in the case we've just got a single return?

I'd still like to have the names of the return fields that are failing in the returned error messages, and not give the user something like:

Return value has no description
Return value has no description
Return value has no description

@WillAyd can you propose your preferred solution for this?

For me the best would be the original one:

errs.append(error('RT03', return_name='{} '.format(name) if name else ''))

I think with the way it is implemented now it should be perfectly readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when there is a single return and the first line is correctly written (ie no name is given), name contains the type, and type_ is empty

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to rename name to name_or_type then, and possibly add a with what you just said.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm totally indifferent as I really think this is an edge case. Whatever can be done simply and accurately works for me!

@datapythonista
Copy link
Member

@igorfassen can you make that last change to the name of the name variable, and the error messages?

@datapythonista
Copy link
Member

@igorfassen do you have time to make the last changes, and merge master on this, so we can merge?

@datapythonista datapythonista self-assigned this Dec 7, 2018
@jreback
Copy link
Contributor

jreback commented Dec 27, 2018

@datapythonista can you update

@jreback jreback added this to the 0.24.0 milestone Dec 30, 2018
@jreback
Copy link
Contributor

jreback commented Dec 30, 2018

lgtm. ping on green.

@datapythonista
Copy link
Member

all green @jreback

@jreback jreback merged commit 6df8567 into pandas-dev:master Dec 30, 2018
@jreback
Copy link
Contributor

jreback commented Dec 30, 2018

thanks @igorfassen and @datapythonista

thoo added a commit to thoo/pandas that referenced this pull request Dec 30, 2018
* upstream/master:
  REF/TST: replace capture_stdout with pytest capsys fixture (pandas-dev#24501)
  BUG: fix .iat assignment creates a new column (pandas-dev#24495)
  DOC: add checks on the returns section in the docstrings (pandas-dev#23138) (pandas-dev#23432)
  ENH: Add strings_as_fixed_length parameter for df.to_records() (pandas-dev#18146) (pandas-dev#22229)
  TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492)
  Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502)
  DOC: Fix building of a single API document (pandas-dev#24506)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants