-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Make validate_docstrings.py ready for the CI #23514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make validate_docstrings.py ready for the CI #23514
Conversation
…rs (so, 0 for no errors)
… to filter which docstrings are validated
Hello @datapythonista! Thanks for submitting the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Let's start with these comments and go from there.
In your opinion, what does our end goal look like? I'm just wondering whether we should have a list of prefixes that are succeeding, or a list of allowed failures. The benefit of the allowed failures is that new APIs (outside the existing prefixes) are automatically checked by default. If we go with a list of "good" prefixes, then we need to remember to add the new API to the list. |
@TomAugspurger that's a very good point, it was discussed a bit in #23481. Ideally we want to run for all errors and all docstrings, and fail the CI if something is wrong. We have around 9,000 errors at the moment (and still adding new validations). I agree excluding docstrings (like what we do with the doctests) is likely to be very useful. But I prefer to keep that for a second PR, so this one is not too complex, and I can play a bit with this before thinking on strategies to validate as much as possible in the best way. So far with this we should be able to validate for specific errors (like pep8 in examples, or parameter mismatches) in an incremental way (start by |
completely agree. Just thinking towards the end goal, but we're a ways from that right now :) |
@gfyoung I made the changes. I was thinking on an additional change, but I want to know your opinion. It'd be having all the error messages in a dictionary: ERR_MSG = {
...
'PR04': 'Parameter "{param}" has no type',
...
def validate():
...
if some_cond:
# may be the tuple (code, msg) could be created with a function, so this looks clearer
errs.append(('PR04', ERR_MSG['PR04'].format(param=param)))
... I'm unsure, because we will loose the error messages next to the conditions. But without the long messages in between all the conditions I think the code will be neater in that part (the most critical of this script). And we also have the reference of error codes to error messages clearer in one place (and it can be used externally, to map codes to messages, for example if I generate a report of the most frequent errors. What do you think? |
@datapythonista : I think putting it in a dictionary seems reasonable. That way the implementation of the function doesn't expand with more error messages, only with logic changes, if that makes sense. |
thanks for the feedback, I'll change it then, and we can see in practice |
+1 on gathering all codes and (base) error messages together in a dict, if possible. That would make it easier to see which codes/errors we have (but of course might make it a bit less clear in the code what the message is where it is raised) |
Made the changes. I do think it makes things clearer. |
Codecov Report
@@ Coverage Diff @@
## master #23514 +/- ##
=======================================
Coverage 92.25% 92.25%
=======================================
Files 161 161
Lines 51169 51169
=======================================
Hits 47207 47207
Misses 3962 3962
Continue to review full report at Codecov.
|
All green. If there are no more changes required, could someone take another look and merge? There is some other work going on on adding validations, and would be nice to have this merged soon. |
Thanks! |
Thanks @jorisvandenbossche. I just detected a typo in an error that wasn't tested. I'm fixing it now, and adding the test, will open a separate PR. |
* validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors) * Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated * Codifying errors * Adding --errors parameter to be able to validate only specific errors
* validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors) * Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated * Codifying errors * Adding --errors parameter to be able to validate only specific errors
* validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors) * Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated * Codifying errors * Adding --errors parameter to be able to validate only specific errors
* validate_docstrings.py to exit with status code as the number of errors (so, 0 for no errors) * Implemented different output types for the validate_all, and a prefix to filter which docstrings are validated * Codifying errors * Adding --errors parameter to be able to validate only specific errors
git diff upstream/master -u -- "*.py" | flake8 --diff
This will allow us to start having validation rules for the docstings in the CI, for example, we can have:
Which will finish with an exit status different than 0, validate only certain docstrings (the ones starting with
pandas.read_
), and validate only specific errors (in this case onlyEX03
which validates pep8 in the docstrings). The output format can be a JSON file, simple messages, or the messages formatted to be highlighted in azure.@jorisvandenbossche @gfyoung if you can review, and give this a bit of priority... there will be conflicts soon, as there are several issues that add new rules. Thanks!