-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Make validate_docstrings.py ready for the CI #23481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Makes sense to me. As far as the prefix goes do you see a logical way to target particular modules / package structures? Anything lower than that would probably be too detailed (even module-level might be) so if we identify a package to start could open that as a particular issue for the community to solve as a pre-cursor |
My first thought was on using files, as we have for the doctests. Then I realized that we don't get the docstrings from the files by instrospection, but we instead get the list of public objects from the docs as strings. So, my idea is to simply have an But I think the best we can do is start by something simple, and then see in practice if we need anything else or not. May adding a |
That seems fine with me. Or more general could be a pattern. Not sure if such a more advanced selector will be needed though (maybe a use case would be to select/deselect a single method name accross Series/DataFrame(/Index). This might even be useful for interactive usage when working on a docstring). |
One general concern: how do you envision to use it on the CI? How do you now think we will specify which functions to run and which errors to check? Because with the outline above (and the PR), you can specify a subset of functions and/or a set of error codes to check. But I can imagine that at some point, we could need a higher granularity (like, for this function ignore this error code, for that function ignore that error code, ..). In a similar way that you can skip a specific a doctest with a flag, or have the special comments to overrule flake8 in specific cases. Or the goal will be that every docstring will pass the validation completely? (I think we currently have some cases where we are fine with a certain error, because of limitations in the script, or sometimes making an exception might make something clearer / more readable, etc) |
That's an interesting discussion. I don't know right now, I think we'll have to experiment a bit once this is ready. The first thing I'd like is to validate specific errors. For example, parameter mismatches and pep8 issues in the examples. I need to see how many of them we have, but if there are few for
Excluding docstrings from the validation is likely to be useful. And I'd also like to be able to say Does this make sense? |
Since #22408 was merged,
validate_docstrings.py
is able to validate all docstrings when called without parameters, returning the output as a json file.Next step would be to make the script ready to be used in the CI, so we can fail when there are validation errors in the changes to a docstring.
The main changes required are:
0
when no error has been encountered, and a different number (possibly the number of found errors) when something is wrong.stderr
in a format appropriate for the CI (currently is a json file instdout
). As the code checks will soon be performed in Azure (BLD: Use Azure pipelines for linting #22844), the output format will ideally be customizable with a--format
parameter, so we can call the script providing the azure format (equivalent toflake8 --format
parameter).As the number of errors is currently huge (8,404 errors in total), we'll have to add the validations to the CI in different stages. To allow that, the script will also have to support:
--prefix
parameter, accepting a string of the formatpandas.Series.str
. Meaning that only the functions/methods... starting by that string will be validated--errors
parameter, accepting a error code, or comma separated list of error codes, that will be validated. This will also require codifying all the errors that we currently validateMay be it'd also be useful to have something like a
--exclude
parameter, but will leave this for later, once is needed for a specific case.@jorisvandenbossche @WillAyd @jbrockmendel any feedback on this before I start working on it?
The text was updated successfully, but these errors were encountered: