-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Add ignore_functions option to validate_docstrings.py #50509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Add ignore_functions option to validate_docstrings.py #50509
Conversation
thanks for working on this! i'll take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unsure about this. I like that it allows to validate more things, and avoid making mistakes in docstrings already passing RT02
, but at the same time this can start to become quite complex, if for each docstring error we need to keep a separate call and the list of docstrings not passing it.
Let's see what others think.
scripts/validate_docstrings.py
Outdated
if ignore_functions is None: | ||
ignore_functions = {} | ||
else: | ||
ignore_functions = set(ignore_functions) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this approach is a bit nicer?
>>> ignore_functions = ['pandas.read_csv', 'pandas.read_excel']
>>> set(ignore_functions or [])
{'pandas.read_csv', 'pandas.read_excel'}
>>> ignore_functions = None
>>> set(ignore_functions or [])
set()
scripts/validate_docstrings.py
Outdated
if ( | ||
prefix and not func_name.startswith(prefix) | ||
) or func_name in ignore_functions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably have a separate if
, this is a bit cumbersome, and looks like black makes it even more unreadable.
if ( | |
prefix and not func_name.startswith(prefix) | |
) or func_name in ignore_functions: | |
if func_name in ignore_functions: | |
continue | |
if prefix and not func_name.startswith(prefix): |
scripts/validate_docstrings.py
Outdated
"--ignore_functions", | ||
default=None, | ||
help="function or method to not validate " | ||
"(e.g. Pandas.DataFrame.head). " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"(e.g. Pandas.DataFrame.head). " | |
"(e.g. pandas.DataFrame.head). " |
True, but this should hopefully just be temporary - once the errors are fixed, then this separate call can be removed |
I think we started fixing docstrings 5 years ago ;) But sure, seems fine to me if that's the plan/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall +1 for the change, just got some requests
also, it would be good to add a little test to scripts/tests/test_validate_docstrings.py
I think we started fixing docstrings 5 years ago ;)
😄 true, but I'm hoping that with a list of what still needs doing (that contributors can tick off as part of their PR), then this will eventually get addressed. at least, in similar checks, having a list of exclusions helped
ci/code_checks.sh
Outdated
@@ -83,6 +83,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then | |||
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX04,GL01,GL02,GL03,GL04,GL05,GL06,GL07,GL09,GL10,PR03,PR04,PR05,PR06,PR08,PR09,PR10,RT01,RT04,RT05,SA02,SA03,SA04,SS01,SS02,SS03,SS04,SS05,SS06 | |||
RET=$(($RET + $?)) ; echo $MSG "DONE" | |||
|
|||
MSG='Partially validate docstrings (RT02)' ; echo $MSG | |||
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=RT02 --ignore_functions=pandas.Series.align,pandas.Series.dt.total_seconds,pandas.Series.cat.rename_categories,pandas.Series.cat.reorder_categories,pandas.Series.cat.add_categories,pandas.Series.cat.remove_categories,pandas.Series.cat.remove_unused_categories,pandas.Index.all,pandas.Index.any,pandas.CategoricalIndex.rename_categories,pandas.CategoricalIndex.reorder_categories,pandas.CategoricalIndex.add_categories,pandas.CategoricalIndex.remove_categories,pandas.CategoricalIndex.remove_unused_categories,pandas.MultiIndex.drop,pandas.DatetimeIndex.to_pydatetime,pandas.TimedeltaIndex.to_pytimedelta,pandas.core.groupby.SeriesGroupBy.apply,pandas.core.groupby.DataFrameGroupBy.apply,pandas.io.formats.style.Styler.export,pandas.api.extensions.ExtensionArray.astype,pandas.api.extensions.ExtensionArray.dropna,pandas.api.extensions.ExtensionArray.isna,pandas.api.extensions.ExtensionArray.repeat,pandas.api.extensions.ExtensionArray.unique,pandas.DataFrame.align |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is going to cause too many merge conflicts, can you write it as
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=RT02 --ignore_functions \
pandas.Series.align \
pandas.Series.dt.total_seconds \
pandas.Series.cat.rename_categories \
pandas.Series.cat.reorder_categories \
please?
see below for what else needs to change to enable this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, each contributor can just make their change, and remove one single line
scripts/validate_docstrings.py
Outdated
@@ -473,5 +491,6 @@ def main(func_name, prefix, errors, output_format, ignore_deprecated): | |||
args.errors.split(",") if args.errors else None, | |||
args.format, | |||
args.ignore_deprecated, | |||
args.ignore_functions.split(",") if args.ignore_functions else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just use args.ignore_functions
scripts/validate_docstrings.py
Outdated
@@ -464,6 +475,13 @@ def main(func_name, prefix, errors, output_format, ignore_deprecated): | |||
"deprecated objects are ignored when validating " | |||
"all docstrings", | |||
) | |||
argparser.add_argument( | |||
"--ignore_functions", | |||
default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set nargs='*'
and remove this default
2db9420
to
2813305
Compare
Thanks for reviews! I did a small refactor to make the testing easier. I also rebased on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Moisan, looks great. I added couple of suggestions, but happy to get this merged as is too if you disagree with them.
prefix=None, | ||
ignore_functions=["pandas.DataFrame.align"], | ||
) | ||
assert len(result) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides checking that the ignored function is ignored, I guess it would make sense that a non-ignored function is still present. If we do that, besides checking the length, I guess an assert to check the name of the returned function would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @Moisan, looks perfect, great job
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @Moisan !
EX02 is now enabled 🎉 I've come to believe that the only way to get validation issues fixed is to get something running in CI as quickly as possible (with an exclusions list if necessary). So thanks again @Moisan for this script, looks like it ended up delivering! Let's see if EX01 follows a similar fate 🤞 |
Excellent work. Thanks @MarcoGorelli for leading this! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Following @MarcoGorelli comment, this PR adds an option to
validate_docstrings.py
to ignore functions. This allows to ignore the remainingRT02
errors, described in #49968, inci/code_checks.sh
. This will also be useful for other documentation-related errors that required multiple fixes.