Skip to content

DOC: Add ignore_functions option to validate_docstrings.py #50509

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

Moisan
Copy link
Contributor

@Moisan Moisan commented Dec 30, 2022

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Following @MarcoGorelli comment, this PR adds an option to validate_docstrings.py to ignore functions. This allows to ignore the remaining RT02 errors, described in #49968, in ci/code_checks.sh. This will also be useful for other documentation-related errors that required multiple fixes.

@MarcoGorelli MarcoGorelli self-requested a review December 30, 2022 19:00
@MarcoGorelli
Copy link
Member

thanks for working on this! i'll take a look

@mroeschke mroeschke added the Code Style Code style, linting, code_checks label Jan 3, 2023
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unsure about this. I like that it allows to validate more things, and avoid making mistakes in docstrings already passing RT02, but at the same time this can start to become quite complex, if for each docstring error we need to keep a separate call and the list of docstrings not passing it.

Let's see what others think.

Comment on lines 322 to 323
if ignore_functions is None:
ignore_functions = {}
else:
ignore_functions = set(ignore_functions)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this approach is a bit nicer?

>>> ignore_functions = ['pandas.read_csv', 'pandas.read_excel']
>>> set(ignore_functions or [])
{'pandas.read_csv', 'pandas.read_excel'}
>>> ignore_functions = None
>>> set(ignore_functions or [])
set()

Comment on lines 335 to 337
if (
prefix and not func_name.startswith(prefix)
) or func_name in ignore_functions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably have a separate if, this is a bit cumbersome, and looks like black makes it even more unreadable.

Suggested change
if (
prefix and not func_name.startswith(prefix)
) or func_name in ignore_functions:
if func_name in ignore_functions:
continue
if prefix and not func_name.startswith(prefix):

"--ignore_functions",
default=None,
help="function or method to not validate "
"(e.g. Pandas.DataFrame.head). "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"(e.g. Pandas.DataFrame.head). "
"(e.g. pandas.DataFrame.head). "

@datapythonista datapythonista added Docs CI Continuous Integration labels Jan 4, 2023
@MarcoGorelli
Copy link
Member

I like that it allows to validate more things, and avoid making mistakes in docstrings already passing RT02, but at the same time this can start to become quite complex, if for each docstring error we need to keep a separate call and the list of docstrings not passing it.

True, but this should hopefully just be temporary - once the errors are fixed, then this separate call can be removed

@datapythonista
Copy link
Member

True, but this should hopefully just be temporary - once the errors are fixed, then this separate call can be removed

I think we started fixing docstrings 5 years ago ;) But sure, seems fine to me if that's the plan/

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall +1 for the change, just got some requests

also, it would be good to add a little test to scripts/tests/test_validate_docstrings.py


I think we started fixing docstrings 5 years ago ;)

😄 true, but I'm hoping that with a list of what still needs doing (that contributors can tick off as part of their PR), then this will eventually get addressed. at least, in similar checks, having a list of exclusions helped

@@ -83,6 +83,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX04,GL01,GL02,GL03,GL04,GL05,GL06,GL07,GL09,GL10,PR03,PR04,PR05,PR06,PR08,PR09,PR10,RT01,RT04,RT05,SA02,SA03,SA04,SS01,SS02,SS03,SS04,SS05,SS06
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Partially validate docstrings (RT02)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=RT02 --ignore_functions=pandas.Series.align,pandas.Series.dt.total_seconds,pandas.Series.cat.rename_categories,pandas.Series.cat.reorder_categories,pandas.Series.cat.add_categories,pandas.Series.cat.remove_categories,pandas.Series.cat.remove_unused_categories,pandas.Index.all,pandas.Index.any,pandas.CategoricalIndex.rename_categories,pandas.CategoricalIndex.reorder_categories,pandas.CategoricalIndex.add_categories,pandas.CategoricalIndex.remove_categories,pandas.CategoricalIndex.remove_unused_categories,pandas.MultiIndex.drop,pandas.DatetimeIndex.to_pydatetime,pandas.TimedeltaIndex.to_pytimedelta,pandas.core.groupby.SeriesGroupBy.apply,pandas.core.groupby.DataFrameGroupBy.apply,pandas.io.formats.style.Styler.export,pandas.api.extensions.ExtensionArray.astype,pandas.api.extensions.ExtensionArray.dropna,pandas.api.extensions.ExtensionArray.isna,pandas.api.extensions.ExtensionArray.repeat,pandas.api.extensions.ExtensionArray.unique,pandas.DataFrame.align
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to cause too many merge conflicts, can you write it as

    $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=RT02 --ignore_functions \
        pandas.Series.align \
        pandas.Series.dt.total_seconds \
        pandas.Series.cat.rename_categories \
        pandas.Series.cat.reorder_categories \

please?

see below for what else needs to change to enable this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, each contributor can just make their change, and remove one single line

@@ -473,5 +491,6 @@ def main(func_name, prefix, errors, output_format, ignore_deprecated):
args.errors.split(",") if args.errors else None,
args.format,
args.ignore_deprecated,
args.ignore_functions.split(",") if args.ignore_functions else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use args.ignore_functions

@@ -464,6 +475,13 @@ def main(func_name, prefix, errors, output_format, ignore_deprecated):
"deprecated objects are ignored when validating "
"all docstrings",
)
argparser.add_argument(
"--ignore_functions",
default=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set nargs='*' and remove this default

@Moisan Moisan force-pushed the validate_docstrings_ignore_functions branch from 2db9420 to 2813305 Compare January 6, 2023 20:00
@Moisan
Copy link
Contributor Author

Moisan commented Jan 6, 2023

Thanks for reviews! I did a small refactor to make the testing easier. I also rebased on main.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Moisan, looks great. I added couple of suggestions, but happy to get this merged as is too if you disagree with them.

prefix=None,
ignore_functions=["pandas.DataFrame.align"],
)
assert len(result) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides checking that the ignored function is ignored, I guess it would make sense that a non-ignored function is still present. If we do that, besides checking the length, I guess an assert to check the name of the returned function would be helpful.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @Moisan, looks perfect, great job

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Jan 8, 2023
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Moisan !

@MarcoGorelli MarcoGorelli merged commit f47a8b8 into pandas-dev:main Jan 8, 2023
@Moisan Moisan deleted the validate_docstrings_ignore_functions branch January 8, 2023 19:51
@MarcoGorelli
Copy link
Member

I think we started fixing docstrings 5 years ago ;)

EX02 is now enabled 🎉

I've come to believe that the only way to get validation issues fixed is to get something running in CI as quickly as possible (with an exclusions list if necessary). So thanks again @Moisan for this script, looks like it ended up delivering!

Let's see if EX01 follows a similar fate 🤞

@datapythonista
Copy link
Member

Excellent work. Thanks @MarcoGorelli for leading this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Code Style Code style, linting, code_checks Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants