Skip to content

DOC: update docstring validation script #19960

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 19 additions & 5 deletions scripts/validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
import inspect
import importlib
import doctest
import textwrap
import pydoc
try:
from io import StringIO
except ImportError:
Expand Down Expand Up @@ -72,8 +72,7 @@ class Docstring:
def __init__(self, method_name, method_obj):
self.method_name = method_name
self.method_obj = method_obj
self.raw_doc = method_obj.__doc__ or ''
self.raw_doc = textwrap.dedent(self.raw_doc)
self.raw_doc = pydoc.getdoc(method_obj)
self.doc = NumpyDocString(self.raw_doc)

def __len__(self):
Expand Down Expand Up @@ -127,7 +126,12 @@ def doc_parameters(self):

@property
def signature_parameters(self):
if not inspect.isfunction(self.method_obj):
if not (inspect.isfunction(self.method_obj)
or inspect.isclass(self.method_obj)):
return tuple()
if (inspect.isclass(self.method_obj)
and self.method_name.split('.')[-1] in {'dt', 'str', 'cat'}):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it difficult to find the accessors that have been registered, instead of hardcoding them here? So we don't need to remember to change this when a new accessor is created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was looking into the class (eg that I could check they inherit from a base Accessor class, but that's not consistent between them).
So not really sure there is a better way, but totally agree it is not ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is a _accessors attribute. But not fully sure to what extent we can rely on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is then that I need to get the class (eg Series in case of pandas.Series.dt), which is also difficult to introspect. So maybe the easiest is to leave it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's quite useful as it is, we can improve it in the future if we find a way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: #19963

I think isinstance(object.__getattribute__(self, method_obj.__name__), CachedAccessor) might have done it, But having an official registry is better :)

# accessor classes have a signature, but don't want to show this
return tuple()
params = tuple(inspect.signature(self.method_obj).parameters.keys())
if params and params[0] in ('self', 'cls'):
Expand All @@ -149,7 +153,8 @@ def parameter_mismatches(self):
extra = set(doc_params) - set(signature_params)
if extra:
errs.append('Unknown parameters {!r}'.format(extra))
if not missing and not extra and signature_params != doc_params:
if (not missing and not extra and signature_params != doc_params
and not (not signature_params and not doc_params)):
errs.append('Wrong parameters order. ' +
'Actual: {!r}. '.format(signature_params) +
'Documented: {!r}'.format(doc_params))
Expand Down Expand Up @@ -180,6 +185,11 @@ def deprecated(self):
bool(pattern.search(self.summary)) or
bool(pattern.search(self.extended_summary)))

@property
def mentioned_private_classes(self):
private_classes = ['NDFrame', 'IndexOpsMixin']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd keep this in a "constant" at the beginning of the file, but I it's just a personal preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

return [klass for klass in private_classes if klass in self.raw_doc]

@property
def examples_errors(self):
flags = doctest.NORMALIZE_WHITESPACE | doctest.IGNORE_EXCEPTION_DETAIL
Expand Down Expand Up @@ -311,6 +321,10 @@ def validate_one(func_name):
for param_err in param_errs:
errs.append('\t{}'.format(param_err))

mentioned_errs = doc.mentioned_private_classes
if mentioned_errs:
errs.append('Private classes mentioned: {}'.format(mentioned_errs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be for junior people it'd be more useful a message like "Private classes {} should not be mentioned in public API docstrings". May be I'm underestimating people capacity to understand error messages, but I anticipate people asking what they need to do about this error message.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion! I was thinking on a longer message. Because in principle I can even say what they should replace it with, but that will maybe become a bit too complicated


examples_errs = ''
if not doc.examples:
errs.append('No examples section found')
Expand Down