Skip to content

ENH: Use flake8 to check for PEP8 violations in doctests #23399

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 4, 2018
33 changes: 31 additions & 2 deletions scripts/tests/test_validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -127,6 +127,7 @@ def head1(self, n=5):

Examples
--------
>>> import pandas as pd
>>> s = pd.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon'])
>>> s.head()
0 Ant
Expand Down Expand Up @@ -164,6 +165,8 @@ def contains(self, pat, case=True, na=np.nan):

Examples
--------
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(['Antelope', 'Lion', 'Zebra', np.nan])
>>> s.str.contains(pat='a')
0 False
Expand Down Expand Up @@ -584,6 +587,29 @@ def prefix_pandas(self):
pass


class BadExamples(object):

def doctest(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doctest is not a very descriptive name if this test is to check whether pep8 should fail because of unknown np. The rest of the example is very verbose to just check that. I'd use the description to explain what is this test about. And in the examples just with >>> np.nan could be enough.

Then, couple more tests could be added, to test the typical pep8 issues we have, like >>> foo = 1+2 (missing spaces)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to use flake8 to check for pep8 issues was to not have to write all these tests. flake8 should be tested enough to work properly.

This test basically only verifies that flake8 is actually testing the docstrings.

"""
Return whether each value contains `pat`.

Examples
--------
>>> import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize before, but there is something tricky here. The convention is to assume numpy and pandas are always imported before the examples. I personally would be explicit, but that's how all the examples are made.

This means two things:

  • pep8 will fail because of this, unless we add the imports when validating
  • we should use different examples for the tests (may be not imorting math?)

And as I said in the previous review, I'd like to have more examples of pep8 failures, like missing spaces around an operator. Also, these examples are too long for what is being tested.

>>> df = pd.DataFrame(np.ones((3, 3)),
... columns=('a', 'b', 'c'))
>>> df.all(1)
0 True
1 True
2 True
dtype: bool

>>> df.all(bool_only=True)
Series([], dtype: bool)
"""
pass


class TestValidator(object):

def _import_path(self, klass=None, func=None):
Expand Down Expand Up @@ -703,10 +729,13 @@ def test_bad_generic_functions(self, func):
# See Also tests
('BadSeeAlso', 'prefix_pandas',
('pandas.Series.rename in `See Also` section '
'does not need `pandas` prefix',))
'does not need `pandas` prefix',)),
# Examples tests
('BadExamples', 'doctest',
('1 F821 undefined name \'np\'',))
])
def test_bad_examples(self, capsys, klass, func, msgs):
result = validate_one(self._import_path(klass=klass, func=func)) # noqa:F821
result = validate_one(self._import_path(klass=klass, func=func))
for msg in msgs:
assert msg in ' '.join(result['errors'])

Expand Down
48 changes: 47 additions & 1 deletion scripts/validate_docstrings.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,11 @@
import inspect
import importlib
import doctest
import tempfile
from contextlib import contextmanager

from flake8.main.application import Application as Flake8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it complicates things unnecessarily importing objects other than modules. import contextlib and import flake8.main.application seems like the simplest and best way. And then use accordinly @contextlib.contextmanager and app = flake8.main.application.Application. This way, just by reading those lines it's obvious what is the module and what is being used, and there is not need to go to the imports and start decoding aliases.


try:
from io import StringIO
except ImportError:
Expand Down Expand Up @@ -331,6 +336,41 @@ def parameter_mismatches(self):

return errs

@property
def pep8_violations(self):
with self._file_representation() as file:
application = Flake8()
application.initialize(["--doctests", "--quiet"])
application.run_checks([file.name])
application.report()
stats = application.guide.stats
return [
"{} {} {}".format(s.count, s.error_code, s.message)
for s in stats.statistics_for('')
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally prefer to yield from application.guide.stats.statistics_for('') instead of returning the string. In case the information is used in more than one place, having the components can be useful.

Also, it's an opinion, but I wouldn't use a property here, as this feels more like an action than an attribute. A method validate_pep8 seems easier to understand.


@contextmanager
def _file_representation(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is not very descriptive. I think this is the examples as temporary file, so something like _examples_as_temp_file seems much clearer.

Not sure if it's worth having two separate methods, I'd generate the file in the method that validates pep8. But if we have two methods, I'd have this first, as this is being used by the other (pep8_violations).

"""
Temporarily creates file with current function inside.
The signature and body are **not** included.

:returns file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use numpydoc, not this syntax

"""
create_function = 'def {name}():\n' \
' """{doc}"""\n' \
' pass\n'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said, I have a preference to have only the examples code in the temporary file. Extracting the code from the examples is trivial, and once #23161 is merged we'll have a property in this class that returns the lines of code.

What you will have to do is before writing the code in the examples, write import numpy as np and same for pandas, so flake8 does not complain because of that.


name = self.name.split('.')[-1]
lines = self.clean_doc.split("\n")
indented_lines = [(' ' * 4) + line if line else ''
for line in lines[1:]]
doc = '\n'.join([lines[0], *indented_lines])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused with this. I guess I'm missing something. If in our docstring we have something like:

Some desc.

Examples
--------
>>> val = 1+2
>>> print(val)

I'd expect to have in the temporary file to validate:

val = 1+2
print(val)

As we won't validate the description, or anything else. I think numpydoc returns the lines with code, so writing to the file should be a single line of code. Is there any advantage generating a function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well flake8 supports doctests by adding --doctests as an argument. And the Examples follow the doctest standard.

Examples
An optional section for examples, using the doctest format.

Having to carve out the examples is not trivial and would require writing multiple tests. So creating a file with a dummy-function is a simple hack, getting ugly cause it has to restore the indentation of the docstring. One test of the BadExamples class ensures it is working.

with tempfile.NamedTemporaryFile(mode='w', suffix='.py') as file:
file.write(create_function.format(name=name, doc=doc))
file.flush()
yield file

@property
def correct_parameters(self):
return not bool(self.parameter_mismatches)
Expand Down Expand Up @@ -446,7 +486,7 @@ def validate_one(func_name):
if doc.summary != doc.summary.lstrip():
errs.append('Summary contains heading whitespaces.')
elif (doc.is_function_or_method
and doc.summary.split(' ')[0][-1] == 's'):
and doc.summary.split(' ')[0][-1] == 's'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be better to not do unrelated changes

errs.append('Summary must start with infinitive verb, '
'not third person (e.g. use "Generate" instead of '
'"Generates")')
Expand Down Expand Up @@ -490,6 +530,12 @@ def validate_one(func_name):
for param_err in param_errs:
errs.append('\t{}'.format(param_err))

pep8_errs = doc.pep8_violations
if pep8_errs:
errs.append('Errors in doctests')
for pep8_err in pep8_errs:
errs.append('\t{}'.format(pep8_err))

if doc.is_function_or_method:
if not doc.returns and "return" in doc.method_source:
errs.append('No Returns section found')
Expand Down