-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Validate in docstrings that numpy and pandas are not imported #23161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Validate in docstrings that numpy and pandas are not imported #23161
Conversation
Hello @thoo! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #23161 +/- ##
=======================================
Coverage 92.23% 92.23%
=======================================
Files 161 161
Lines 51197 51197
=======================================
Hits 47220 47220
Misses 3977 3977
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work, added some comments.
@@ -502,6 +502,20 @@ def no_punctuation(self): | |||
return "Hello world!" | |||
|
|||
|
|||
class BadExamples(object): | |||
|
|||
def npPd_import(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this name is a bit cryptic, can you use something more clear.
|
||
def npPd_import(self): | ||
""" | ||
Provide example with numpy and pandas import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you finish this with a period, so it doesn't generate an error for it.
Examples | ||
-------- | ||
import numpy as np | ||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in examples start by >>>
It'd be good to have tests for import numpy
and import pandas
(without the aliases), and have each of them separate. And also a case when something else is imported (e.g. import datetime
)
Not sure if in the good docstrings in the tests we have examples somewhere. We should, to make sure the error is not reported when import numpy
... is not present.
scripts/validate_docstrings.py
Outdated
@@ -512,6 +512,12 @@ def validate_one(func_name): | |||
examples_errs = doc.examples_errors | |||
if examples_errs: | |||
errs.append('Examples do not pass tests') | |||
if 'import numpy as np' in ' '.join(doc.examples): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to use the module doctest
to extract the content of the code in the examples. Meaning that something like this:
Examples
--------
This example does not import pandas as pd
>>> pd.Series(['pd is assumed to be imported as we want'])
should not generate an error.
And probably you want to include an example like this in the tests.
@datapythonista Thanks for the review. Really appreciate it. Will be back. :) |
cf75147
to
5c071ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to have a space at the beginning of the error messages? I don't see it in the other messages, I'd keep consistent.
Also, the rstrip
in the code seems unnecessary, and if we use that method for something else, removing the break lines can be a problem. If I'm not missing something, I'd keep the break lines.
Other than that, looks good.
@datapythonista Thanks for the help. It looks like one of Travis' tests failed last run. Is it possible conda update in Travis somehow breaks it? I am learning a lot. :):) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm checking from the phone, and is difficult to review in detail, but looks good.
Is the test with the good import being checked? If it is, lgtm.
Can't check the travis logs now, but if ut's failing when installing anaconda the only thing we can do is run the build again. For it, you usually update your beanch from master and push it. So, you don't change anything, but as the PR is updated travis runs again. In this case as you did changes is not needed. |
The travis failure can be ignored. I restarted the azure build. That failed with a pyarrow could not load DLL issue. |
Thank you guys. Appreciate it!!! |
@WillAyd can you take a look and merge if you're happy? |
@@ -502,6 +514,33 @@ def no_punctuation(self): | |||
return "Hello world!" | |||
|
|||
|
|||
class BadExamples(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is already an example that imports both of these:
>>> import numpy as np |
Any reason we don't leverage that instead of creating a new class?
…xamples * repo_org/master: (83 commits) DOC: Add docstring validations for "See Also" section (pandas-dev#23143) TST: Fix test assertion (pandas-dev#23357) BUG: Handle Period in combine (pandas-dev#23350) REF: SparseArray imports (pandas-dev#23329) CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992) DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051) Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341) TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304) API: Add sparse Acessor (pandas-dev#23183) PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235) fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302) BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317) TST: re-enable gbq tests (pandas-dev#23303) Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311) isort imports-io (pandas-dev#23332) DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279) REF: Make PeriodArray an ExtensionArray (pandas-dev#22862) DOC: Added Examples for Series max (pandas-dev#23298) API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644) BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291) ...
@WillAyd Could you re-review this? I updated it with your suggestion. |
…xamples * repo_org/master: DOC: Validate that See Also section items do not contain the pandas. prefix (pandas-dev#23145) API/TST: make hasnans always return python booleans (pandas-dev#23349)
@WillAyd I just resolve the conflicts with the base branch and it should be ready for re-review. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, added couple of last things, and I think we're ready to merge.
scripts/validate_docstrings.py
Outdated
def examples_codes(self): | ||
codes = doctest.DocTestParser().get_examples(self.raw_doc) | ||
codes = [line.source for line in codes] | ||
return ' '.join(codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you return a list with each line instead? We'll reuse this property to validate pep8 too, and returning a single string with all the code joined is not useful in that case.
Also, at least to me examples_codes
is somehow misleading. Can you use examples_source_code
instead?
scripts/validate_docstrings.py
Outdated
if 'import numpy' in examples_codes: | ||
errs.append('Examples should not have `import numpy` ') | ||
if 'import pandas' in examples_codes: | ||
errs.append('Examples should not have `import pandas` ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the error could be more descriptive. pandas does not need to be imported in the examples, as it's assumed to be already imported as pd
. I think something like that provides more useful information to devs facing this error.
@@ -218,6 +218,18 @@ def mode(self, axis, numeric_only): | |||
""" | |||
pass | |||
|
|||
def good_import(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def good_import(self): | |
def good_imports(self): |
…xamples * repo_org/master: (66 commits) CLN: doc string (pandas-dev#23469) DOC: Add cookbook entry for triangular correlation matrix (GH22840) (pandas-dev#23032) add number of Errors, Warnings to scripts/validate_docstrings.py (pandas-dev#23150) BUG: Allow freq conversion from dt64 to period (pandas-dev#23460) ENH: Add FrozenList.union and .difference (pandas-dev#23394) REF: cython cleanup, typing, optimizations (pandas-dev#23464) strictness and checks for Timedelta _simple_new (pandas-dev#23433) Fixing flake8 problems new to flake8 3.6.0 (pandas-dev#23472) DOC: Updating the docstring of Series.dot (pandas-dev#22890) TST: Fixturize series/test_analytics.py (pandas-dev#22755) BUG/ENH: Handle NonexistentTimeError in date rounding (pandas-dev#23406) PERF: speed up concat on Series by making _get_axis_number() a classmethod (pandas-dev#23404) REF: Remove DatetimelikeArrayMixin._shallow_copy (pandas-dev#23430) REF: strictness/simplification in DatetimeArray/Index _simple_new (pandas-dev#23431) REF: cython cleanup, typing, optimizations (pandas-dev#23456) TST: tweak Hypothesis configuration and idioms (pandas-dev#23441) BUG: fix HDFStore.append with all empty strings error (GH12242) (pandas-dev#23435) TST: Skip 32bit failing IntervalTree tests (pandas-dev#23442) BUG: Deprecate nthreads argument (pandas-dev#23112) style: fix import format at pandas/core/reshape (pandas-dev#23387) ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks for the changes.
Merging on green, but I added some final ideas in case you want to apply them, but merging anyway.
scripts/validate_docstrings.py
Outdated
def examples_source_code(self): | ||
codes = doctest.DocTestParser().get_examples(self.raw_doc) | ||
codes = [line.source for line in codes] | ||
return codes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also return in the previous line.
May be lines
is a better name? codes sounds to me like identifiers
scripts/validate_docstrings.py
Outdated
errs.append("Numpy does not need to be imported in the examples, " | ||
"as it's assumed to be already imported as np") | ||
if 'import pandas' in ' '.join(examples_source_code): | ||
errs.append("Pandas does not need to be imported in the examples, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't usually use capitalized pandas, even at the beginning of sentence. And in this case, as it's more the name of the module than the project, I'd use lower case also for numpy.
scripts/validate_docstrings.py
Outdated
@@ -531,6 +537,13 @@ def validate_one(func_name): | |||
examples_errs = doc.examples_errors | |||
if examples_errs: | |||
errs.append('Examples do not pass tests') | |||
examples_source_code = doc.examples_source_code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be you can do the join here, so doesn't need to be repeated? also, is the lines end with \n
I'd join by ''
, otherwise by '\n'
. Doesn't change the result, but conceptually the content of the variable is something that makes more sense.
@datapythonista Thanks for the review. I update the files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I meant in the comment about naming lines and returning (your changes make sense, but just partially, based on my comment)
scripts/validate_docstrings.py
Outdated
@@ -402,6 +402,12 @@ def examples_errors(self): | |||
error_msgs += f.getvalue() | |||
return error_msgs | |||
|
|||
@property | |||
def examples_source_code(self): | |||
codes = doctest.DocTestParser().get_examples(self.raw_doc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codes = doctest.DocTestParser().get_examples(self.raw_doc) | |
lines = doctest.DocTestParser().get_examples(self.raw_doc) |
scripts/validate_docstrings.py
Outdated
@property | ||
def examples_source_code(self): | ||
codes = doctest.DocTestParser().get_examples(self.raw_doc) | ||
lines = [line.source for line in codes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines = [line.source for line in codes] | |
return [line.source for line in lines] |
scripts/validate_docstrings.py
Outdated
def examples_source_code(self): | ||
codes = doctest.DocTestParser().get_examples(self.raw_doc) | ||
lines = [line.source for line in codes] | ||
return lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return lines |
Thanks once more @thoo, will merge on green. |
Thanks! |
Fix #23134