Skip to content

DOC: Validate in docstrings that numpy and pandas are not imported #23161

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 4, 2018

Conversation

thoo
Copy link
Contributor

@thoo thoo commented Oct 15, 2018

Fix #23134

@pep8speaks
Copy link

Hello @thoo! Thanks for submitting the PR.

@thoo thoo closed this Oct 15, 2018
@thoo thoo reopened this Oct 15, 2018
@codecov
Copy link

codecov bot commented Oct 15, 2018

Codecov Report

Merging #23161 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #23161   +/-   ##
=======================================
  Coverage   92.23%   92.23%           
=======================================
  Files         161      161           
  Lines       51197    51197           
=======================================
  Hits        47220    47220           
  Misses       3977     3977
Flag Coverage Δ
#multiple 90.61% <ø> (ø) ⬆️
#single 42.27% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d78bd7a...ce37c8a. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work, added some comments.

@@ -502,6 +502,20 @@ def no_punctuation(self):
return "Hello world!"


class BadExamples(object):

def npPd_import(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this name is a bit cryptic, can you use something more clear.


def npPd_import(self):
"""
Provide example with numpy and pandas import
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you finish this with a period, so it doesn't generate an error for it.

Examples
--------
import numpy as np
import pandas as pd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in examples start by >>>

It'd be good to have tests for import numpy and import pandas (without the aliases), and have each of them separate. And also a case when something else is imported (e.g. import datetime)

Not sure if in the good docstrings in the tests we have examples somewhere. We should, to make sure the error is not reported when import numpy... is not present.

@@ -512,6 +512,12 @@ def validate_one(func_name):
examples_errs = doc.examples_errors
if examples_errs:
errs.append('Examples do not pass tests')
if 'import numpy as np' in ' '.join(doc.examples):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to use the module doctest to extract the content of the code in the examples. Meaning that something like this:

Examples
--------
This example does not import pandas as pd
>>> pd.Series(['pd is assumed to be imported as we want'])

should not generate an error.

And probably you want to include an example like this in the tests.

@thoo
Copy link
Contributor Author

thoo commented Oct 15, 2018

@datapythonista Thanks for the review. Really appreciate it. Will be back. :)

@datapythonista datapythonista added Docs CI Continuous Integration labels Oct 15, 2018
@datapythonista datapythonista changed the title Check numpy and pandas in Examples DOC: Validate in docstrings that numpy and pandas are not imported Oct 15, 2018
@thoo thoo force-pushed the check_np_pd_fromExamples branch from cf75147 to 5c071ec Compare October 16, 2018 07:40
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to have a space at the beginning of the error messages? I don't see it in the other messages, I'd keep consistent.

Also, the rstrip in the code seems unnecessary, and if we use that method for something else, removing the break lines can be a problem. If I'm not missing something, I'd keep the break lines.

Other than that, looks good.

@thoo
Copy link
Contributor Author

thoo commented Oct 16, 2018

@datapythonista Thanks for the help. It looks like one of Travis' tests failed last run. Is it possible conda update in Travis somehow breaks it? I am learning a lot. :):)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking from the phone, and is difficult to review in detail, but looks good.

Is the test with the good import being checked? If it is, lgtm.

@datapythonista
Copy link
Member

Can't check the travis logs now, but if ut's failing when installing anaconda the only thing we can do is run the build again. For it, you usually update your beanch from master and push it. So, you don't change anything, but as the PR is updated travis runs again. In this case as you did changes is not needed.

@TomAugspurger
Copy link
Contributor

The travis failure can be ignored.

I restarted the azure build. That failed with a pyarrow could not load DLL issue.

@thoo
Copy link
Contributor Author

thoo commented Oct 16, 2018

Thank you guys. Appreciate it!!!

@thoo thoo closed this Oct 19, 2018
@thoo thoo reopened this Oct 19, 2018
@datapythonista
Copy link
Member

@WillAyd can you take a look and merge if you're happy?

@@ -502,6 +514,33 @@ def no_punctuation(self):
return "Hello world!"


class BadExamples(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe there is already an example that imports both of these:

>>> import numpy as np

Any reason we don't leverage that instead of creating a new class?

thoo added 2 commits October 27, 2018 09:48
…xamples

* repo_org/master: (83 commits)
  DOC: Add docstring validations for "See Also" section (pandas-dev#23143)
  TST: Fix test assertion (pandas-dev#23357)
  BUG: Handle Period in combine (pandas-dev#23350)
  REF: SparseArray imports (pandas-dev#23329)
  CI: Migrate some CircleCI jobs to Azure (pandas-dev#22992)
  DOC: update the is_month_start/is_month_end docstring (pandas-dev#23051)
  Partialy fix issue pandas-dev#23334 - isort pandas/core/groupby directory (pandas-dev#23341)
  TST: Add base test for extensionarray setitem pandas-dev#23300 (pandas-dev#23304)
  API: Add sparse Acessor (pandas-dev#23183)
  PERF: speed up CategoricalIndex.get_loc (pandas-dev#23235)
  fix and test incorrect case in delta_to_nanoseconds (pandas-dev#23302)
  BUG: Handle Datetimelike data in DataFrame.combine (pandas-dev#23317)
  TST: re-enable gbq tests (pandas-dev#23303)
  Switched references of App veyor to azure pipelines in the contributing CI section (pandas-dev#23311)
  isort imports-io (pandas-dev#23332)
  DOC: Added a Multi Index example for the Series.sum method (pandas-dev#23279)
  REF: Make PeriodArray an ExtensionArray (pandas-dev#22862)
  DOC: Added Examples for Series max (pandas-dev#23298)
  API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option (pandas-dev#22644)
  BUG: Let MultiIndex.set_levels accept any iterable (pandas-dev#23273) (pandas-dev#23291)
  ...
@thoo
Copy link
Contributor Author

thoo commented Oct 27, 2018

@WillAyd Could you re-review this? I updated it with your suggestion.

…xamples

* repo_org/master:
  DOC: Validate that See Also section items do not contain the pandas. prefix (pandas-dev#23145)
  API/TST: make hasnans always return python booleans (pandas-dev#23349)
@thoo
Copy link
Contributor Author

thoo commented Oct 27, 2018

@WillAyd I just resolve the conflicts with the base branch and it should be ready for re-review. Thanks.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, added couple of last things, and I think we're ready to merge.

def examples_codes(self):
codes = doctest.DocTestParser().get_examples(self.raw_doc)
codes = [line.source for line in codes]
return ' '.join(codes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you return a list with each line instead? We'll reuse this property to validate pep8 too, and returning a single string with all the code joined is not useful in that case.

Also, at least to me examples_codes is somehow misleading. Can you use examples_source_code instead?

if 'import numpy' in examples_codes:
errs.append('Examples should not have `import numpy` ')
if 'import pandas' in examples_codes:
errs.append('Examples should not have `import pandas` ')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the error could be more descriptive. pandas does not need to be imported in the examples, as it's assumed to be already imported as pd. I think something like that provides more useful information to devs facing this error.

@@ -218,6 +218,18 @@ def mode(self, axis, numeric_only):
"""
pass

def good_import(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def good_import(self):
def good_imports(self):

thoo added 3 commits November 3, 2018 13:19
…xamples

* repo_org/master: (66 commits)
  CLN: doc string (pandas-dev#23469)
  DOC: Add cookbook entry for triangular correlation matrix (GH22840) (pandas-dev#23032)
  add number of Errors, Warnings to scripts/validate_docstrings.py (pandas-dev#23150)
  BUG: Allow freq conversion from dt64 to period (pandas-dev#23460)
  ENH: Add FrozenList.union and .difference (pandas-dev#23394)
  REF: cython cleanup, typing, optimizations (pandas-dev#23464)
  strictness and checks for Timedelta _simple_new (pandas-dev#23433)
  Fixing flake8 problems new to flake8 3.6.0 (pandas-dev#23472)
  DOC: Updating the docstring of Series.dot  (pandas-dev#22890)
  TST: Fixturize series/test_analytics.py (pandas-dev#22755)
  BUG/ENH: Handle NonexistentTimeError in date rounding (pandas-dev#23406)
  PERF: speed up concat on Series by making _get_axis_number() a classmethod (pandas-dev#23404)
  REF: Remove DatetimelikeArrayMixin._shallow_copy (pandas-dev#23430)
  REF: strictness/simplification in DatetimeArray/Index _simple_new (pandas-dev#23431)
  REF: cython cleanup, typing, optimizations (pandas-dev#23456)
  TST: tweak Hypothesis configuration and idioms (pandas-dev#23441)
  BUG: fix HDFStore.append with all empty strings error (GH12242) (pandas-dev#23435)
  TST: Skip 32bit failing IntervalTree tests (pandas-dev#23442)
  BUG: Deprecate nthreads argument (pandas-dev#23112)
  style: fix import format at pandas/core/reshape (pandas-dev#23387)
  ...
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for the changes.

Merging on green, but I added some final ideas in case you want to apply them, but merging anyway.

def examples_source_code(self):
codes = doctest.DocTestParser().get_examples(self.raw_doc)
codes = [line.source for line in codes]
return codes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also return in the previous line.

May be lines is a better name? codes sounds to me like identifiers

errs.append("Numpy does not need to be imported in the examples, "
"as it's assumed to be already imported as np")
if 'import pandas' in ' '.join(examples_source_code):
errs.append("Pandas does not need to be imported in the examples, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't usually use capitalized pandas, even at the beginning of sentence. And in this case, as it's more the name of the module than the project, I'd use lower case also for numpy.

@@ -531,6 +537,13 @@ def validate_one(func_name):
examples_errs = doc.examples_errors
if examples_errs:
errs.append('Examples do not pass tests')
examples_source_code = doc.examples_source_code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be you can do the join here, so doesn't need to be repeated? also, is the lines end with \n I'd join by '', otherwise by '\n'. Doesn't change the result, but conceptually the content of the variable is something that makes more sense.

@thoo
Copy link
Contributor Author

thoo commented Nov 3, 2018

@datapythonista Thanks for the review. I update the files.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I meant in the comment about naming lines and returning (your changes make sense, but just partially, based on my comment)

@@ -402,6 +402,12 @@ def examples_errors(self):
error_msgs += f.getvalue()
return error_msgs

@property
def examples_source_code(self):
codes = doctest.DocTestParser().get_examples(self.raw_doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
codes = doctest.DocTestParser().get_examples(self.raw_doc)
lines = doctest.DocTestParser().get_examples(self.raw_doc)

@property
def examples_source_code(self):
codes = doctest.DocTestParser().get_examples(self.raw_doc)
lines = [line.source for line in codes]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lines = [line.source for line in codes]
return [line.source for line in lines]

def examples_source_code(self):
codes = doctest.DocTestParser().get_examples(self.raw_doc)
lines = [line.source for line in codes]
return lines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return lines

@datapythonista
Copy link
Member

Thanks once more @thoo, will merge on green.

@TomAugspurger TomAugspurger merged commit 047242b into pandas-dev:master Nov 4, 2018
@TomAugspurger
Copy link
Contributor

Thanks!

@thoo thoo deleted the check_np_pd_fromExamples branch January 2, 2019 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants