Skip to content

STYLE: Fixing and refactoring linting #22863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Oct 9, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
1811954
Refactoring lint.sh, wrong path pandas/src corrected, and made more c…
datapythonista Sep 27, 2018
6ace7ef
fixing linting errors in pxi.in files (linting was broken)
datapythonista Sep 28, 2018
a28a612
unifying the flake8 error codes that are being ignored
datapythonista Sep 28, 2018
af76f11
Restoring flake errors C406, C408 and C409
datapythonista Sep 28, 2018
07ebcd0
Unpinning flake8, to see if linting works with the latest version
datapythonista Sep 28, 2018
34e85f0
Synchronizing flake8 config in pep8speaks
datapythonista Sep 29, 2018
adcda47
Unifying all flake8 of .py files in one command, and moving excludes …
datapythonista Sep 29, 2018
c129197
Printing flake8 version, and moving doctests and check_import to lint.sh
datapythonista Sep 29, 2018
803b217
Moving linting from py2.7 to py3.6
datapythonista Sep 29, 2018
62a8813
Moving flake8/cpplint requirements to 3.6, as this is now where the l…
datapythonista Sep 29, 2018
720f18c
Merge remote-tracking branch 'upstream/master' into lint_refactor
datapythonista Sep 29, 2018
cabe353
lint.sh renamed to code_checks.sh, and parameters now supported to ca…
datapythonista Oct 1, 2018
9a76f58
Fixing linting script name in travis
datapythonista Oct 1, 2018
f4e17ad
Merging from master
datapythonista Oct 4, 2018
56eca8f
merging from master
datapythonista Oct 9, 2018
879157f
Merge remote-tracking branch 'upstream/master' into lint_refactor
datapythonista Oct 9, 2018
0ec16fe
Adding documentation to the new linting script
datapythonista Oct 9, 2018
cd38de3
Fix linting error in the code, and added more information for when LI…
datapythonista Oct 9, 2018
8738e80
Fixing formatting in the documentation
datapythonista Oct 9, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 11 additions & 2 deletions .pep8speaks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,18 @@
scanner:
diff_only: True # If True, errors caused by only the patch are shown

# Opened issue in pep8speaks, so we can directly use the config in setup.cfg
# (and avoid having to duplicate it here):
# https://github.com/OrkoHunter/pep8speaks/issues/95

pycodestyle:
max-line-length: 79
ignore: # Errors and warnings to ignore
ignore:
- W503, # line break before binary operator
- E402, # module level import not at top of file
- E722, # do not use bare except
- E731, # do not assign a lambda expression, use a def
- W503 # line break before binary operator
- E741, # ambiguous variable name 'l'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, this seems like a good change to me! Quick question I recently actually enforced E741 in pep8speaks config, #22795. Hence this error will be flagged on any new PRs which is the behaviour we want? I was going to follow up remove any final instances of this in codebase. See #22122

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now is important to have consistency between the CI and pep8speaks, and among all the different files. I don't have a strong opinion about E741, or the errors discussed in #22122, in general I think the less exceptions we have, the better.

- C406, # Unnecessary list literal - rewrite as a dict literal.
- C408, # Unnecessary dict call - rewrite as a literal.
- C409 # Unnecessary list passed to tuple() - rewrite as a tuple literal.
11 changes: 3 additions & 8 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,15 +45,14 @@ matrix:
- language-pack-zh-hans
- dist: trusty
env:
- JOB="2.7, lint" ENV_FILE="ci/travis-27.yaml" TEST_ARGS="--skip-slow" LINT=true
- JOB="2.7" ENV_FILE="ci/travis-27.yaml" TEST_ARGS="--skip-slow"
addons:
apt:
packages:
- python-gtk2
- dist: trusty
env:
- JOB="3.6, coverage" ENV_FILE="ci/travis-36.yaml" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" COVERAGE=true DOCTEST=true

- JOB="3.6, lint, coverage" ENV_FILE="ci/travis-36.yaml" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate" COVERAGE=true LINT=true
- dist: trusty
env:
- JOB="3.7, NumPy dev" ENV_FILE="ci/travis-37-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network -W error" PANDAS_TESTING_MODE="deprecate"
Expand Down Expand Up @@ -109,11 +108,7 @@ script:
- ci/run_build_docs.sh
- ci/script_single.sh
- ci/script_multi.sh
- ci/lint.sh
- ci/doctests.sh
- echo "checking imports"
- source activate pandas && python ci/check_imports.py
- echo "script done"
- ci/code_checks.sh

after_success:
- ci/upload_coverage.sh
Expand Down
37 changes: 0 additions & 37 deletions ci/check_imports.py

This file was deleted.

145 changes: 145 additions & 0 deletions ci/code_checks.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
#!/bin/bash
#
# Run checks related to code quality.
#
# This script is intended for both the CI and to check locally that code standards are
# respected. We are currently linting (PEP-8 and similar), looking for patterns of
# common mistakes (sphinx directives with missing blank lines, old style classes,
# unwanted imports...), and we also run doctests here (currently some files only).
# In the future we may want to add the validation of docstrings and other checks here.
#
# Usage:
# $ ./ci/code_checks.sh # run all checks
# $ ./ci/code_checks.sh lint # run linting only
# $ ./ci/code_checks.sh patterns # check for patterns that should not exist
# $ ./ci/code_checks.sh doctests # run doctests

echo "inside $0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some docs here on how to run things, maybe also in the contributing docs themselves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that makes a lot of sense, thank, didn't think about that before. Updated the PR

[[ $LINT ]] || { echo "NOT Linting. To lint use: LINT=true $0 $1"; exit 0; }
[[ -z "$1" || "$1" == "lint" || "$1" == "patterns" || "$1" == "doctests" ]] || { echo "Unkown command $1. Usage: $0 [lint|patterns|doctests]"; exit 9999; }

source activate pandas
RET=0
CHECK=$1


### LINTING ###
if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then

# `setup.cfg` contains the list of error codes that are being ignored in flake8

echo "flake8 --version"
flake8 --version

# pandas/_libs/src is C code, so no need to search there.
MSG='Linting .py code' ; echo $MSG
flake8 .
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Linting .pyx code' ; echo $MSG
flake8 pandas --filename=*.pyx --select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403,C400,C401,C402,C403,C404,C405,C406,C407,C408,C409,C410,C411
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Linting .pxd and .pxi.in' ; echo $MSG
flake8 pandas/_libs --filename=*.pxi.in,*.pxd --select=E501,E302,E203,E111,E114,E221,E303,E231,E126,F403
RET=$(($RET + $?)) ; echo $MSG "DONE"

# readability/casting: Warnings about C casting instead of C++ casting
# runtime/int: Warnings about using C number types instead of C++ ones
# build/include_subdir: Warnings about prefacing included header files with directory

# We don't lint all C files because we don't want to lint any that are built
# from Cython files nor do we want to lint C files that we didn't modify for
# this particular codebase (e.g. src/headers, src/klib, src/msgpack). However,
# we can lint all header files since they aren't "generated" like C files are.
MSG='Linting .c and .h' ; echo $MSG
cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

### PATTERNS ###
if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then

# Check for imports from pandas.core.common instead of `import pandas.core.common as com`
MSG='Check for non-standard imports' ; echo $MSG
! grep -R --include="*.py*" -E "from pandas.core.common import " pandas
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for pytest warns' ; echo $MSG
! grep -r -E --include '*.py' 'pytest\.warns' pandas/tests/
RET=$(($RET + $?)) ; echo $MSG "DONE"

# Check for the following code in testing: `np.testing` and `np.array_equal`
MSG='Check for invalid testing' ; echo $MSG
! grep -r -E --include '*.py' --exclude testing.py '(numpy|np)(\.testing|\.array_equal)' pandas/tests/
RET=$(($RET + $?)) ; echo $MSG "DONE"

# Check for the following code in the extension array base tests: `tm.assert_frame_equal` and `tm.assert_series_equal`
MSG='Check for invalid EA testing' ; echo $MSG
! grep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for deprecated messages without sphinx directive' ; echo $MSG
! grep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for old-style classes' ; echo $MSG
! grep -R --include="*.py" -E "class\s\S*[^)]:" pandas scripts
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for backticks incorrectly rendering because of missing spaces' ; echo $MSG
! grep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for incorrect sphinx directives' ; echo $MSG
! grep -R --include="*.py" --include="*.pyx" --include="*.rst" -E "\.\. (autosummary|contents|currentmodule|deprecated|function|image|important|include|ipython|literalinclude|math|module|note|raw|seealso|toctree|versionadded|versionchanged|warning):[^:]" ./pandas ./doc/source
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Check for modules that pandas should not import' ; echo $MSG
python -c "
import sys
import pandas

blacklist = {'bs4', 'gcsfs', 'html5lib', 'ipython', 'jinja2' 'hypothesis',
'lxml', 'numexpr', 'openpyxl', 'py', 'pytest', 's3fs', 'scipy',
'tables', 'xlrd', 'xlsxwriter', 'xlwt'}
mods = blacklist & set(m.split('.')[0] for m in sys.modules)
if mods:
sys.stderr.write('pandas should not import: {}\n'.format(', '.join(mods)))
sys.exit(len(mods))
"
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

### DOCTESTS ###
if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

MSG='Doctests frame.py' ; echo $MSG
pytest --doctest-modules -v pandas/core/frame.py \
-k"-axes -combine -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"
RET=$(($RET + $?)) ; echo $MSG "DONE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a response on splitting this file up into 3 separate files, you can make a sub-dir fi that is helpful and have a top-level script to call them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned above. What I did is make the script able to run for a specific section. That keeps the advantage of simplicity, and addresses your concerns of being able to run just part of it in Makefile or manually. These are the calls:

  • ./ci/code_checks.sh lint <- flake8 and cpplint
  • ./ci/code_checks.sh patterns <- greps
  • ./ci/code_checks.sh doctests <- doctests


MSG='Doctests series.py' ; echo $MSG
pytest --doctest-modules -v pandas/core/series.py \
-k"-nonzero -reindex -searchsorted -to_dict"
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests generic.py' ; echo $MSG
pytest --doctest-modules -v pandas/core/generic.py \
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -to_json -transpose -values -xs"
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests top-level reshaping functions' ; echo $MSG
pytest --doctest-modules -v \
pandas/core/reshape/concat.py \
pandas/core/reshape/pivot.py \
pandas/core/reshape/reshape.py \
pandas/core/reshape/tile.py \
-k"-crosstab -pivot_table -cut"
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

exit $RET
60 changes: 0 additions & 60 deletions ci/doctests.sh

This file was deleted.

Loading