Skip to content

Linting and PEP8 compliance #11928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task
wesm opened this issue Dec 30, 2015 · 35 comments · Fixed by #12208
Closed
1 task

Linting and PEP8 compliance #11928

wesm opened this issue Dec 30, 2015 · 35 comments · Fixed by #12208
Labels
Code Style Code style, linting, code_checks
Milestone

Comments

@wesm
Copy link
Member

wesm commented Dec 30, 2015

  • update docs with tools to use & that Travis is checking

I have been away from the codebase for a while and ouch, my eyes. I have used pyflakes or flake8 assiduously for Python development for a long time now and it seems that conventional style isn't being followed in the codebase:

$ flake8 pandas/core | wc -l
2047

It's a big codebase so adding a pre-commit hook might be too onerous, but can we get flake8 back to zero (with some well-placed # noqa marks on unintentionally-unused-imports)?

Secondly, perhaps we should add a Readability or Style label for these kinds of issues?

@jorisvandenbossche
Copy link
Member

See also #8317 and #6248

So there has been some discussion before, and in principle we ask of PRs to be PEP8 compatible, only we do not really enforce it in practice.
To better enforce it on new code, something like checking for this in the test suite (eg matplotlib and scipy do this) is probably a better idea than manual reviewing.

@wesm
Copy link
Member Author

wesm commented Dec 30, 2015

Okay. IMHO the code smell has reached pretty intense levels. I suggest we engage in a cleanup to get flake8 passing cleanly. I see concerns in #6248 that it will break PRs and git blame, and this is true, but it's the price we pay for having not been more fastidious on an on going basis. Let's please stick to 80 characters per line.

@jreback jreback added Difficulty Novice Code Style Code style, linting, code_checks labels Dec 30, 2015
@jreback jreback added this to the 0.18.0 milestone Dec 30, 2015
@jreback
Copy link
Contributor

jreback commented Dec 30, 2015

So prob a lot my fault. I tend to use long lines. :)

[jreback-~/pandas] flake8 pandas/core/|wc
    1922   15211  134394
[jreback-~/pandas] flake8 pandas/core/ | grep 'line too long'|wc
     924    8316   66073
[jreback-~/pandas] flake8 pandas/core/ | grep 'missing whitespace' | wc
     351    2106   23052
[jreback-~/pandas] flake8 pandas/core/ | grep 'never used' | wc
      24     264    2135
[jreback-~/pandas] flake8 pandas/core/ | grep 'expected 2 ' | wc
     131    1048    8789

@max-sixty
Copy link
Contributor

How about hooking a check into GitHub? A service such as Landscape or Code Climate. Many of these have a broader scope but could be restricted to checking style only.

If that were enforced for new commits, the library would get there over time...

@wesm
Copy link
Member Author

wesm commented Jan 2, 2016

I would be strongly +1 on adding a flake8 check to our Travis CI builds to make linting errors be failed builds. Any thoughts on how to proceed with the cleanup?

@shoyer
Copy link
Member

shoyer commented Jan 2, 2016

I have not been impressed by the reliability of Landscape as a CI system.

That said, I'm +1 for giving this a try and taking a more stringent approach going forward. This should give me a little more leverage when reviewing @jreback's long lines with missing whitespace :).

@jreback
Copy link
Contributor

jreback commented Jan 2, 2016

hah, this turned out to be easy to fix in for example this PR: #11892

@wesm
Copy link
Member Author

wesm commented Jan 2, 2016

Perhaps to get our training wheels let's fix the style in pandas/core and add a flake8 pandas/core to Travis and see how it goes?

@jreback
Copy link
Contributor

jreback commented Jan 2, 2016

okie dokie (always wanted to write that)!

@wesm
Copy link
Member Author

wesm commented Jan 2, 2016

I'm not sure what's the best practice for using flake8 in CI; I've used prospector but it combines pylint with pep8 which is a little too draconian (pylint gets grumpy about a host of things).

@wesm
Copy link
Member Author

wesm commented Jan 2, 2016

prospector supports pyflakes and pep8, here's an example config that runs pep8 only, probably not hard to add pyflakes to it

https://github.com/cloudera/ibis/blob/master/.landscape.yaml

@scari
Copy link
Contributor

scari commented Jan 2, 2016

I would strongly support the way you're about to go and will do follow the rule for my further contribution. but I have a tiny concern on it. Fixing the style in pandas/core would generates huge diffs in one commit and make it difficult to take a look into changes in commit history. I just want to know that how you guys thinking about this?

@jreback
Copy link
Contributor

jreback commented Jan 2, 2016

#11941 will now report the linting as a separate section on the travis builds, won't fail them yet. But as soon as we are done conversions (and after pandas/core, can do other sections), can simply flip the switch.

I am now going to be doing a check on PR's and promoting: git diff master | flake --diff.

@jorisvandenbossche
Copy link
Member

It was one thing to have code that follows PEP8 (and will pass the flake8 CI test), but this does not always say something about the code style itself.

Eg the discussion about using parentheses instead of \ (although this is actually quite explicit in PEP8), see #11945 (comment) and #11951 (comment). Or other possible discussion about how to do hanging idents, ..

What are the opinions about using some tool for that as well? Eg https://github.com/google/yapf

@wesm
Copy link
Member Author

wesm commented Jan 4, 2016

Might be worth exploring a tool if it will do what we need. I know I have some style preferences that are inconsistent with PEP8.

@jorisvandenbossche
Copy link
Member

The motivation from https://github.com/google/yapf:

The idea is also similar to the 'gofmt' tool for the Go programming language: end all holy wars about formatting - if the whole code base of a project is simply piped through YAPF whenever modifications are made, the style remains consistent throughout the project and there's no point arguing about style in every code review.

And there is some configuration you can adapt.

@jorisvandenbossche
Copy link
Member

As an example of the output of YAPF, I quickly ran it on the same files as @rockg adapted manually or with autopep8 in #11951 => #11956

@wesm
Copy link
Member Author

wesm commented Jan 6, 2016

How are we doing here? Can I help?

@rockg
Copy link
Contributor

rockg commented Jan 6, 2016

I'm personally waiting for some consensus. I vote for using YAPF with some manual modifications. In particular the error strings @wesm mentioned (was the example in the PR pandas preferred style?) along with some weird parentheses splitting that I find weird. If this is agreeable, then I can do the other files this way. I did not like the output of autopep8 versus what @jorisvandenbossche was able to produce via YAPF.

@jreback
Copy link
Contributor

jreback commented Jan 6, 2016

@rockg ok, why don't you reformat the PR #11951 with YAPF / manual changes as need, and we'll close #11956 and can proceed?

@wesm
Copy link
Member Author

wesm commented Jan 6, 2016

+1

@wesm
Copy link
Member Author

wesm commented Jan 16, 2016

Let me know when I can proceed with more code reorg and test refactoring. It would be nice if we could clean up pandas/tests since I fixed all the style issues manually (with a touch of autopep8 for the each stuff) in the test_frame break up.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2016

I think ok to proceed as you see fit. other than collisions with @rockg general cleanup & a couple of larger PR's (the resample one, but that only touches a couple of files), and @shoyer IntervalIndex. its not a big deal to rebase.

@wesm
Copy link
Member Author

wesm commented Jan 16, 2016

OK, I will start working on pandas/tests and pandas/tseries/tests

@wesm
Copy link
Member Author

wesm commented Jan 18, 2016

I'm going to take a crack at pandas/tseries. It would be really great if we can get flake8 warning == build failure by end of the month.

@wesm
Copy link
Member Author

wesm commented Jan 18, 2016

The ugliest subpackage left is pandas/io. Takers? I can take pandas/tools

@rockg
Copy link
Contributor

rockg commented Jan 18, 2016

One thing that we need to think about is how to prevent yaps from removing certain style choices. For example, some formatting choices yapf makes are not very nice looking and I have manually fixed them. However, if yapf is subsequently run on these files, that bad formatting will be returned. There is a way to disable yapf from formatting sections of code, but I'm not too happy about having to go back and add this. Perhaps this can be considered a one-time process and understood flake8 will catch subsequent issues? In order to make yapf work the way we want, changes will need to be made to it.

@wesm
Copy link
Member Author

wesm commented Jan 18, 2016

I have to say: I really don't like yapf from my trying to use it and have ended up doing most of the work manually in Emacs. I don't think we should use it after we are done here on this first round.

@rockg
Copy link
Contributor

rockg commented Jan 18, 2016

I usually yapf it and then go through the file manually and then do a final check with a diff. I agree with your suggestion of not using it for subsequent style cleanups.

@wesm
Copy link
Member Author

wesm commented Jan 21, 2016

we're at the home stretch

08:34 ~/code/pandas  (master)$ flake8 pandas | wc -l
986
08:37 ~/code/pandas  (master)$ flake8 pandas/stats/ | wc -l
170
08:38 ~/code/pandas  (master)$ flake8 pandas/computation/ | wc -l
82
08:39 ~/code/pandas  (master)$ flake8 pandas/core/ | wc -l
196
08:39 ~/code/pandas  (master)$ flake8 pandas/sparse/ | wc -l
191

@rockg
Copy link
Contributor

rockg commented Jan 22, 2016

I'll do pandas/sparse.

@wesm
Copy link
Member Author

wesm commented Jan 24, 2016

OK, we are down to 285 warnings overall. 196 of these are pandas/core/groupby.py. As soon as #11841 is merged let's get to zero and then make flake8 warnings fail the build?

@jreback
Copy link
Contributor

jreback commented Jan 24, 2016

sounds good

@jreback
Copy link
Contributor

jreback commented Feb 3, 2016

ok, linter is now enabled on Travis-CI!

it will run tests, then check for any failures, which will fail the build.

@wesm
Copy link
Member Author

wesm commented Feb 3, 2016

Woo, thanks @jreback @rockg and all who worked on this for the huge effort. Clean code makes productive teams! 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants