BLD: Use Azure pipelines for linting #22844


Closed
datapythonista opened this issue Sep 26, 2018 · 8 comments · Fixed by #22854
Labels
Build (Library building on various platforms) · CI (Continuous Integration) · good first issue · Needs Discussion (Requires discussion from core team before further action)
Milestone

Comments

@datapythonista
Member

At the moment we have the script ci/lint.sh for validating PEP-8 and other linting. The script has several validations in the form of:

    echo "Linting *.py"
    flake8 pandas --filename=*.py --exclude pandas/_libs/src --ignore=C406,C408,C409,E402,E731,E741,W503
    if [ $? -ne "0" ]; then
        RET=1
    fi

If I'm not wrong, moving all those to the azure-pipelines configuration would have two advantages.

First, the code would be simpler, as all the bash plumbing would be managed automatically, and we'd simply have:

    - script: flake8 pandas --filename=*.py --exclude pandas/_libs/src --ignore=C406,C408,C409,E402,E731,E741,W503
      displayName: 'Linting *.py'

Second, when one of those fails, it will be faster to see the reason for the failure in the Azure Pipelines dashboard, instead of having to dig through the details of the Travis log.

Does this make sense? @jreback @TomAugspurger @chrisrpatterson @jeremyepling

@datapythonista datapythonista added the Build (Library building on various platforms), CI (Continuous Integration), Effort Low, Needs Discussion (Requires discussion from core team before further action), and good first issue labels on Sep 26, 2018
@TomAugspurger
Contributor

First, the code would be simpler, as all the bash stuff would be automatically managed

I think we'll still need something similar to what's currently in lint.sh, since we have custom checks that aren't part of flake8.

I agree we should do something to make seeing lint failures easier for users. I'm not sure whether this needs to be done on azure or not. I haven't played with more complex pipelines or reporting, beyond the basic "here's a file of junit XML results".

@datapythonista
Member Author

I was having a look, and I think the ones that are a single command (flake8, cpplint, grep...) can be written just like the example in the description.

If I'm not wrong, the only other case we've got is loops. I guess it's possible to have the whole loop as the script, but that wouldn't show directly which file failed (it'd probably still be an improvement, even if we need to check the log to find the exact file). It'd be even better if Azure provided a way to create a separate item for each value of the loop, though I'm not sure whether that's doable or how complex the configuration would be. This is a case:

    echo "Linting *.c and *.h"
    for path in '*.h' 'parser' 'ujson'
    do
        echo "linting -> pandas/_libs/src/$path"
        cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/$path
        if [ $? -ne "0" ]; then
            RET=1
        fi
    done
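If Azure doesn't offer a way to expand the loop into separate items, one low-tech option would be to unroll it into one step per path, so each path shows up on its own in the dashboard. A sketch with the same flags (untested, and repetitive by design):

```yaml
- script: cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/*.h
  displayName: 'Linting pandas/_libs/src/*.h'
- script: cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/parser
  displayName: 'Linting pandas/_libs/src/parser'
- script: cpplint --quiet --extensions=c,h --headers=h --filter=-readability/casting,-runtime/int,-build/include_subdir --recursive pandas/_libs/src/ujson
  displayName: 'Linting pandas/_libs/src/ujson'
```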

@chrispat

@datapythonista we have a model for highlighting issues in the log and in the summary view that also flows back to checks.

You can echo out
##vso[task.logissue type=error;sourcepath=consoleapp/main.cs;linenumber=1;columnnumber=1;code=100;]this is an error

And that will get a treatment on the summary view that will include a link to the sourcefile and line number where the issues are found. For more details see https://github.com/Microsoft/vsts-tasks/blob/master/docs/authoring/commands.md.

@datapythonista
Member Author

datapythonista commented Sep 29, 2018

Thanks for the info @chrisrpatterson. That would be really nice to have. But I don't see how I can get this with flake8, which is what generates the messages. The output we get from the command looks like this:

stat_ops.py:110:5: F811 redefinition of unused 'setup' from line 4
strings.py:95:9: E731 do not assign a lambda expression, use a def
timeseries.py:1:1: F401 'warnings' imported but unused

Is there a "standard" way in Azure to change the format? We could write a script that converts from one format to the other and call flake8 . | azure_log_flake8_formatter, but it seems suboptimal that every project has to implement its own azure_log_flake8_formatter.
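For illustration, such a formatter is only a few lines; this is a hypothetical sketch (the script name azure_log_flake8_formatter.py and the helper to_vso are mine, not an existing tool), assuming flake8's default "path:row:col: CODE message" output:

```python
import re
import sys

# Hypothetical helper: rewrite flake8's default "path:row:col: CODE message"
# lines into Azure Pipelines "##vso[task.logissue ...]" logging commands.
LINE_RE = re.compile(
    r"^(?P<path>[^:]+):(?P<row>\d+):(?P<col>\d+): (?P<code>\S+) (?P<text>.*)$"
)

VSO_TEMPLATE = (
    "##vso[task.logissue type=error;sourcepath={path};"
    "linenumber={row};columnnumber={col};code={code};]{text}"
)

def to_vso(line):
    """Convert one flake8 line, passing through anything that doesn't match."""
    match = LINE_RE.match(line)
    if match is None:
        return line
    return VSO_TEMPLATE.format(**match.groupdict())

if __name__ == "__main__":
    # Usage: flake8 . | python azure_log_flake8_formatter.py
    for raw in sys.stdin:
        print(to_vso(raw.rstrip("\n")))
```

Note that piping through a formatter like this still discards flake8's exit status, which is the other half of the problem.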

@zooba suggested starting the lines with ##[error], which has the same problem (except that it's easier to implement with awk: flake8 . | awk '{print "##[error] " $0}', but that loses the exit status and would make the build succeed), so I'm not sure if I'm missing something here.
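For the exit-status half of the problem, bash's pipefail option would let the pipeline keep the linter's failure code even though awk exits 0. A sketch, assuming the step runs under bash, with false standing in for a failing linter:

```shell
#!/usr/bin/env bash
# Without pipefail, the pipeline's status is awk's (0), hiding the failure;
# with pipefail, it is the rightmost non-zero status in the pipeline.
set -o pipefail
false | awk '{print "##[error] " $0}'
echo "pipeline exit status: $?"
```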

@datapythonista
Member Author

datapythonista commented Sep 30, 2018

OK, in a first check of the flake8 parameters I didn't see that the output format could be specified, but I double-checked, and everything makes sense now: flake8 --format="##vso[task.logissue type=error;sourcepath=%(path)s;linenumber=%(row)s;columnnumber=%(col)s;code=%(code)s;]%(text)s"

Will give it a try.
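As a YAML sketch (step name assumed), that flag could slot straight into azure-pipelines.yml:

```yaml
- script: >
    flake8 pandas
    --format="##vso[task.logissue type=error;sourcepath=%(path)s;linenumber=%(row)s;columnnumber=%(col)s;code=%(code)s;]%(text)s"
  displayName: 'Linting *.py'
```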

@datapythonista
Member Author

Made some progress on this, and managed to make flake8 generate the correct output:

https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=437&view=logs

It really improves the clarity of the errors (and even more so compared to our current solution in the Travis log). But there are still some issues:

  • cpplint doesn't seem to allow custom output (it has a --output parameter that can be emacs, vs7, eclipse... but not a custom one)
  • The number of errors reported by Azure is wrong. We've got around 50 errors in the first failing step if we open the log, but the preview shows only 11, and it says we've got just 11. Weird
  • The link is not created. I see there is a ##vso[task.setendpoint]value option in https://github.com/Microsoft/vsts-tasks/blob/master/docs/authoring/commands.md, but it's unclear how this needs to be used.

We probably don't want to change the flake8 format in setup.cfg (that would change it when flake8 is run locally too). But it doesn't seem to work anyway, as the %(path)s placeholders are interpolated by ConfigParser, and flake8 then fails to install in Azure.
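For what it's worth, ConfigParser interpolation can be escaped by doubling the percent signs, so if the format ever did need to live in setup.cfg it would presumably look like this (a sketch only; it assumes flake8's config loader turns %% back into a literal %, which I haven't verified):

```ini
[flake8]
; Sketch only: %% escapes ConfigParser interpolation, yielding a literal %.
format = ##vso[task.logissue type=error;sourcepath=%%(path)s;linenumber=%%(row)s;columnnumber=%%(col)s;code=%%(code)s;]%%(text)s
```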

@chrispat

chrispat commented Oct 1, 2018

We have been having some discussions about other ways to make this sort of parsing a bit more generic but nothing has come to us yet. However, similar to test results there are probably a relatively small number of regex based formats we could build in support for recognizing. The other option might be to have a generic regex you could specify as part of the job or even the step that would be used for parsing out issues.

@gfyoung
Member

gfyoung commented Dec 3, 2018

it will be faster to see the reason for the failure in the Azure Pipelines dashboard, instead of having to dig through the details of the Travis log.

FYI @pandas-dev/pandas-core : the reason we can't view the pipeline dashboard for Travis is that we don't have the Travis GitHub app enabled on this repository.

Unfortunately, there is no way to migrate from travis-ci.org to travis-ci.com at the moment, which is a prerequisite for using the GitHub app.

5 participants