Skip to content

CLN: Condense PR style checklist into one script #30773

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gfyoung
Copy link
Member

@gfyoung gfyoung commented Jan 7, 2020

A script is easier to execute and manage as we manage our style-checking tools.

Started with flake8, black, and isort, as those are the main ones for Python-related changes (for reference, we didn't even have isort in the checklist beforehand).

If we're happy with the structure, we can always add more OR modify checks if we want going forward while keeping it easy for folks to check their PR's.

@gfyoung gfyoung added the Clean label Jan 7, 2020
@gfyoung gfyoung added this to the 1.0 milestone Jan 7, 2020
@gfyoung gfyoung force-pushed the easy-pr-check-script branch from 5b88c4e to d3efe65 Compare January 7, 2020 07:23
@datapythonista
Copy link
Member

I like the idea, but shouldn't we reuse ci/code_checks.sh? The new script feels like a subset of it. If we don't want all the checks there, we can probably create a new category to run only what we want (but I think running all the checks is probably ok).

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

Not quite a subset. As the echo messages indicate, the way we're running them in the script allows the users to fix mistakes simultaneously checks for black and isort.

code_checks.sh, however, is purely for correctness (i.e. you wouldn't get the auto-correct) and contains a ton more checks (e.g. doctests, cpplint) that IMO are probably not necessary on average.

I'm not necessarily against combining in the future, but given the differences in what this script does vs. code_checks, I think that's a little more acrobatics than I would want to do for this PR.

@datapythonista
Copy link
Member

I see, I misunderstood that. Makes sense then. But may be the name *_checks is a bit misleading if the script is actually performing a the changes to the code. Do you think it makes sense to rename it to something like fix_code_style or something similar?

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

Do you think it makes sense to rename it to something like fix_code_style or something similar?

I get your point. It's a hybrid though, as flake8 does not actually fix your errors, only lists them out.

@datapythonista
Copy link
Member

Not a big deal, but I think it's better to expect a script to change code, and that it possibly doesn't (even black and isort won't in many cases). Than have a script to check your code, that changes it unexpectedly. I don't even like that we need to use black and isort to change automatically our code. Having a script that does that in a somehow (unintentionally) sneaky way would be really annoying to me.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

Not a big deal, but I think it's better to expect a script to change code, and that it possibly doesn't (even black and isort won't in many cases). Than have a script to check your code, that changes it unexpectedly.

True (though the echo messages mitigate the surprise I might argue).

How about quick_style_check_and_fix.sh then (to reflect that we're doing both)?

@datapythonista
Copy link
Member

though the echo messages mitigate the surprise

Imagine a case where you didn't commit the changes, you run the checks to see if there is anything you missed, and then you realize that the checks made black actually destroyed all your styles. The surprise would still be there. ;)

Anyway, quick_style_check_and_fix.sh sounds good, and without the quick too to make it a bit shorter.

In a follow up I may check if it's easy to display different messages depending on whether changes need to be committed or not, so users of the script don't need to read and think too much.

@TomAugspurger
Copy link
Contributor

@gfyoung are you using pre-commit? This script duplicates what it does.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

@TomAugspurger : I don't actually use it because the mypy check takes some time to run on my machine. You do raise a good point that there is overlap due to the other checks.

Unless we decide to make it a required thing on the PR checklist to use pre-commit, I think the script can allow us to be more comprehensive with the checklist without having to enumerate all of the different checks we run (even pre-commit is missing at least one: cpplint)

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 7, 2020 via email

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

cc @pandas-dev/pandas-core : Thoughts?

The question is whether we want people to go in on using pre-commit and thereby remove the redundancy we have with the PR checklist altogether. My only concern with pre-commit (as is) is that it can slow down development (see #30773 (comment)), especially if the changes are small.

Alternatively, we condense the lint checkboxes into a single script as I have done here. The downside is that we still have some redundancy in maintaining linting checks.

@TomAugspurger
Copy link
Contributor

My only concern with pre-commit (as is) is that it can slow down development (see #30773 (comment)), especially if the changes are small.

If we're concerned about the speed of the pre-commit hooks, we can discuss which items we want included by default. I don't have a strong opinion. Perhaps @xhochy can share his thoughts on he thought including mypy was a good idea.

@xhochy
Copy link
Contributor

xhochy commented Jan 7, 2020

I would also strongly argue that people should just pre-commit instead of an additional script as it takes care of a lot of pitfalls one can run into. Normally I have all static checks in pre-commit add its an easy-to-setup way to ensure that people don't push code that will then fail on public CI but still consume resources there.

I first also thought that mypy may be too slow but with a pre-filled cache, it is running really fast for me. Could be that it is only fast due my hardware and the cache is not fast enough for others, then it is probably better to remove mypy again until we get a better incremental mode for to work. People not using pre-commit because of this would be even worse.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

Could be that it is only fast due my hardware and the cache is not fast enough for others, then it is probably better to remove mypy again until we get a better incremental mode for to work.

That would be a reasonable compromise.

@TomAugspurger
Copy link
Contributor

Anecdotally, I'm getting used to mypy running after a day or two, but it slightly annoyed me at first 😄. The initial check after changing branches can feel slow, but subsequent ones are much faster. So I'm happy either way.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 7, 2020

So I'm happy either way.

Sounds good. I'll open a new PR to move in this new direction then.

@gfyoung gfyoung force-pushed the easy-pr-check-script branch from d3efe65 to a483da4 Compare January 8, 2020 06:21
@xhochy
Copy link
Contributor

xhochy commented Jan 8, 2020

@gfyoung @TomAugspurger Can you give me a rough number how much time the mypy check with a pre-filled cache took for you?

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 8, 2020

I have to say I have been quite annoyed the last days with the slowness of the commit hook now mypy was added. Is the caching normally working out of the box, or do I need to do something for that? But I have been switching a lot between branches, so that might a reason to be more annoyed.

The mypy step in the pre-commit hook on a fresh branch takes around 20s for me.

@jorisvandenbossche
Copy link
Member

Also a second time on the same branch, it still takes a long time (around 20s), so even with caching, it is still annoyingly slow for committing IMO

@xhochy
Copy link
Contributor

xhochy commented Jan 8, 2020

Removed it over at #30811

For me the timings are 18s without and 0.6s with a cache, so it seemed appropriate to me. I wasn't aware that the impact could be this big on different setups.

@jorisvandenbossche
Copy link
Member

Or my cache is not working properly for some reason? Although I see a .mypy_cache directory in the pandas repo, with files changed at the moment I did the pre-commit tests I mentioned above.

@xhochy
Copy link
Contributor

xhochy commented Jan 8, 2020

Or my cache is not working properly for some reason? Although I see a .mypy_cache directory in the pandas repo, with files changed at the moment I did the pre-commit tests I mentioned above.

In my case only @plugins_snapshot.json gets changed, all other files stay the same.

@gfyoung
Copy link
Member Author

gfyoung commented Jan 8, 2020

Can you give me a rough number how much time the mypy check with a pre-filled cache took for you?

Similar to what @jorisvandenbossche gave.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jan 8, 2020 via email

@jorisvandenbossche jorisvandenbossche removed this from the 1.0 milestone Jan 8, 2020
@jbrockmendel
Copy link
Member

my mypy runtimes are closer to Joris's

@gfyoung gfyoung added this to the No action milestone Jan 8, 2020
@gfyoung
Copy link
Member Author

gfyoung commented Jan 8, 2020

Closing this so we can focus on #30814

@gfyoung gfyoung closed this Jan 8, 2020
@gfyoung gfyoung deleted the easy-pr-check-script branch January 8, 2020 22:35
gfyoung added a commit that referenced this pull request Jan 10, 2020
Previously, we stated it as merely optional

xref:

#30773
#30814
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants