-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Extend check namespace usage #38347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend check namespace usage #38347
Conversation
I hadn't thought of that...not entirely sure what the best way to go about this is but I'll get back to you on this |
Hey @kevinetienne Sorry this taken me a bit to get back to you on. I've looked into it further and a few suggestions:
I think that applying the following diff should get us a good way there: diff --git a/scripts/check_for_inconsistent_pandas_namespace.py b/scripts/check_for_inconsistent_pandas_namespace.py
index b213d931e7..908a056fdc 100644
--- a/scripts/check_for_inconsistent_pandas_namespace.py
+++ b/scripts/check_for_inconsistent_pandas_namespace.py
@@ -16,20 +16,20 @@ from typing import Optional, Sequence
PATTERN = r"""
(
- (?<!pd\.)(?<!\w) # check class_name doesn't start with pd. or character
- ([A-Z]\w+)\( # match DataFrame but not pd.DataFrame or tm.makeDataFrame
- .* # match anything
- pd\.\2\( # only match e.g. pd.DataFrame
+ (?<![\.\w]) # check class_name doesn't start with . or character
+ (\w+)\( # match DataFrame but not pd.DataFrame or tm.makeDataFrame
+ .* # match anything
+ pd\.\2\( # only match e.g. pd.DataFrame
)|
(
- pd\.([A-Z]\w+)\( # only match e.g. pd.DataFrame
- .* # match anything
- (?<!pd\.)(?<!\w) # check class_name doesn't start with pd. or character
- \4\( # match DataFrame but not pd.DataFrame or tm.makeDataFrame
+ pd\.(\w+)\( # only match e.g. pd.DataFrame
+ .* # match anything
+ (?<![\.\w]) # check class_name doesn't start with . or character
+ \4\( # match DataFrame but not pd.DataFrame or tm.makeDataFrame
)
"""
ERROR_MESSAGE = "Found both `pd.{class_name}` and `{class_name}` in {path}"
-
+BLOCK_LIST = {'eval'}
def main(argv: Optional[Sequence[str]] = None) -> None:
parser = argparse.ArgumentParser()
@@ -45,11 +45,11 @@ def main(argv: Optional[Sequence[str]] = None) -> None:
match = pattern.search(contents)
if match is None:
continue
- if match.group(2) is not None:
+ if match.group(2) is not None and match.group(2).decode() not in BLOCK_LIST:
raise AssertionError(
ERROR_MESSAGE.format(class_name=match.group(2).decode(), path=str(path))
)
- if match.group(4) is not None:
+ if match.group(4) is not None and match.group(4).decode() not in BLOCK_LIST:
raise AssertionError(
ERROR_MESSAGE.format(class_name=match.group(4).decode(), path=str(path))
) Also, there's a few conflicts, could you please fetch and merge upstream/master? |
@kevinetienne pls merge master. this will need to be merged soon after otherwise likely will have further conflicts. ping on green. |
@kevinetienne would prefer that you limit scope to this to a single type of change (and instead use multiple PRs) as large PRs are very likley to take a long time to review (unless they are trivial changes but that is not the case here) |
I'm closing this PR and I'll update the main ticket later. I'll break this PR in several parts. It would be easier for me to deliver too. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
TLDR: This code changes might be not in a mergeable state as false positives remain when updating the pattern.
The change is a bit bigger to what is described in #38093 as more files needed to be fixed, which is fine I think.
The stack displaying the files to fix might be limited to a few items (9 items in my case) when running the command
pre-commit run inconsistent-namespace-usage --all-files
. Every time a set of files was fixed a few others appeared which is why I ended fixing up more files and why my list of false positive is not exhaustive.I have committed the change to the PATTERN in a separate commit as we have false positives. Such as the following files which contain:
pandas/pandas/tests/test_strings.py
Line 188 in 862cd05
pandas/pandas/tests/io/test_parquet.py
Line 839 in 862cd05
df.isna
andaggr.isna
seepandas/pandas/tests/groupby/test_categorical.py
Line 528 in 862cd05
pandas/pandas/tests/groupby/test_categorical.py
Line 1570 in 862cd05
factorize
inpandas/pandas/tests/test_algos.py
Line 52 in 862cd05
eval
vspd.eval
(see https://github.com/pandas-dev/pandas/blob/862cd05df4452592a99dd1a4fa10ce8cfb3766f7/pandas/tests/computation/test_eval.py)I'm probably missing a few other cases but 1) 2) and 3) can be added in the negative lookahead (ignore function/methods which start with np/pyarrow/df). However the regex might grow bigger and bigger. 5) can be ignored entirely either within the regex or with an exclusion list.
Let me know you're thinking as I see three solutions: