Skip to content

WIP for MyPy CI Integration #25622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ dist
.coverage
coverage.xml
coverage_html_report
*.mypy_cache
*.pytest_cache
# hypothesis test database
.hypothesis/
Expand Down Expand Up @@ -111,4 +112,4 @@ doc/build/html/index.html
# Windows specific leftover:
doc/tmp.sv
env/
doc/source/savefig/
doc/source/savefig/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need a newline here

8 changes: 8 additions & 0 deletions mypy.ini
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
[mypy]
follow_imports=silent
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is less than ideal but without this the errors are out of control. My motivation for setting this comes from the Mypy documentation:

https://mypy.readthedocs.io/en/latest/running_mypy.html#following-imports

To wit:

If you are planning on adding type hints to a large, existing code base, we recommend you start by trying to make your entire codebase (including files that do not use type hints) pass under --follow-imports=normal. This is usually not too difficult to do: mypy is designed to report as few error messages as possible when it is looking at unannotated code.

If doing this is intractable, we recommend passing mypy just the files you want to type check and use --follow-imports=silent. Even if mypy is unable to perfectly type check a file, it can still glean some useful information by parsing it (for example, understanding what methods a given object has). See Using mypy with an existing codebase for more recommendations.


[mypy-numpy.*]
ignore_missing_imports=True

[mypy-pandas._libs.*]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally doesn't look like Mypy likes Cython imports. Have this in the config but also noted specifically on imports within the code.

There's probably a better approach to this just haven't found yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the PEP, you should be able to use .pyi/stub files to annotate compiled extensions. Not ideal though since they'd have to stay in-sync with the extension

ignore_missing_imports=True
17 changes: 17 additions & 0 deletions mypy_whitelist.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
pandas/core/dtypes/base.py
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all of the files that currently have some sort of hints in them. The motivation for this comes form this part in the documentation:

https://mypy.readthedocs.io/en/latest/running_mypy.html#reading-a-list-of-files-from-a-file

So thinking for initial CI runs we can whitelist the modules we want to run against, though this is ideally just a temporary file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear when running this from the project root you would do mypy @mypy_whitelist.txt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Naddiseo any chance you have experience with this? Reading from docs I think suggested approach will be to whitelist particular modules at the outset and slowly open up as more are added.

I'd like to avoid having two files to control configuration here but I don't see an easy way in the .ini file to control which modules actually get checked

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd, not with the whitelist approach. I went with the blacklist approach instead, and had a bunch of modules and packages with ignore_errors set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. Was that done per package / module in the config file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I'm just ignoring packages. But, I think at one point I was doing both, and I seem to remember doing a mixture where I'd ignore everything in a package except a specific file.
I think it looked like:

[mypy-package.subpackage]
ignore_errors=True
[mypy-package.subpackage.module]
ignore_errors=False

However, I don't remember if it worked or not.

pandas/core/groupby/groupby.py
pandas/core/internals/blocks.py
pandas/core/internals/managers.py
pandas/core/common.py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not possible in the setup file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I was hoping that Mypy_path in the ini file would work but didn't have any luck. I didn't also see a very good way to make this readable therein since the ini would expect this to be a comma separated list.

With that said this may just be temporary anyway - would ideally like to run on the entire code. Modules without existing types would be ignored by default anyway

pandas/core/arrays/datetimes.py
pandas/core/arrays/integer.py
pandas/core/arrays/period.py
pandas/core/arrays/sparse.py
pandas/core/arrays/datetimelike.py
pandas/core/arrays/array_.py
pandas/core/arrays/base.py
pandas/core/frame.py
pandas/core/base.py
pandas/core/indexes/period.py
pandas/core/indexes/datetimelike.py
pandas/core/indexes/base.py
1 change: 1 addition & 0 deletions pandas/core/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
from datetime import datetime, timedelta
from functools import partial
import inspect
from typing import Any

import numpy as np

Expand Down
4 changes: 3 additions & 1 deletion pandas/core/dtypes/base.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
"""Extend pandas with custom array types"""
from typing import List, Optional, Type

import numpy as np

from pandas.errors import AbstractMethodError
Expand Down Expand Up @@ -211,7 +213,7 @@ def __str__(self):

@property
def type(self):
# type: () -> type
# type: () -> Type
"""
The scalar type for the array, e.g. ``int``

Expand Down
15 changes: 8 additions & 7 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,10 +13,11 @@ class providing the base-class of operations.
from functools import partial, wraps
import types
import warnings
from typing import FrozenSet, Optional, Type

import numpy as np

from pandas._libs import Timestamp, groupby as libgroupby
from pandas._libs import Timestamp, groupby as libgroupby # type: ignore
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See note on Cython imports

import pandas.compat as compat
from pandas.compat import range, set_function_name, zip
from pandas.compat.numpy import function as nv
Expand Down Expand Up @@ -325,7 +326,7 @@ def _group_selection_context(groupby):

class _GroupBy(PandasObject, SelectionMixin):
_group_selection = None
_apply_whitelist = frozenset()
_apply_whitelist = frozenset() # type: FrozenSet[str]

def __init__(self, obj, keys=None, axis=0, level=None,
grouper=None, exclusions=None, selection=None, as_index=True,
Expand Down Expand Up @@ -1041,7 +1042,7 @@ def _bool_agg(self, val_test, skipna):
"""

def objs_to_bool(vals):
# type: (np.ndarray) -> (np.ndarray, typing.Type)
# type: (np.ndarray) -> (np.ndarray, Type)
if is_object_dtype(vals):
vals = np.array([bool(x) for x in vals])
else:
Expand All @@ -1050,7 +1051,7 @@ def objs_to_bool(vals):
return vals.view(np.uint8), np.bool

def result_to_bool(result, inference):
# type: (np.ndarray, typing.Type) -> np.ndarray
# type: (np.ndarray, Type) -> np.ndarray
return result.astype(inference, copy=False)

return self._get_cythonized_result('group_any_all', self.grouper,
Expand Down Expand Up @@ -1743,7 +1744,7 @@ def quantile(self, q=0.5, interpolation='linear'):
"""

def pre_processor(vals):
# type: (np.ndarray) -> (np.ndarray, Optional[typing.Type])
# type: (np.ndarray) -> (np.ndarray, Optional[Type])
if is_object_dtype(vals):
raise TypeError("'quantile' cannot be performed against "
"'object' dtypes!")
Expand All @@ -1758,7 +1759,7 @@ def pre_processor(vals):
return vals, inference

def post_processor(vals, inference):
# type: (np.ndarray, Optional[typing.Type]) -> np.ndarray
# type: (np.ndarray, Optional[Type]) -> np.ndarray
if inference:
# Check for edge case
if not (is_integer_dtype(inference) and
Expand Down Expand Up @@ -2021,7 +2022,7 @@ def _get_cythonized_result(self, how, grouper, aggregate=False,
Function to be applied to result of Cython function. Should accept
an array of values as the first argument and type inferences as its
second argument, i.e. the signature should be
(ndarray, typing.Type).
(ndarray, Type).
**kwargs : dict
Extra arguments to be passed back to Cython funcs

Expand Down
12 changes: 9 additions & 3 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,12 @@
import inspect
import re
import warnings
from typing import Any, List, Optional

import numpy as np

from pandas._libs import internals as libinternals, lib, tslib, tslibs
from pandas._libs import (internals as libinternals, # type: ignore
lib, tslib, tslibs)
from pandas._libs.tslibs import Timedelta, conversion, is_null_datetimelike
import pandas.compat as compat
from pandas.compat import range, zip
Expand Down Expand Up @@ -1826,8 +1828,12 @@ def interpolate(self, method='pad', axis=0, inplace=False, limit=None,
limit=limit),
placement=self.mgr_locs)

def shift(self, periods, axis=0, fill_value=None):
# type: (int, Optional[BlockPlacement], Any) -> List[ExtensionBlock]
def shift(self,
periods, # type: int
axis=0, # type: Optional[libinternals.BlockPlacement]
fill_value=None # type: Any
):
# type: (...) -> List[ExtensionBlock]
"""
Shift the block by `periods`.

Expand Down
4 changes: 3 additions & 1 deletion pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,12 @@
import itertools
import operator
import re
from typing import List, Optional, Union

import numpy as np

from pandas._libs import internals as libinternals, lib
from pandas._libs import internals as libinternals, lib # type: ignore
from pandas.api.extensions import ExtensionDtype
from pandas.compat import map, range, zip
from pandas.util._validators import validate_bool_kwarg

Expand Down