Skip to content

TYP: handle mypy ignore in SelectionMixin #38239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from
Closed
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 23 additions & 50 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
Dict,
FrozenSet,
Optional,
Set,
TypeVar,
Union,
cast,
Expand All @@ -19,7 +20,7 @@
import numpy as np

import pandas._libs.lib as lib
from pandas._typing import DtypeObj, IndexLabel
from pandas._typing import DtypeObj, IndexLabel, Label
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
Expand Down Expand Up @@ -148,10 +149,18 @@ class SpecificationError(Exception):

class SelectionMixin:
"""
mixin implementing the selection & aggregation interface on a group-like
Mixin implementing the selection & aggregation interface on a group-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when making changes to docstrings, if possible can you make the first line a one-liner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made as a one-liner.

object sub-classes need to define: obj, exclusions

obj : FrameOrSeries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tend not to use the typing aliases in docstrings

Target object for the selection and aggregation.
exclusions : set, optional
Columns to exclude.
"""

obj: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj here is Any and FrameOrSeries in docstring.

Copy link
Member Author

@ivanovmg ivanovmg Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks odd.
Initially I typed obj as FrameOrSeries, but then mypy would raise errors of this kind.

pandas\core\window\rolling.py:228: error: FrameOrSeries? has no attribute "ndim"  [attr-defined]
pandas\core\window\rolling.py:229: error: FrameOrSeries? has no attribute "__iter__" (not iterable)  [attr-defined]
pandas\core\window\rolling.py:236: error: FrameOrSeries? has no attribute "__iter__" (not iterable)  [attr-defined]
pandas\core\window\rolling.py:244: error: FrameOrSeries? has no attribute "_dir_additions"  [attr-defined]
pandas\core\window\rolling.py:324: error: FrameOrSeries? has no attribute "index"  [attr-defined]

The reason is that DataFrame and Series interface is similar, but different.
Some attributes are defined in one, but not defined in another, or the methods have different signatures.

So, I had to fallback with Any.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FrameOrSeries is a typevar, so to use it on a class/instance variable requires to make the SelectionMixin class generic. The mypy error suggests this.

pandas/core/base.py:161: error: Type variable "pandas._typing.FrameOrSeries" is unbound  [valid-type]
pandas/core/base.py:161: note: (Hint: Use "Generic[FrameOrSeries]" or "Protocol[FrameOrSeries]" base class to bind "FrameOrSeries" inside a class)
pandas/core/base.py:161: note: (Hint: Use "FrameOrSeries" in function signature to bind "FrameOrSeries" inside a function)

if having self.obj not maintain it's type for the duration of the instance life is ok, then use FrameOrSeriesUnion instead.

mypy errors such as pandas/core/window/rolling.py:108: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] if doing this are where FrameOrSeries aliases are used elsewhere that also would need to change to FrameOrSeriesUnion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 34 mypy errors of this kind (see details below) when setting to FrameOrSeriesUnion.
Would it be a good idea for the follow up PR to fix them?

pandas\core\window\rolling.py:108: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] pandas\core\groupby\groupby.py:571: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] pandas\core\groupby\groupby.py:872: error: Argument 1 to "get_iterator" of "BaseGrouper" has incompatible type "Union[DataFrame, Series]"; expected "FrameOrSeries" [arg-type] pandas\core\groupby\generic.py:233: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:358: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:440: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:467: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:501: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:567: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:567: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:668: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:669: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:777: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:809: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:828: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:834: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:947: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1171: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1171: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1278: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1289: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1325: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1353: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1398: error: Item "type" of "Union[Type[DataFrame], Type[Series]]" has no attribute "_from_arrays" [union-attr] pandas\core\groupby\generic.py:1462: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1462: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1597: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1601: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1601: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1648: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1656: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1686: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1694: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1704: error: Incompatible return value type (got "Union[DataFrame, Any, Series]", expected "DataFrame") [return-value]

I think I can do them here, but first need to figure out how to deal with "CI Web and docs", since it complains about the missing documentation

2021-01-10T11:33:04.9021813Z /home/runner/work/pandas/pandas/doc/source/reference/api/pandas.DataFrame.exclusions.rst: WARNING: document isn't included in any toctree
2021-01-10T11:33:04.9024422Z /home/runner/work/pandas/pandas/doc/source/reference/api/pandas.Series.exclusions.rst: WARNING: document isn't included in any toctree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea for the follow up PR to fix them?

put a big TODO by the variable declaration otherwise this could get missed. Any should only be used where we want to allow dynamic typing ( and lose the benefits of static type checking)

exclusions: Set[Label] = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the docbuild failure is bc this is public but undocumented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states exclusions : set, optional. Is the docstring referring to required class attributes? The docstring states sub-classes need to define: obj, exclusions. why is exclusions initialised here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exclusions is initialized here to deal with the following mypy errors.

pandas\core\base.py:230: error: "SelectionMixin" has no attribute "exclusions"  [attr-defined]
pandas\core\base.py:231: error: "SelectionMixin" has no attribute "exclusions"  [attr-defined]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to initialize if sub-classes need to define as stated in the docstring, declaration will silence those errors.

Suggested change
exclusions: Set[Label] = set()
exclusions: Set[Label]


_selection: Optional[IndexLabel] = None
_internal_names = ["_cache", "__setstate__"]
_internal_names_set = set(_internal_names)
Expand Down Expand Up @@ -206,77 +215,41 @@ def _selection_list(self):

@cache_readonly
def _selected_obj(self):
# pandas\core\base.py:195: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
if self._selection is None or isinstance(
self.obj, ABCSeries # type: ignore[attr-defined]
):
# pandas\core\base.py:194: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
return self.obj # type: ignore[attr-defined]
else:
# pandas\core\base.py:204: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
return self.obj[self._selection] # type: ignore[attr-defined]
if self._selection is None or isinstance(self.obj, ABCSeries):
return self.obj
return self.obj[self._selection]

@cache_readonly
def ndim(self) -> int:
return self._selected_obj.ndim

@cache_readonly
def _obj_with_exclusions(self):
# pandas\core\base.py:209: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
if self._selection is not None and isinstance(
self.obj, ABCDataFrame # type: ignore[attr-defined]
):
# pandas\core\base.py:217: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
return self.obj.reindex( # type: ignore[attr-defined]
columns=self._selection_list
)
if self._selection is not None and isinstance(self.obj, ABCDataFrame):
return self.obj.reindex(columns=self._selection_list)

# pandas\core\base.py:207: error: "SelectionMixin" has no attribute
# "exclusions" [attr-defined]
if len(self.exclusions) > 0: # type: ignore[attr-defined]
# pandas\core\base.py:208: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]

# pandas\core\base.py:208: error: "SelectionMixin" has no attribute
# "exclusions" [attr-defined]
return self.obj.drop(self.exclusions, axis=1) # type: ignore[attr-defined]
if len(self.exclusions) > 0:
return self.obj.drop(self.exclusions, axis=1)
else:
# pandas\core\base.py:210: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
return self.obj # type: ignore[attr-defined]
return self.obj

def __getitem__(self, key):
if self._selection is not None:
raise IndexError(f"Column(s) {self._selection} already selected")

if isinstance(key, (list, tuple, ABCSeries, ABCIndex, np.ndarray)):
# pandas\core\base.py:217: error: "SelectionMixin" has no attribute
# "obj" [attr-defined]
if len(
self.obj.columns.intersection(key) # type: ignore[attr-defined]
) != len(key):
# pandas\core\base.py:218: error: "SelectionMixin" has no
# attribute "obj" [attr-defined]
bad_keys = list(
set(key).difference(self.obj.columns) # type: ignore[attr-defined]
)
if len(self.obj.columns.intersection(key)) != len(key):
bad_keys = list(set(key).difference(self.obj.columns))
raise KeyError(f"Columns not found: {str(bad_keys)[1:-1]}")
return self._gotitem(list(key), ndim=2)

elif not getattr(self, "as_index", False):
# error: "SelectionMixin" has no attribute "obj" [attr-defined]
if key not in self.obj.columns: # type: ignore[attr-defined]
if key not in self.obj.columns:
raise KeyError(f"Column not found: {key}")
return self._gotitem(key, ndim=2)

else:
# error: "SelectionMixin" has no attribute "obj" [attr-defined]
if key not in self.obj: # type: ignore[attr-defined]
if key not in self.obj:
raise KeyError(f"Column not found: {key}")
return self._gotitem(key, ndim=1)

Expand Down
2 changes: 0 additions & 2 deletions pandas/core/window/rolling.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@
Dict,
List,
Optional,
Set,
Tuple,
Type,
Union,
Expand Down Expand Up @@ -85,7 +84,6 @@ class BaseWindow(ShallowMixin, SelectionMixin):
"on",
"closed",
]
exclusions: Set[str] = set()

def __init__(
self,
Expand Down