Skip to content

TYP: handle mypy ignore in SelectionMixin #38239

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

ivanovmg
Copy link
Member

@ivanovmg ivanovmg commented Dec 2, 2020

Handle mypy ignore comments in pandas/core/base.py, class SelectionMixin.
There were issues with attributes obj and exclusions not defined in the mixin, but defined in the child classes.

I wish I made obj of type FrameOrSeries rather than Any.
This is the type used in the child classes, but this results in multiple mypy errors of this kind.

pandas\core\window\rolling.py:197: error: FrameOrSeries? has no attribute "ndim"  [attr-defined]

If I go with FrameOrSeriesUnion, then it is necessary to go and do casting where required to DataFrame or Series.
This is necessary since, for example, some kwargs for either of the types are irrelevant.

@@ -152,6 +154,9 @@ class SelectionMixin:
object sub-classes need to define: obj, exclusions
"""

obj: Any
exclusions: Set[Optional[Hashable]] = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Label?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! Thank you.

@ivanovmg
Copy link
Member Author

ivanovmg commented Dec 3, 2020

I am not sure about the problem, but docs build finished with two warnings, causing CI to fail.

build finished with problems, 2 warnings.
Error: Process completed with exit code 1.

@@ -152,6 +153,9 @@ class SelectionMixin:
object sub-classes need to define: obj, exclusions
"""

obj: Any
exclusions: Set[Label] = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the docbuild failure is bc this is public but undocumented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states exclusions : set, optional. Is the docstring referring to required class attributes? The docstring states sub-classes need to define: obj, exclusions. why is exclusions initialised here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exclusions is initialized here to deal with the following mypy errors.

pandas\core\base.py:230: error: "SelectionMixin" has no attribute "exclusions"  [attr-defined]
pandas\core\base.py:231: error: "SelectionMixin" has no attribute "exclusions"  [attr-defined]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to initialize if sub-classes need to define as stated in the docstring, declaration will silence those errors.

Suggested change
exclusions: Set[Label] = set()
exclusions: Set[Label]

@ivanovmg
Copy link
Member Author

I documented obj and exclusions in SelectionMixin docstring.
But there are still complains in build docs.

/home/runner/work/pandas/pandas/doc/source/reference/api/pandas.DataFrame.exclusions.rst: WARNING: document isn't included in any toctree
/home/runner/work/pandas/pandas/doc/source/reference/api/pandas.Series.exclusions.rst: WARNING: document isn't included in any toctree

Any idea how to fix it?

@jbrockmendel
Copy link
Member

i was thinking just make them private

@ivanovmg
Copy link
Member Author

i was thinking just make them private

I tried, but the thing is that they are used heavily in the classes (at least obj), which subclass from SelectionMixin.
I guess, than avoiding mypy errors by making an actively used attributes private and thus changing quite the code base, is too costly.

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Dec 22, 2020
@jreback jreback added this to the 1.3 milestone Dec 22, 2020
@jreback
Copy link
Contributor

jreback commented Dec 22, 2020

can you merge master

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg for the PR

I wish I made obj of type FrameOrSeries rather than Any.

If the type of self.obj does not change during it's lifetime, then this is the correct annotation

If I go with FrameOrSeriesUnion

This would be correct if the type of self.obj is allowed to change.

object sub-classes need to define: obj, exclusions

obj : FrameOrSeries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we tend not to use the typing aliases in docstrings

@@ -148,10 +149,18 @@ class SpecificationError(Exception):

class SelectionMixin:
"""
mixin implementing the selection & aggregation interface on a group-like
Mixin implementing the selection & aggregation interface on a group-like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when making changes to docstrings, if possible can you make the first line a one-liner.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made as a one-liner.

"""

obj: Any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obj here is Any and FrameOrSeries in docstring.

Copy link
Member Author

@ivanovmg ivanovmg Jan 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks odd.
Initially I typed obj as FrameOrSeries, but then mypy would raise errors of this kind.

pandas\core\window\rolling.py:228: error: FrameOrSeries? has no attribute "ndim"  [attr-defined]
pandas\core\window\rolling.py:229: error: FrameOrSeries? has no attribute "__iter__" (not iterable)  [attr-defined]
pandas\core\window\rolling.py:236: error: FrameOrSeries? has no attribute "__iter__" (not iterable)  [attr-defined]
pandas\core\window\rolling.py:244: error: FrameOrSeries? has no attribute "_dir_additions"  [attr-defined]
pandas\core\window\rolling.py:324: error: FrameOrSeries? has no attribute "index"  [attr-defined]

The reason is that DataFrame and Series interface is similar, but different.
Some attributes are defined in one, but not defined in another, or the methods have different signatures.

So, I had to fallback with Any.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FrameOrSeries is a typevar, so to use it on a class/instance variable requires to make the SelectionMixin class generic. The mypy error suggests this.

pandas/core/base.py:161: error: Type variable "pandas._typing.FrameOrSeries" is unbound  [valid-type]
pandas/core/base.py:161: note: (Hint: Use "Generic[FrameOrSeries]" or "Protocol[FrameOrSeries]" base class to bind "FrameOrSeries" inside a class)
pandas/core/base.py:161: note: (Hint: Use "FrameOrSeries" in function signature to bind "FrameOrSeries" inside a function)

if having self.obj not maintain it's type for the duration of the instance life is ok, then use FrameOrSeriesUnion instead.

mypy errors such as pandas/core/window/rolling.py:108: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] if doing this are where FrameOrSeries aliases are used elsewhere that also would need to change to FrameOrSeriesUnion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 34 mypy errors of this kind (see details below) when setting to FrameOrSeriesUnion.
Would it be a good idea for the follow up PR to fix them?

pandas\core\window\rolling.py:108: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] pandas\core\groupby\groupby.py:571: error: Incompatible types in assignment (expression has type "FrameOrSeries", variable has type "Union[DataFrame, Series]") [assignment] pandas\core\groupby\groupby.py:872: error: Argument 1 to "get_iterator" of "BaseGrouper" has incompatible type "Union[DataFrame, Series]"; expected "FrameOrSeries" [arg-type] pandas\core\groupby\generic.py:233: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:358: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:440: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:467: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:501: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:567: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:567: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:668: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:669: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:777: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:809: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:828: error: Unexpected keyword argument "name" for "DataFrame" [call-arg] pandas\core\frame.py:511: note: "DataFrame" defined here pandas\core\groupby\generic.py:834: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "Series") [return-value] pandas\core\groupby\generic.py:947: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1171: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1171: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1278: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1289: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1325: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1353: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1398: error: Item "type" of "Union[Type[DataFrame], Type[Series]]" has no attribute "_from_arrays" [union-attr] pandas\core\groupby\generic.py:1462: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1462: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1597: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1601: error: Unexpected keyword argument "columns" for "Series" [call-arg] pandas\core\series.py:218: note: "Series" defined here pandas\core\groupby\generic.py:1601: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1648: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1656: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1686: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "DataFrame") [return-value] pandas\core\groupby\generic.py:1694: error: Argument 1 to "_insert_inaxis_grouper_inplace" of "DataFrameGroupBy" has incompatible type "Union[DataFrame, Series]"; expected "DataFrame" [arg-type] pandas\core\groupby\generic.py:1704: error: Incompatible return value type (got "Union[DataFrame, Any, Series]", expected "DataFrame") [return-value]

I think I can do them here, but first need to figure out how to deal with "CI Web and docs", since it complains about the missing documentation

2021-01-10T11:33:04.9021813Z /home/runner/work/pandas/pandas/doc/source/reference/api/pandas.DataFrame.exclusions.rst: WARNING: document isn't included in any toctree
2021-01-10T11:33:04.9024422Z /home/runner/work/pandas/pandas/doc/source/reference/api/pandas.Series.exclusions.rst: WARNING: document isn't included in any toctree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea for the follow up PR to fix them?

put a big TODO by the variable declaration otherwise this could get missed. Any should only be used where we want to allow dynamic typing ( and lose the benefits of static type checking)

@@ -152,6 +153,9 @@ class SelectionMixin:
object sub-classes need to define: obj, exclusions
"""

obj: Any
exclusions: Set[Label] = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states exclusions : set, optional. Is the docstring referring to required class attributes? The docstring states sub-classes need to define: obj, exclusions. why is exclusions initialised here?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 11, 2021
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg for working on this.

There are many type: ignores that are being removed here, which is good. But I do have reservations about silencing the errors using Any. This is not really addressing the problem, just making it less visible.

Could you look into actually fixing this in this PR. We could then decide to split into separate PRs if necessary.

"""

obj: Any
"""Target object for the selection and aggregation."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove or change to comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

# however this creates multiple mypy errors elsewhere.
# Those have to be addressed in a separate PR.
exclusions: Set[Hashable]
"""Columns to exclude."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

obj: Any
"""Target object for the selection and aggregation."""
# GH 38239
# TODO obj here must be typed as FrameOrSeriesUnion,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO obj here must be typed as FrameOrSeriesUnion,
# TODO: obj here must be typed as FrameOrSeriesUnion,

Comment on lines 162 to 163
# however this creates multiple mypy errors elsewhere.
# Those have to be addressed in a separate PR.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this adds value. can remove

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Co-authored-by: Simon Hawkins <[email protected]>
@jreback
Copy link
Contributor

jreback commented Feb 15, 2021

can you merge master

@pep8speaks
Copy link

pep8speaks commented Feb 17, 2021

Hello @ivanovmg! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-03-16 06:40:20 UTC

@jreback
Copy link
Contributor

jreback commented Feb 17, 2021

@ivanovmg well my browser change added some whitespace :-< if you can fix up and push when you can.

@simonjayhawkins
Copy link
Member

@ivanovmg can you merge master and fixup mypy failures.

@ivanovmg
Copy link
Member Author

@simonjayhawkins I merged master.
It turns out that the changes introduced here regarding typing induce the following issues:

pandas/core/groupby/generic.py:1557: error: Unexpected keyword argument "columns" for "Series"  [call-arg]
pandas/core/groupby/generic.py:852: error: Unexpected keyword argument "name" for "DataFrame"  [call-arg]

In my opinion, there is a problem with the interchangeability of Series and DataFrame.
Fixing these problems is going to be extremely intrusive.
Considering the trade offs, IMHO, it is better to close this PR.

@jbrockmendel
Copy link
Member

Considering the trade offs, IMHO, it is better to close this PR.

That's your call.

@ivanovmg
Copy link
Member Author

Closing.

@ivanovmg ivanovmg closed this Mar 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants