Skip to content

TYP: fix reportInvalidTypeVarUse from pyright #42367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 26, 2021
Merged

TYP: fix reportInvalidTypeVarUse from pyright #42367

merged 6 commits into from
Sep 26, 2021

Conversation

twoertwein
Copy link
Member

@twoertwein twoertwein commented Jul 4, 2021

This fixes the following warning from pyright: TypeVar "..." appears only once in generic function signature (reportInvalidTypeVarUse).

It would be great to enable pyright on the CI. It takes only ~90sec to run pyright on the entire codebase with the following configuration in pyproject.toml (ignores most type errors for now):

[tool.pyright]
pythonVersion = "3.8"
include = ["pandas"]
ignore = ["pandas/tests"]
reportOptionalMemberAccess = false
reportOptionalOperand = false
reportGeneralTypeIssues = false
reportMissingImports = false
reportUnboundVariable = false
reportMissingModuleSource = false
reportOptionalSubscript = false
reportOptionalCall = false
reportOptionalIterable = false
reportPropertyTypeMismatch = false

The following errors would need to be taken care of before then:
pandas/pandas/core/frame.py:4575:9 - error: Method "_reindex_axes" cannot override final method defined in class "NDFrame"
pandas/pandas/core/frame.py:10736:9 - error: Method "mask" cannot override final method defined in class "NDFrame"
pandas/pandas/core/series.py:5449:9 - error: Method "mask" cannot override final method defined in class "NDFrame"
pandas/pandas/core/computation/expr.py:268:26 - error: TypeVar bound type cannot be generic
pandas/pandas/core/groupby/generic.py:676:9 - error: Method "describe" cannot override final method defined in class "GroupBy"
pandas/pandas/core/groupby/generic.py:856:9 - error: Method "pct_change" cannot override final method defined in class "GroupBy"

@simonjayhawkins
Copy link
Member

Thanks @twoertwein for the PR. I have merged just merged #41955, so need to change FrameOrSeriesUnion -> DataFrame | Series

A bit of history here...

originally we just had the FrameOrSeries alias and later needed the FrameOrSeriesUnion alias

so some uses of FrameOrSeries remain where only FrameOrSeriesUnion was needed.

I think mypy does pick up some cases where FrameOrSeries (i.e. TypeVar) is used and is unbounded, but not all.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jul 4, 2021
@twoertwein twoertwein marked this pull request as draft July 4, 2021 15:50
@twoertwein
Copy link
Member Author

I should have replaced FrameOrSeries with NDFrame and not with Series | DataFrame as

FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")

If FrameOrSeries is only ever used for DataFrames and Series, it should probably be

FrameOrSeries = TypeVar("FrameOrSeries", DataFrame, Series)

@simonjayhawkins
Copy link
Member

If FrameOrSeries is only ever used for DataFrames and Series, it should probably be

originally, that was what was used. but that created issues with core.generic (and some of the groupby IIRC)

I should have replaced FrameOrSeries with NDFrame and not with Series | DataFrame as

probably only in needed in core.generic and maybe groupby. elsewhere Series | DataFrame is probably fine.

@simonjayhawkins
Copy link
Member

I should have replaced FrameOrSeries with NDFrame and not with Series | DataFrame as

FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")

If FrameOrSeries is only ever used for DataFrames and Series, it should probably be

FrameOrSeries = TypeVar("FrameOrSeries", DataFrame, Series)

There is another subtle difference which maybe significant in some situations. TypeVar("FrameOrSeries", bound="NDFrame") maintains types for subclasses, whereas the TypeVar("FrameOrSeries", DataFrame, Series) variant does not. Internally, this is not really an issue, but for parts of the public api, this difference could be important to a group of users.

@twoertwein
Copy link
Member Author

TypeVar("FrameOrSeries", bound="NDFrame") maintains types for subclasses, whereas the TypeVar("FrameOrSeries", DataFrame, Series) variant does not. Internally, this is not really an issue, but for parts of the public api, this difference could be important to a group of users.

Thanks for all the information! I will prefer NDFrame in public functions and see whether that leaves any DataFrame | Series (currently it is the opposite way: NDFrame where necessary - I will change that).

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twoertwein generally lgtm.

@twoertwein twoertwein marked this pull request as ready for review July 5, 2021 18:38
@twoertwein
Copy link
Member Author

I think this PR is ready from my perspective (happy to remove the pyright config in pyproject.toml). Let me know what you think @simonjayhawkins

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twoertwein comment about comments for ignores otherwise lgtm.

@@ -297,7 +303,7 @@ def ax(self) -> Index:
raise ValueError("_set_grouper must be called before ax is accessed")
return index

def _get_grouper(self, obj: FrameOrSeries, validate: bool = True):
def _get_grouper(self, obj: NDFrame, validate: bool = True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this Typevar may be needed to maintain the return type, but the return is not typed, so I guess ok to use NDFrame for now to silence pyright.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I think the return type should be tuple[Any, ops.BaseGrouper, FrameOrSeries] I'm not sure what self.binner is supposed to be.

Mypy thinks that all three variables should be None. So I would need to add an ignore comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I "partially" typed _get_grouper

    def _get_grouper(
        self, obj: FrameOrSeries, validate: bool = True
    ) -> tuple[Any, ops.BaseGrouper, FrameOrSeries]:

The Any is not great, but it is definitely better than before.

@simonjayhawkins
Copy link
Member

@twoertwein can you avoid squashing and force pushing, it makes reviewing updates more difficult.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 7, 2021
@twoertwein
Copy link
Member Author

Rebased (but this time not squashed)

@mroeschke mroeschke removed the Stale label Aug 7, 2021
@mroeschke mroeschke added this to the 1.4 milestone Aug 7, 2021
@alimcmaster1
Copy link
Member

@twoertwein whenever you have a second mind merging master

@twoertwein
Copy link
Member Author

@alimcmaster1 rebased. Let me know if I should remove some parts that might be controversial (pyright settings in pyproject.toml?).

@jreback
Copy link
Contributor

jreback commented Sep 20, 2021

It would be great to enable pyright on the CI. It takes only ~90sec to run pyright on the entire codebase with the following configuration in pyproject.toml (ignores most type errors for now):

@twoertwein can you setup this?

@twoertwein
Copy link
Member Author

twoertwein commented Sep 20, 2021

It would be great to enable pyright on the CI. It takes only ~90sec to run pyright on the entire codebase with the following configuration in pyproject.toml (ignores most type errors for now):

@twoertwein can you setup this?

I will look into that. Should that be part of this PR or a followup (it might take me some time to play around with the CI).

Where in the CI should pyright be integrated (I assume in "CI / Checks")? Should pyright also be part of the pre-commit? There doesn't seem to be a pre-commit interface from pyright, except this third-party wrapper: https://github.com/necaris/pre-commit-pyright.

Other points to consider:

  • Pyright depends on node.js
  • Pyright fixes bugs at a rapid speed: it has ~1 release per week (probably need to pin the version)
  • Mypy and pyright might disagree (I don't think this is a concern right now, since most type checking rules need to be disabled to pass pyright).

edit: pyright can apparently be easily integrated in pre-commit: https://github.com/microsoft/pyright/blob/main/docs/ci-integration.md

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to enable pyright on the CI. It takes only ~90sec to run pyright on the entire codebase with the following configuration in pyproject.toml (ignores most type errors for now):

@twoertwein can you setup this?

I will look into that. Should that be part of this PR or a followup (it might take me some time to play around with the CI).

Where in the CI should pyright be integrated (I assume in "CI / Checks")? Should pyright also be part of the pre-commit? There doesn't seem to be a pre-commit interface from pyright, except this third-party wrapper: https://github.com/necaris/pre-commit-pyright.

Other points to consider:

  • Pyright depends on node.js
  • Pyright fixes bugs at a rapid speed: it has ~1 release per week (probably need to pin the version)
  • Mypy and pyright might disagree (I don't think this is a concern right now, since most type checking rules need to be disabled to pass pyright).

absolutely separate PR. in fact you might want to get the job in and can mark it failures_allowed for now, then PRs to fix up issues to get it to pass (and a todo list if needed).

  • Pyright depends on node.js
    prob need to add setup instructions in the docs then on how to run it
  • Pyright fixes bugs at a rapid speed: it has ~1 release per week (probably need to pin the version)
    sounds fine to pin

@jreback
Copy link
Contributor

jreback commented Sep 21, 2021

@twoertwein happy to merge this when ready

@twoertwein
Copy link
Member Author

@twoertwein happy to merge this when ready

I partially typed _get_grouper in the latest commit (contains an Any) to keep the intended TypeVar as pointed out by @simonjayhawkins

Apart for double-checking that, it should be ready from my side.

@twoertwein
Copy link
Member Author

Removed the pyright configuration (will be added in #43672).

@twoertwein
Copy link
Member Author

Rebased (includes #43672) and enabled checks for reportInvalidTypeVarUse and reportSelfClsParameterName.

@@ -1337,7 +1337,7 @@ def _make_unique_kwarg_list(


def relabel_result(
result: FrameOrSeries,
result: DataFrame | Series,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not able to use the alias, this is the entire point of it (i mean if we want to switch exclusively to DataFrame | Series i guess its ok, but we have an alias on purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
A TypeVar needs to be bound which isn't the case here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be helpful to rename FrameOrSeries to NDFrameT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh ok, then lets do that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this PR or a followup?

I don't mind adding more (I'm not the one reviewing it)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up ok. we have the naming for historical reasons and IIRC other reviewers have strong views on the naming, so IMO best not to mix in anything that could be controversial.

def shift(
self: IntervalArrayT, periods: int = 1, fill_value: object = None
) -> IntervalArray:
def shift(self, periods: int = 1, fill_value: object = None) -> IntervalArray:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these not valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here:
IntervalArrayT = TypeVar("IntervalArrayT", bound="IntervalArray")
if the return value was of type IntervalArrayT, it would be valid.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @twoertwein very nice cleanup!

we may want to try to replace more NDFrame usages outside of core (core.generic/core.groupby) where possible now that the invalid use of the inappropriately named FrameOrSeries typvar is caught by pyright.

@simonjayhawkins simonjayhawkins merged commit 49d3713 into pandas-dev:master Sep 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants