Skip to content

Support for pyright 1.1.310 #716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 30, 2023
Merged

Support for pyright 1.1.310 #716

merged 2 commits into from
May 30, 2023

Conversation

Dr-Irv
Copy link
Collaborator

@Dr-Irv Dr-Irv commented May 29, 2023

See discussion in microsoft/pyright#5178

Recommendation was to return Self instead of Series[S1] . But if you do that when we don't know the type, it breaks mypy. So if we don't know the type of the Series, we return Series and that only broke a test related to pandera sublcassing, but only for pyright, so I modified that test to make pyright happy.

@Dr-Irv Dr-Irv requested a review from twoertwein May 29, 2023 19:57
Interval[Timestamp],
Interval[Timedelta],
CategoricalDtype,
bound=str
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See point #3 at microsoft/pyright#5178 (comment) , Recommended to use a Union rather than a constrained type var.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also affect other TypeVars (could be done later)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use S1 = TypeVar("S1", bound=Scalar)? You replaced S1 with Scalar in some places

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

 S1 = TypeVar(
     "S1",
    bound=Scalar
     | Interval[int]
     | Interval[float]
     | Interval[Timestamp]
     | Interval[Timedelta]
    | Dtype
    | Period
    | CategoricalDtype
    | datetime.time,
 )

Seems to work (but Scalar includes a few more types!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also affect other TypeVars (could be done later)?

Yes, and I created an issue for that #719

name: Hashable | None = ...,
copy: bool = ...,
fastpath: bool = ...,
) -> Self: ...
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overload was previously connected to the next one in terms of the data argument, but I split it into two to have it return Self when S1 is known, and to return Series if not.

Copy link
Member

@twoertwein twoertwein May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the two Self overloads be combined?

edit: Should we relax S1 as we explicitly allow object (which is correct)?

edit2: The following replaces the last three overloads and passes all tests with pyright (but not mypy):

     @overload
     def __new__(
         cls,
         data: Series[S1] | dict[int, S1] | dict[_str, S1] | object= ...,
         index: Axes | None = ...,
         dtype: type[S1] | type | str | None=...,
         name: Hashable = ...,
         copy: bool = ...,
         fastpath: bool = ...,
     ) -> Self: ...

About mypy errors: since they now have more frequent releases, I wouldn't mind making changes that break mypy-compatibility IF 1) there is a long-standing mypy issue, 2) after informing them that we will break compatibility, and 3) if it helps clean pandas-stubs up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the two Self overloads be combined?

The reason I separated them is that we want to allow people to have an untyped data and specify dtype = type[S1], OR specify a typed data (where S1 can be inferred) and not specify dtype. If we combine them, then it allows any object along with dtype: str and that is an untyped Series. That's where mypy has trouble.

edit: Should we relax S1 as we explicitly allow object (which is correct)?

No. I want to make it that if we know the type and can do something special with it (e.g., Series[Timestamp].__add__(Timedelta)) we differentiate that. object is too wide, so that will correspond to Series[Any] (or in pyright terms, Series[Unknown])

edit2: The following replaces the last three overloads and passes all tests with pyright (but not mypy):

     @overload
     def __new__(
         cls,
         data: Series[S1] | dict[int, S1] | dict[_str, S1] | object= ...,
         index: Axes | None = ...,
         dtype: type[S1] | type | str | None=...,
         name: Hashable = ...,
         copy: bool = ...,
         fastpath: bool = ...,
     ) -> Self: ...

See above reason why I don't want to combine them.

The mypy error is reported here: python/mypy#15322

About mypy errors: since they now have more frequent releases, I wouldn't mind making changes that break mypy-compatibility IF 1) there is a long-standing mypy issue, 2) after informing them that we will break compatibility, and 3) if it helps clean pandas-stubs up.

I'm more hesitant to do that. Most of our bug reports from the community are things people find by using mypy. It is the dominant type checker for now, so we should make sure we handle people's code with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

edit: Should we relax S1 as we explicitly allow object (which is correct)?

No. I want to make it that if we know the type and can do something special with it (e.g., Series[Timestamp].__add__(Timedelta)) we differentiate that. object is too wide, so that will correspond to Series[Any] (or in pyright terms, Series[Unknown])

I didn't mean to use data: object but to simply widen S1 to S1 = TypeVar("S1"), which allows any "object". Wouldn't this still allow special overloads? In either way, this PR is targeted for pyright compatibility and shouldn't depend on that.

def __add__(self, other: TimedeltaSeries | np.timedelta64) -> TimestampSeries: ... # type: ignore[override]
def __radd__(self, other: TimedeltaSeries | np.timedelta64) -> TimestampSeries: ... # type: ignore[override]
def __add__(self, other: TimedeltaSeries | np.timedelta64 | timedelta) -> TimestampSeries: ... # type: ignore[override]
def __radd__(self, other: TimedeltaSeries | np.timedelta64 | timedelta) -> TimestampSeries: ... # type: ignore[override]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently pyright 1.1.310 had bugs that revealed we were missing some types here

@@ -1604,12 +1605,14 @@ def test_pandera_generic() -> None:
T = TypeVar("T")

class MySeries(pd.Series, Generic[T]):
...
def __new__(cls, *args, **kwargs) -> Self:
return object.__new__(cls)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes pyright not complain.


def func() -> MySeries[float]:
return MySeries[float]([1, 2, 3])

func()
result = func()
assert result.iloc[1] == 2
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the test to make sure that defining __new__() didn't change the result.

@twoertwein
Copy link
Member

Thanks @Dr-Irv !

@Dr-Irv Dr-Irv deleted the pyright310 branch December 2, 2024 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pyright above 1.1.308 has failures on our tests
2 participants