Skip to content

TYPING: change to FrameOrSeries Alias in pandas._typing #29480

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simonjayhawkins
Copy link
Member

@WillAyd

I think we should discuss reverting #28173 and only use TypeVar("FrameOrSeries", bound="NDFrame") in core.generic. perhaps call it _NDFrameT to avoid confusion.

This PR is to show the changes required to keep mypy green if we wanted to revert.

It also reverts the change in #29458, see #29458 (comment)

_GroupBy is defined as a generic class, but not sure about Grouping

I've created it as a draft PR, since I think we should wait for more issues to surface going forward to help decide. The number of changes needed at this stage is not significant enough to warrant a hasty decision.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Nov 8, 2019
@WillAyd
Copy link
Member

WillAyd commented Nov 8, 2019

What is the problem this solves? Looks OK so no objections if you think it's a better approach

@simonjayhawkins
Copy link
Member Author

if you think it's a better approach

didn't realize how few uses of FrameOrSeries we had in master until looked at this.

so reluctant to change and then potentially need to change back again.

thinking that as we review, if we think FrameOrSeries should be generic in a class or function and it causes unexpected mypy issues we could suggest using FrameOrSeries = TypeVar("FrameOrSeries", "DataFrame", "Series") at a module level to see if the problems are solved. atm I think the type anotations either aren't added, NDFrame is used or Union[DataFrame, Series]

If we find this definition more useful, we could then use this as the global definition and use TypeVar("FrameOrSeries", bound="NDFrame") at the module level if necessary (i.e. core.generic)

@WillAyd
Copy link
Member

WillAyd commented Nov 8, 2019

Sure; this is still active development so no objections to moving things around if you find easier.

The only thing I don't want to do is put a looser definition like Union[DataFrame, Series] in pandas._typing and have people start using that, though recognize that a TypeVar isn't suitable in all cases

@simonjayhawkins
Copy link
Member Author

agreed. if necessary we can just use the Union as needed.

However, not sure about return types in these cases. The recommended approach is not to return Union types. Whereas NDFrame does not permit narrowing between Series/DataFrame.

as you said, active development, lets see if this becomes an issue.

@jbrockmendel
Copy link
Member

General question for @simonjayhawkins and @WillAyd: am I the only one having trouble with the distinction between NDFrame vs FrameOrSeries vs Union["DataFrame", "Series"] or whatever else? If its just me then ill set aside some time to get informed, but if others are having issues, it may merit discussion on the mailing list or a call.

@WillAyd
Copy link
Member

WillAyd commented Nov 11, 2019

General question for @simonjayhawkins and @WillAyd: am I the only one having trouble with the distinction between NDFrame vs FrameOrSeries vs Union["DataFrame", "Series"] or whatever else?

No it can be confusing. Very loosely the difference between:

def some_function(obj: FrameOrSeries) -> FrameOrSeries:
   ...

and

def some_function(obj: Union[Series, DataFrame]) -> Union[Series, DataFrame]:
    ...

Is that with the former the type that goes in and out of the function is homogenous, i.e. if you pass a DataFrame in you get a DataFrame out, if you pass a Series in you get a Series out. With the latter, basically the cartesian product of types is allowable, i.e. you can pass a DataFrame in and get either a Series or a DataFrame back, and likewise if you pass a Series in you can still get a Series or DataFrame back.

I think in some cases like the following the usage can be interchangeable

def length(obj: FrameOrSeries) -> int:
    ...

def length(obj: Union[Series, DataFrame]) -> int:
    ...

Because there is no type parametrization, i.e. FrameOrSeries only appears once in signature.

@@ -336,13 +347,13 @@ def _group_selection_context(groupby):
groupby._reset_group_selection()


class _GroupBy(PandasObject, SelectionMixin):
class _GroupBy(PandasObject, SelectionMixin, Generic[FrameOrSeries]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure I would consider _GroupBy a generic class - is this required as part of this change or just experimenting?

@simonjayhawkins
Copy link
Member Author

I've created it as a draft PR, since I think we should wait for more issues to surface going forward to help decide. The number of changes needed at this stage is not significant enough to warrant a hasty decision.

closing to clear queue. No issues encountered in last few weeks. can re-open if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants