ENH: Add typing for pandas.core.frame.dropna #38968

Closed
wants to merge 7 commits into from
31 changes: 28 additions & 3 deletions pandas/core/frame.py
@@ -5070,14 +5070,38 @@ def notna(self) -> DataFrame:
     def notnull(self) -> DataFrame:
         return ~self.isna()

+    # Overload function signature to prevent union of conflicting types
Member

I'd drop this comment

+    # As Optional[DataFrame] is really Union[DataFrame, None]
+    @overload
+    def dropna(
+        self,
+        axis: Axis = ...,
+        how: str = ...,
+        thresh: Optional[int] = ...,
+        subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
+        inplace: Literal[False] = False,
Member

Should you be specifying the default value here? (I'm not sure.)

Author

Looks like we were typing at the same time :P My understanding of overloading (and this might be very wrong) is that you only specify defaults in the overloads for the parameters you are overloading on, in this case inplace.

See below for an example taken from pandas.core.frame.reset_index.

pandas/pandas/core/frame.py

Lines 4812 to 4843 in 67d4cae

    @overload
    # https://github.com/python/mypy/issues/6580
    # Overloaded function signatures 1 and 2 overlap with incompatible return types
    def reset_index(  # type: ignore[misc]
        self,
        level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
        drop: bool = ...,
        inplace: Literal[False] = ...,
        col_level: Hashable = ...,
        col_fill: Label = ...,
    ) -> DataFrame:
        ...

    @overload
    def reset_index(
        self,
        level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
        drop: bool = ...,
        inplace: Literal[True] = ...,
        col_level: Hashable = ...,
        col_fill: Label = ...,
    ) -> None:
        ...

    def reset_index(
        self,
        level: Optional[Union[Hashable, Sequence[Hashable]]] = None,
        drop: bool = False,
        inplace: bool = False,
        col_level: Hashable = 0,
        col_fill: Label = "",
    ) -> Optional[DataFrame]:

Member

Yeah, in this example the defaults are not specified in the overloaded signatures, so I think you should get rid of those.

Member

That example also explains the mypy error (it's caused by a mypy bug). To get mypy passing, you need to add a type ignore, plus a comment above the overloaded signature with the link to the mypy issue and the error message, as done in the reset_index example.

With those things done, I think this will be good to go in.
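
For illustration, here is a minimal self-contained sketch of the pattern discussed above, using a hypothetical toy class Frame rather than pandas itself. The overloads leave the defaults as `...`, and the first one carries the type ignore plus the comment pointing at the mypy issue. Whether mypy actually reports the overlap may depend on the mypy version, so treat the ignore as an assumption carried over from the reset_index example.

```python
from typing import List, Literal, Optional, overload


class Frame:
    """Hypothetical toy stand-in for DataFrame; not the pandas implementation."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.values = values

    # https://github.com/python/mypy/issues/6580
    # Overloaded function signatures 1 and 2 overlap with incompatible return types
    @overload
    def dropna(self, inplace: Literal[False] = ...) -> "Frame":  # type: ignore[misc]
        ...

    @overload
    def dropna(self, inplace: Literal[True] = ...) -> None:
        ...

    def dropna(self, inplace: bool = False) -> Optional["Frame"]:
        # Only the implementation spells out the real default value.
        kept = [v for v in self.values if v is not None]
        if inplace:
            self.values = kept
            return None
        return Frame(kept)


frame = Frame([1, None, 3])
copy = frame.dropna()        # a type checker sees Frame here
frame.dropna(inplace=True)   # a type checker sees None here
```

With this shape, callers who pass a literal True or False get a precise return type instead of Optional[Frame].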

Author

Gotcha! Changes forthcoming...

+    ) -> DataFrame:
+        ...
+
+    @overload
+    def dropna(
+        self,
+        axis: Axis = ...,
+        how: str = ...,
+        thresh: Optional[int] = ...,
+        subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
Member

We now have an IndexLabel alias in pandas._typing that could be used for subset.

Suggested change
-        subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
+        subset: Optional[IndexLabel] = ...,

+        inplace: Literal[True] = True,
+    ) -> None:
+        ...

     def dropna(
         self,
         axis: Axis = 0,
         how: str = "any",
-        thresh=None,
-        subset=None,
+        thresh: Optional[int] = None,
+        subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
         inplace: bool = False,
-    ):
+    ) -> Optional[DataFrame]:
Member

I see you used #37137 as an example. However, the motivation for that PR was to remove an assert df is not None from the codebase, and the return type of Optional[DataFrame] was not touched.

This does not need to be done in this PR, but FYI: if the motivation for adding the types is the public API, then the return type should be the same type as self, to account for subclassed DataFrames.
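
One way to express "same type as self" is a TypeVar bound to the base class. The sketch below is only an illustration under that assumption: Frame, SubFrame and FrameT are hypothetical names, not pandas objects, and the non-inplace case is the only one shown.

```python
from typing import List, Optional, TypeVar

# Hypothetical TypeVar: any Frame or subclass of Frame.
FrameT = TypeVar("FrameT", bound="Frame")


class Frame:
    """Hypothetical toy stand-in for DataFrame."""

    def __init__(self, values: List[Optional[int]]) -> None:
        self.values = values

    def dropna(self: FrameT) -> FrameT:
        # Build the result with type(self) so subclasses get their own type back.
        kept = [v for v in self.values if v is not None]
        return type(self)(kept)


class SubFrame(Frame):
    """Hypothetical subclass, standing in for a subclassed DataFrame."""


result = SubFrame([1, None, 2]).dropna()  # a type checker infers SubFrame, not Frame
```

Annotating self with the TypeVar lets the checker tie the return type to whatever class the method is called on.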

"""
Remove missing values.

@@ -5216,6 +5240,7 @@ def dropna(

         if inplace:
             self._update_inplace(result)
+            return None
         else:
             return result
