Skip to content

ENH: Add typing for pandas.core.frame.dropna #38968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

ENH: Add typing for pandas.core.frame.dropna #38968

wants to merge 7 commits into from

Conversation

rbpatt2019
Copy link

@rbpatt2019 rbpatt2019 commented Jan 5, 2021

This PR adds type hinting for pandas.core.frame.dropna. Additionally, an explicit return None was added to the if/else block for in-place changes to meet mypy standards and make the types explicit.

I haven't yet run the tests, because I didn't modify any source code. Once/if the CI builds pass, I'll check it off the list.

mypy justifiably requires an explicit `return None` when a function
return is typed as `Optional`
@rbpatt2019
Copy link
Author

Looking to add a "whatsnew" comment to the docs and see there's a tracker for v1.2.1 and v1.3.0, both of which aren't released. Which should I add to?

@rbpatt2019
Copy link
Author

It's also worth noting that mypy throws a wall of errors on my local machine (Ubuntu 20.10, Kernel 5.8.0-33-generic), but none of them are related to any changes I made.

pandas/core/computation/parsing.py:43: error: unused 'type: ignore' comment
pandas/io/common.py:182: error: Only @runtime_checkable protocols can be used with instance and class checks  [misc]
pandas/io/common.py:491: error: Unsupported operand types for + ("List[Optional[str]]" and "List[str]")  [operator]
pandas/_testing/contexts.py:128: error: Argument 1 to "close" has incompatible type "Optional[int]"; expected "int"  [arg-type]
pandas/core/aggregation.py:485: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]"  [type-var]
pandas/core/aggregation.py:741: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]"  [type-var]
pandas/core/groupby/ops.py:286: error: Incompatible types in assignment (expression has type "Index", variable has type "Iterator[Any]")  [assignment]
pandas/core/groupby/ops.py:286: note: 'Index' is missing following 'Iterator' protocol member:
pandas/core/groupby/ops.py:286: note:     __next__
pandas/core/reshape/pivot.py:198: error: Incompatible types in assignment (expression has type "Optional[DataFrame]", variable has type "DataFrame")  [assignment]
pandas/tests/groupby/test_categorical.py:1676: error: Item "None" of "Optional[DataFrame]" has no attribute "astype"  [union-attr]
pandas/tests/window/conftest.py:119: error: unused 'type: ignore' comment
pandas/tests/window/conftest.py:335: error: unused 'type: ignore' comment
pandas/tests/arrays/string_/test_string.py:25: error: unused 'type: ignore' comment
Found 12 errors in 9 files (checked 1160 source files)

@rbpatt2019
Copy link
Author

@mzeitlin11 Any feedback would be appreciated - I'm not understanding why I'm seeing these build fails, particularly as the bits that are failing don't seem to be related to any changes I made!

As the `Optional[DataFrame]` return type creates a union of incompatible
types (DataFrame and None), we overload the signature on the value of
the `inplace` parameter, to force the return of a DataFrame when inplace
is false.

Fixes GH38948
For GH38948
Copy link
Member

@arw2019 arw2019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK Literal is a Python 3.8 feature whereas we still support Python 3.7 so so I think we want to use a workaround for now.

Also this doesn't actually pass mypy at the moment:

pandas/core/frame.py:5076: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [misc]

(although I'm not clear on why that is...)

cc @simonjayhawkins

@@ -5070,14 +5070,38 @@ def notna(self) -> DataFrame:
def notnull(self) -> DataFrame:
return ~self.isna()

# Overload function signature to prevent union of conflicting types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop this comment

how: str = ...,
thresh: Optional[int] = ...,
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
inplace: Literal[False] = False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you be specifying the default value here? (I'm not sure)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we were typing at the time :P My understanding of overloading (and this might be very wrong) is that you only specify the defaults in overloads for those parameters you are overloading, in this case inplace.

See below for an example taken from pandas.core.frame.reset_index.

pandas/pandas/core/frame.py

Lines 4812 to 4843 in 67d4cae

@overload
# https://github.com/python/mypy/issues/6580
# Overloaded function signatures 1 and 2 overlap with incompatible return types
def reset_index( # type: ignore[misc]
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
drop: bool = ...,
inplace: Literal[False] = ...,
col_level: Hashable = ...,
col_fill: Label = ...,
) -> DataFrame:
...
@overload
def reset_index(
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
drop: bool = ...,
inplace: Literal[True] = ...,
col_level: Hashable = ...,
col_fill: Label = ...,
) -> None:
...
def reset_index(
self,
level: Optional[Union[Hashable, Sequence[Hashable]]] = None,
drop: bool = False,
inplace: bool = False,
col_level: Hashable = 0,
col_fill: Label = "",
) -> Optional[DataFrame]:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah in this example the defaults are not specified in the overloaded signatures so I think you should get rid of those

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That example also explains the mypy error (it's caused to a mypy bug). So to get mypy passing you need to add a type ignore (and a comment with the link to mypy issue and the error message above the overloaded signature as done in the reset_index example)

Those things being done I think this will be good to go in

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! Changes forthcoming...

@rbpatt2019
Copy link
Author

So overloading the function signature to prevent type collision solved most of the bugs. I'm left with a mypy error about overloaded signature overlap with conflicting return types, which is not present when I run ./ci/code_checks.sh typing locally. I'm making a full development environment to see if that helps troubleshoot the issue.

@arw2019
Copy link
Member

arw2019 commented Jan 5, 2021

Looking to add a "whatsnew" comment to the docs and see there's a tracker for v1.2.1 and v1.3.0, both of which aren't released. Which should I add to?

We generally don't add a whatsnew for a pure typing PR

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Jan 5, 2021
@jreback
Copy link
Contributor

jreback commented Jan 5, 2021

Looking to add a "whatsnew" comment to the docs and see there's a tracker for v1.2.1 and v1.3.0, both of which aren't released. Which should I add to?

not needed for typing

@rbpatt2019
Copy link
Author

Also, @arw2019 Literal is official 3.8 and is only available in <=3.7 with the typing_extensions module, which mypy discusses here. I only used as the pandas.core.frame module explicitly imports it here...

pandas/pandas/core/frame.py

Lines 163 to 164 in 67d4cae

if TYPE_CHECKING:
from typing import Literal

GH38948 pull request review
@arw2019
Copy link
Member

arw2019 commented Jan 5, 2021

Also, @arw2019 Literal is official 3.8 and is only available in <=3.7 with the typing_extensions module, which mypy discusses here. I only used as the pandas.core.frame module explicitly imports it here...

pandas/pandas/core/frame.py

Lines 163 to 164 in 67d4cae

if TYPE_CHECKING:
from typing import Literal

yeah that's fine (esp as it's already imported)

@@ -5070,14 +5070,38 @@ def notna(self) -> DataFrame:
def notnull(self) -> DataFrame:
return ~self.isna()

@overload
# https://github.com/python/mypy/issues/6580
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this comment has to go above the decorator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't in pandas.core.frame.reset_index, which doesn't seem to cause any problems. Also, it passed the typing portion of CI/Checks build, so I think we are fine. Happy to move it if there is a style preference though!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a preference since it passes CI

@rbpatt2019
Copy link
Author

Happy to see it approved! Let me know if I can help with anything else!
Out of curiosity, is there an official issue for tracking typing in pandas? I very strongly believe in typing and would be happy to contribute some spare time to progressing it coverage in the library!

@arw2019
Copy link
Member

arw2019 commented Jan 5, 2021

Happy to see it approved! Let me know if I can help with anything else!

FYI there will be at least one more round of review from someone else ;)

Out of curiosity, is there an official issue for tracking typing in pandas? I very strongly believe in typing and would be happy to contribute some spare time to progressing it coverage in the library!

There are some issues in the tracker but we don't usually open issues unless there's something specific to discuss.

We'd love for you to chime in and help to increase our coverage. Focused PRs like this one are best. For easy annotations (bool, str, etc.) we'll usually do a pass through a file just doing those and leaving the hard ones for dedicated PRs

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rbpatt2019 for the PR. small targeted PRs adding type hints always welcome.

axis: Axis = ...,
how: str = ...,
thresh: Optional[int] = ...,
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we now have an IndexLabel alias in pandas._typing that could be used for subset.

Suggested change
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ...,
subset: Optional[IndexLabel] = ...,

inplace: bool = False,
):
) -> Optional[DataFrame]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you used #37137 as a example. However the motivation for that PR was to remove an assert df is not None from the codebase and the return type of Optional[DataFrame] not touched.

Not needed to be done in this PR, but just FYI if the motivation for adding the types is for the public api, then the return type should be the same type as self for subclassed DataFrames.

@simonjayhawkins simonjayhawkins added this to the 1.3 milestone Jan 8, 2021
@github-actions
Copy link
Contributor

github-actions bot commented Feb 8, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 8, 2021
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

@rbpatt2019 can you update to comments and merge master

@simonjayhawkins simonjayhawkins changed the title BUG:Add typing for pandas.core.frame.dropna ENH: Add typing for pandas.core.frame.dropna Mar 14, 2021
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi - is this still active? If so, please see set_axis and reset_index for examples of these kinds of overloads (or, this blog post, which is still work-in-progress but does have some examples of how to do this)

@simonjayhawkins simonjayhawkins removed this from the 1.3 milestone Apr 24, 2021
@simonjayhawkins
Copy link
Member

closing as stale. @rbpatt2019 please ping if you want to pick this up again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Return type of pandas.DataFrame.dropna is incorrectly None
5 participants