
TYP: pd.isna #46222

Merged: 22 commits, Mar 18, 2022
Commits (22), showing changes from all commits
632eed5
TYP: pd.isna
twoertwein Mar 4, 2022
4496f42
address review
twoertwein Mar 4, 2022
b77f716
avoid needing to import npt at runtime
twoertwein Mar 4, 2022
24aec6f
comment for cast
twoertwein Mar 5, 2022
c1204b4
remove to
twoertwein Mar 5, 2022
a0d8123
Merge remote-tracking branch 'upstream/main' into isna
twoertwein Mar 5, 2022
b412060
back to Index
twoertwein Mar 6, 2022
17c9779
remove unused Scalar
twoertwein Mar 6, 2022
a4d0dfa
Merge remote-tracking branch 'upstream/main' into isna
twoertwein Mar 6, 2022
2d7f86a
get left | right on the same line to avoid mypy printing a 'note'
twoertwein Mar 6, 2022
eb16bc3
ArrayLike
twoertwein Mar 7, 2022
b9288cb
unsupported overloads
twoertwein Mar 7, 2022
c77248f
all typing imports within TYPE_CHECKING
twoertwein Mar 7, 2022
70d5d60
Revert "unsupported overloads"
twoertwein Mar 8, 2022
1d2afd5
Merge remote-tracking branch 'upstream/main' into isna
twoertwein Mar 8, 2022
5f8b856
handle unions
twoertwein Mar 8, 2022
7e65d89
do not require Series to be imported at runtime
twoertwein Mar 8, 2022
a0cf860
CI/TST: numpy 1.22.3 release fixes (#46274)
mroeschke Mar 9, 2022
a29a9e0
TYP: annotation of __init__ return type (PEP 484) (misc modules) (#46…
EkaterinaKuzkina Mar 9, 2022
d1957f4
TYP: annotation of __init__ return type (PEP 484) (pandas/tests) (#46…
EkaterinaKuzkina Mar 9, 2022
c319692
TYP: add type annotation to DataFrame.to_pickle (#46262)
Moisan Mar 9, 2022
f1d0309
Merge remote-tracking branch 'upstream/main' into isna
twoertwein Mar 16, 2022
11 changes: 9 additions & 2 deletions pandas/core/arrays/boolean.py
@@ -1,7 +1,10 @@
from __future__ import annotations

import numbers
from typing import TYPE_CHECKING
from typing import (
    TYPE_CHECKING,
    cast,
)

import numpy as np

@@ -31,6 +34,8 @@
if TYPE_CHECKING:
    import pyarrow

    from pandas._typing import npt


@register_extension_dtype
class BooleanDtype(BaseMaskedDtype):
@@ -200,7 +205,9 @@ def coerce_to_array(
        if inferred_dtype not in ("boolean", "empty") + integer_like:
            raise TypeError("Need to pass bool-like values")

        mask_values = isna(values_object)
        # mypy does not narrow the type of mask_values to npt.NDArray[np.bool_]
        # within this branch, it assumes it can also be None
        mask_values = cast("npt.NDArray[np.bool_]", isna(values_object))
        values = np.zeros(len(values), dtype=bool)
        values[~mask_values] = values_object[~mask_values].astype(bool)

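The comment in the hunk above documents why the cast is needed. As a minimal sketch (not part of the diff), typing.cast is a runtime no-op: it returns its argument unchanged and only tells the type checker which type to assume.

# Minimal illustration of typing.cast: it returns its argument unchanged at
# runtime and only influences what mypy believes the static type to be.
from typing import cast

import numpy as np
import pandas as pd

values_object = np.array([1, None, 3], dtype=object)

mask_values = cast("np.ndarray", pd.isna(values_object))  # hint for mypy only
print(mask_values)    # [False  True False]
print(~mask_values)   # inversion is unambiguously an ndarray operation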
4 changes: 3 additions & 1 deletion pandas/core/base.py
@@ -769,7 +769,9 @@ def hasnans(self) -> bool:

        Enables various performance speedups.
        """
        return bool(isna(self).any())
        # error: Item "bool" of "Union[bool, ndarray[Any, dtype[bool_]], NDFrame]"
        # has no attribute "any"
        return bool(isna(self).any())  # type: ignore[union-attr]

    def isna(self):
        return isna(self._values)
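For context on the targeted ignore above: with the broad isna(obj: object) overload, isna(self) is typed as a union, and mypy rejects attribute access that is not valid for every member of a union. A minimal sketch with a hypothetical function:

# Hypothetical example: `res` is a union, and `bool` has no `.any()`,
# so mypy flags `res.any()` even though the ndarray branch supports it.
from __future__ import annotations

import numpy as np


def maybe_mask(as_array: bool) -> bool | np.ndarray:
    return np.array([True, False]) if as_array else False


res = maybe_mask(True)
# res.any()  # mypy: Item "bool" of "bool | ndarray[Any, Any]" has no attribute "any"
print(bool(np.asarray(res).any()))  # runtime-safe alternative to a type: ignore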
84 changes: 76 additions & 8 deletions pandas/core/dtypes/missing.py
@@ -5,6 +5,10 @@

from decimal import Decimal
from functools import partial
from typing import (
    TYPE_CHECKING,
    overload,
)

import numpy as np

@@ -16,11 +20,6 @@
    NaT,
    iNaT,
)
from pandas._typing import (
    ArrayLike,
    DtypeObj,
    npt,
)

from pandas.core.dtypes.common import (
    DT64NS_DTYPE,
@@ -54,6 +53,19 @@
)
from pandas.core.dtypes.inference import is_list_like

if TYPE_CHECKING:
    from pandas._typing import (
        ArrayLike,
        DtypeObj,
        NDFrame,
        NDFrameT,
        Scalar,
        npt,
    )
Review thread:

simonjayhawkins (Member), Mar 18, 2022:
Thanks @twoertwein for the PR.

I didn't want to comment on this PR before, to avoid "too many cooks", and in general if mypy is happy, so am I. (In some respects, because we have mypy we don't need to review certain aspects of typing, since if it is not 100% correct it will create issues down the line. I see it like a jigsaw puzzle that won't be complete until the last piece is in place, i.e. the codebase is 100% typed.)

However, I think I've seen that your preference is to import from pandas._typing inside the TYPE_CHECKING block elsewhere as well? If that is correct, can you explain the reasoning, so that it helps when reviewing other contributors' PRs?

AFAIK, we ensure all imports in pandas._typing are guarded so that they can be imported at the top level; for instance, npt was added so that we could import it without needing to add a TYPE_CHECKING block everywhere it was needed, otherwise we would just import from numpy directly and not include it in pandas._typing?

twoertwein (Member Author):
Good point! I imported it within the TYPE_CHECKING block because it is only needed for type checking. I will import it outside the TYPE_CHECKING block in future PRs.

Probably one reason why I prefer to put imports in this block is that there are some import cycles that prevent even mypy from functioning correctly: I think having from pandas import Index caused issues, whereas from pandas.core.indexes.base import Index worked without issues (both inside the TYPE_CHECKING block).

simonjayhawkins (Member):
Yes, for import cycles it is definitely needed. My preference is also to add imports inside the TYPE_CHECKING block when they are added specifically in a typing PR for type annotations, but this is difficult to enforce: say a refactor removes the need for a top-level import, but the import is not removed because it is still used in type annotations; in theory the import should then be moved into the TYPE_CHECKING block for consistency.

simonjayhawkins (Member):
> I think having from pandas import Index caused issues whereas from pandas.core.indexes.base import Index worked without issues (both inside the TYPE_CHECKING block).

My preference here is to use from pandas import Index when inside the TYPE_CHECKING block: less typing! (The fingers-on-the-keyboard kind of typing, not the type-annotations kind of typing!)

twoertwein (Member Author):
I just tried replacing it with from pandas import Index and it worked; something different must have caused the weird mypy issues.
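To illustrate the pattern discussed in this thread (a hypothetical module, not part of the PR): imports needed only for annotations can live in the TYPE_CHECKING block, which keeps them out of the runtime import graph and so also sidesteps import cycles, as long as the annotations are not evaluated at runtime.

# Hypothetical module illustrating the TYPE_CHECKING import pattern.
from __future__ import annotations  # annotations become strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by mypy only; `from pandas import Index` works equally well here.
    from pandas.core.indexes.base import Index


def first_label(idx: Index) -> object:
    # Index never needs to be importable when this module is loaded.
    return idx[0]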


    from pandas.core.indexes.base import Index


isposinf_scalar = libmissing.isposinf_scalar
isneginf_scalar = libmissing.isneginf_scalar

@@ -63,7 +75,35 @@
_dtype_str = np.dtype(str)


def isna(obj):
@overload
def isna(obj: Scalar) -> bool:
    ...


@overload
def isna(
    obj: ArrayLike | Index | list,
) -> npt.NDArray[np.bool_]:
    ...


@overload
def isna(obj: NDFrameT) -> NDFrameT:
    ...


# handle unions
@overload
def isna(obj: NDFrameT | ArrayLike | Index | list) -> NDFrameT | npt.NDArray[np.bool_]:
    ...


@overload
def isna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
    ...


def isna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
    """
    Detect missing values for an array-like object.

@@ -284,7 +324,35 @@ def _isna_string_dtype(values: np.ndarray, inf_as_na: bool) -> npt.NDArray[np.bool_]:
    return result


def notna(obj):
@overload
def notna(obj: Scalar) -> bool:
    ...


@overload
def notna(
    obj: ArrayLike | Index | list,
) -> npt.NDArray[np.bool_]:
    ...


@overload
def notna(obj: NDFrameT) -> NDFrameT:
    ...


# handle unions
@overload
def notna(obj: NDFrameT | ArrayLike | Index | list) -> NDFrameT | npt.NDArray[np.bool_]:
    ...


@overload
def notna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
    ...


def notna(obj: object) -> bool | npt.NDArray[np.bool_] | NDFrame:
    """
    Detect non-missing values for an array-like object.

@@ -362,7 +430,7 @@ def notna(obj):
    Name: 1, dtype: bool
    """
    res = isna(obj)
    if is_scalar(res):
    if isinstance(res, bool):
        return not res
    return ~res

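Taken together, the overloads above let mypy pick the return type of isna/notna from the argument type: scalars give bool, arrays, Index, and list give a boolean ndarray, Series and DataFrame give back the same NDFrame subtype, and a plain object falls through to the broad union. A minimal sketch of what callers can now rely on (the inferred types are shown as comments, not checked here):

# Minimal sketch of how the new overloads resolve for different argument types.
import numpy as np
import pandas as pd

ser = pd.Series([1.0, None, 3.0])
frame = pd.DataFrame({"a": [1.0, None]})
arr = np.array([1.0, np.nan])

scalar_mask = pd.isna(np.nan)  # inferred: bool            -> True
array_mask = pd.isna(arr)      # inferred: ndarray of bool -> [False  True]
series_mask = pd.isna(ser)     # inferred: Series          -> elementwise mask
frame_mask = pd.isna(frame)    # inferred: DataFrame       -> elementwise mask

# notna mirrors isna; its body now narrows with isinstance(res, bool) so that
# `~res` is only applied to array-likes, never to a plain bool.
print(scalar_mask, pd.notna(np.nan), series_mask.tolist(), sep="\n")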
12 changes: 7 additions & 5 deletions pandas/core/window/ewm.py
@@ -3,7 +3,10 @@
import datetime
from functools import partial
from textwrap import dedent
from typing import TYPE_CHECKING
from typing import (
    TYPE_CHECKING,
    cast,
)
import warnings

import numpy as np
@@ -380,12 +383,11 @@ def __init__(
                    FutureWarning,
                    stacklevel=find_stack_level(),
                )
                self.times = self._selected_obj[self.times]
                # self.times cannot be str anymore
                self.times = cast("Series", self._selected_obj[self.times])
            if not is_datetime64_ns_dtype(self.times):
                raise ValueError("times must be datetime64[ns] dtype.")
            # error: Argument 1 to "len" has incompatible type "Union[str, ndarray,
            # NDFrameT, None]"; expected "Sized"
            if len(self.times) != len(obj):  # type: ignore[arg-type]
            if len(self.times) != len(obj):
                raise ValueError("times must be the same length as the object.")
            if not isinstance(self.halflife, (str, datetime.timedelta)):
                raise ValueError(
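As a small illustration of why the cast above is sound (hypothetical frame, not part of the diff): selecting a single column by label returns a Series at runtime, so after the lookup self.times can no longer be a str, and the length check no longer needs a type: ignore.

# Hypothetical example: column selection by label yields a Series, which is
# the concrete type the `cast("Series", ...)` in the diff asserts.
import pandas as pd

obj = pd.DataFrame({"times": pd.to_datetime(["2022-03-01", "2022-03-02"])})
times = obj["times"]

print(type(times).__name__)    # Series
print(len(times) == len(obj))  # True: the same length check as in ewm.py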
8 changes: 5 additions & 3 deletions pandas/io/parsers/base_parser.py
@@ -855,9 +855,11 @@ def _check_data_length(
        data: list of array-likes containing the data column-wise.
        """
        if not self.index_col and len(columns) != len(data) and columns:
            if len(columns) == len(data) - 1 and np.all(
                (is_object_dtype(data[-1]) and data[-1] == "") | isna(data[-1])
            ):
            empty_str = is_object_dtype(data[-1]) and data[-1] == ""
            # error: No overload variant of "__ror__" of "ndarray" matches
            # argument type "ExtensionArray"
            empty_str_or_na = empty_str | isna(data[-1])  # type: ignore[operator]
            if len(columns) == len(data) - 1 and np.all(empty_str_or_na):
                return
            warnings.warn(
                "Length of header or names does not match length of data. This leads "
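For context, the split expression computes the same check as before: the trailing data column is treated as empty when every entry is either an empty string or a missing value. A minimal sketch with hypothetical data (the is_object_dtype guard is omitted here):

# Hypothetical trailing column: all entries are "" or missing, so the parser
# can silently drop it instead of warning about a header/data length mismatch.
import numpy as np
import pandas as pd

trailing = np.array(["", None, "", np.nan], dtype=object)

empty_str = trailing == ""                       # elementwise comparison
empty_str_or_na = empty_str | pd.isna(trailing)  # the combined mask from the diff

print(np.all(empty_str_or_na))                   # True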