GH1180 Clean up all/any methods for Series and DataFrame #1188

Merged
merged 3 commits into from
Apr 17, 2025
Changes from 2 commits
6 changes: 4 additions & 2 deletions pandas-stubs/core/frame.pyi
@@ -1660,7 +1660,8 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack):
         bool_only: _bool | None = ...,
         skipna: _bool = ...,
         **kwargs: Any,
-    ) -> _bool: ...
+    ) -> np.bool: ...
+    # FIXME the type below is not correct, should be pd.Series[np.bool]
Collaborator

The way I look at it, we use Series[bool] to stand for whatever kind of bool is stored inside, whether that is a Python bool, a NumPy one, or a BooleanDtype value, so I don't think this comment is necessary.

Member Author

Actually, the type checker will not accept that, because pd.Series can only be subscripted with a type allowed by S1, which includes bool (the built-in Python bool) but not np.bool.
Yet at runtime pd.DataFrame.any returns a pd.Series[np.bool], which the type checker rejects because np.bool is not a valid type for S1.
Let me know if this is not clear.

Collaborator

The type returned from DataFrame.any() and DataFrame.all() should be Series[_bool]. The tests should then use that.

Member Author

My issue is that _bool does not include np.bool, which is the type we get at runtime. I'm happy to keep the stubs as they are, but that would mean the runtime type does not align with the static type.
Alternatively, we can add np.bool to _bool. I'm open to suggestions.
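
The runtime behaviour described in this thread can be observed directly. The following is a minimal sketch, assuming pandas 2.x and NumPy are installed. It reuses the DataFrame from tests/test_frame.py and inspects runtime types only, so it says nothing about what a type checker infers from the stubs.

```python
import numpy as np
import pandas as pd

# Same frame as in tests/test_frame.py.
df = pd.DataFrame([[False, True], [False, False]], columns=["col1", "col2"])

per_column = df.any()        # one result per column, returned as a Series
overall = df.any(axis=None)  # reduced over both axes, returned as a scalar

# Inspect the runtime types of the Series, one of its elements, and the scalar.
first = per_column.iloc[0]
print(type(per_column).__name__, per_column.dtype)
print(type(first).__name__, isinstance(first, bool))
print(type(overall).__name__, isinstance(overall, bool))
```

Both the element and the scalar are NumPy booleans rather than instances of the built-in bool, which is the gap between runtime and static types that the comments here are about.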

Collaborator

I don't think it matters. We have checks like this:

    check(assert_type(df.any(), "pd.Series[bool]"), pd.Series, np.bool_)

So even though the Series holds np.bool_ values, we call its type Series[bool].

It's similar to this:

    check(assert_type(df.value_counts(), "pd.Series[int]"), pd.Series, np.integer)

What's inside the Series are NumPy integers, but we call that Series[int].
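
To make the split between the static label and the runtime check concrete, here is a simplified sketch of the pattern using only the standard library. `MiniSeries`, `FakeNumpyBool`, and this `check` function are hypothetical stand-ins written for illustration. They are not the pandas-stubs test helper, and they are not part of pandas or NumPy.

```python
from typing import Any, Generic, List, Optional, TypeVar

T = TypeVar("T")
E = TypeVar("E")


class FakeNumpyBool:
    """Hypothetical stand-in for np.bool_: truthy or falsy, but not a Python bool."""

    def __init__(self, value: bool) -> None:
        self._value = value

    def __bool__(self) -> bool:
        return self._value


class MiniSeries(Generic[E]):
    """Hypothetical stand-in for pd.Series; E is only a static label."""

    def __init__(self, values: List[Any]) -> None:
        self.values = values


def check(actual: T, klass: type, dtype: Optional[type] = None) -> T:
    """Simplified runtime check: container class first, then the first element."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected {klass!r}, got {type(actual)!r}")
    if dtype is not None and isinstance(actual, MiniSeries):
        first = actual.values[0]
        if not isinstance(first, dtype):
            raise RuntimeError(f"Expected element {dtype!r}, got {type(first)!r}")
    return actual


# Statically labelled MiniSeries[bool]; at runtime it holds FakeNumpyBool objects.
result: "MiniSeries[bool]" = MiniSeries([FakeNumpyBool(False), FakeNumpyBool(True)])

# Passes: the runtime element type is checked independently of the static label.
checked = check(result, MiniSeries, FakeNumpyBool)
```

In this sketch the annotation and the runtime assertion are two separate claims, which mirrors how the tests above pair "pd.Series[bool]" with np.bool_.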

Member Author

Okay, I see your point. Let me fix the comments. Thanks for the more detailed explanation of this issue!

     @overload
     def all(
         self,
@@ -1677,7 +1678,8 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack):
         bool_only: _bool | None = ...,
         skipna: _bool = ...,
         **kwargs: Any,
-    ) -> _bool: ...
+    ) -> np.bool: ...
+    # FIXME the type below is not correct, should be pd.Series[np.bool]
     @overload
     def any(
         self,
4 changes: 2 additions & 2 deletions pandas-stubs/core/series.pyi
@@ -1744,15 +1744,15 @@ class Series(IndexOpsMixin[S1], NDFrame):
         bool_only: _bool | None = ...,
         skipna: _bool = ...,
         **kwargs: Any,
-    ) -> _bool: ...
+    ) -> np.bool: ...
     def any(
         self,
         *,
         axis: AxisIndex = ...,
         bool_only: _bool | None = ...,
         skipna: _bool = ...,
         **kwargs: Any,
-    ) -> _bool: ...
+    ) -> np.bool: ...
     def cummax(
         self,
         axis: AxisIndex | None = ...,
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -39,7 +39,7 @@ mypy = "1.15.0"
 pandas = "2.2.3"
 pyarrow = ">=10.0.1"
 pytest = ">=7.1.2"
-pyright = ">=1.1.396,!=1.1.398"
+pyright = "1.1.397"
 poethepoet = ">=0.16.5"
 loguru = ">=0.6.0"
 typing-extensions = ">=4.4.0"
4 changes: 2 additions & 2 deletions tests/test_frame.py
@@ -120,13 +120,13 @@ def test_types_init() -> None:
 def test_types_all() -> None:
     df = pd.DataFrame([[False, True], [False, False]], columns=["col1", "col2"])
     check(assert_type(df.all(), "pd.Series[bool]"), pd.Series, np.bool_)
-    check(assert_type(df.all(axis=None), bool), np.bool_)
+    check(assert_type(df.all(axis=None), np.bool_), np.bool_)


 def test_types_any() -> None:
     df = pd.DataFrame([[False, True], [False, False]], columns=["col1", "col2"])
     check(assert_type(df.any(), "pd.Series[bool]"), pd.Series, np.bool_)
-    check(assert_type(df.any(axis=None), bool), np.bool_)
+    check(assert_type(df.any(axis=None), np.bool_), np.bool_)


 def test_types_append() -> None:
12 changes: 6 additions & 6 deletions tests/test_series.py
@@ -134,15 +134,15 @@ def test_types_init() -> None:


 def test_types_any() -> None:
-    check(assert_type(pd.Series([False, False]).any(), bool), np.bool_)
-    check(assert_type(pd.Series([False, False]).any(bool_only=False), bool), np.bool_)
-    check(assert_type(pd.Series([np.nan]).any(skipna=False), bool), np.bool_)
+    check(assert_type(pd.Series([False, False]).any(), np.bool), np.bool)
+    check(assert_type(pd.Series([False, False]).any(bool_only=False), np.bool), np.bool)
+    check(assert_type(pd.Series([np.nan]).any(skipna=False), np.bool), np.bool)


 def test_types_all() -> None:
-    check(assert_type(pd.Series([False, False]).all(), bool), np.bool_)
-    check(assert_type(pd.Series([False, False]).all(bool_only=False), bool), np.bool_)
-    check(assert_type(pd.Series([np.nan]).all(skipna=False), bool), np.bool_)
+    check(assert_type(pd.Series([False, False]).all(), np.bool), np.bool)
+    check(assert_type(pd.Series([False, False]).all(bool_only=False), np.bool), np.bool)
+    check(assert_type(pd.Series([np.nan]).all(skipna=False), np.bool), np.bool)


 def test_types_csv() -> None: