Skip to content

TYP: setter for index/columns property-like (AxisProperty) #46565

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 10, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions pandas/_libs/properties.pyi
Original file line number Diff line number Diff line change
@@ -1,9 +1,28 @@
# pyright: reportIncompleteStub = false
from typing import Any
from typing import (
Sequence,
overload,
)

from pandas._typing import (
AnyArrayLike,
DataFrame,
Index,
Series,
Comment on lines +8 to +10
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an issue with mypy that can't resolve the types for the decorated methods that IIRC @jbrockmendel encountered before when adding the stubs for the Cython code initially. It seems to be related to the imports and putting them here seems prevent the problem resurfacing.

)

# note: this is a lie to make type checkers happy (they special
# case property). cache_readonly uses attribute names similar to
# property (fget) but it does not provide fset and fdel.
cache_readonly = property

def __getattr__(name: str) -> Any: ... # incomplete
class AxisProperty:

axis: int
def __init__(self, axis: int = ..., doc: str = ...) -> None: ...
@overload
def __get__(self, obj: DataFrame | Series, type) -> Index: ...
@overload
def __get__(self, obj: None, type) -> AxisProperty: ...
def __set__(
self, obj: DataFrame | Series, value: AnyArrayLike | Sequence
Copy link
Member Author

@simonjayhawkins simonjayhawkins Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type annotation for value chosen just to match _set_axis which is added to Series to match the existing ensure_index annotation and changed on NDFrame for consistency.

Copy link
Member

@twoertwein twoertwein Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea, might be not worth it to enforce that obj is compatible with value.

IndexCompatibleT = TypeVar("IndexCompatibleT", AnyArrayLike, Sequence)

class Has_set_axis(Protocol[IndexCompatibleT]):
    def _set_axis(self: NDFrame, axis: int, value: IndexCompatibleT) -> object:
        ...


def __set__(self, obj: Has_set_axis[IndexCompatibleT], value: IndexCompatibleT) -> None:
    ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the other response, I think that only NDFrame objects use AxisProperty and we tend to avoid NDFrame in annotations that would be part of the public API since NDFrame is not public, but obviously need to use it in some circumstances.

Until the other issue is resolved, I'm not sure I can comment on whether we would be better off using a protocol here instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be fair, obj would probably not surface in the public api, only value would.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the index/column properties are only actually added to DataFrame/Series in the codebase.

In theory they could be added to any NDFrame object and I suppose we could refactor the code to add the AxisProperty dynamically in the base class based on the other NDFrame attributes, i.e. we know the axis numbers and names.

Having said that, i'm -1 on doing this dynamically and therefore happy with the DataFrame|Series to match the actual code.

) -> None: ...
12 changes: 10 additions & 2 deletions pandas/core/apply.py
Original file line number Diff line number Diff line change
Expand Up @@ -234,8 +234,12 @@ def transform(self) -> DataFrame | Series:
and not obj.empty
):
raise ValueError("Transform function failed")
# error: Argument 1 to "__get__" of "AxisProperty" has incompatible type
# "Union[Series, DataFrame, GroupBy[Any], SeriesGroupBy,
# DataFrameGroupBy, BaseWindow, Resampler]"; expected "Union[DataFrame,
# Series]"
if not isinstance(result, (ABCSeries, ABCDataFrame)) or not result.index.equals(
obj.index
obj.index # type:ignore[arg-type]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not investigated this one yet. maybe fix later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is AxisProperty used by all NDFrame? That might cover most/all objects in this union.

A slightly (too) complex approach would be to use a Protocol for __get__: only objects that have _mgr and only _mgr that have _mgr.axes where axes is Sequence[Index].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that all those objects are valid e.g. AttributeError: 'DataFrameGroupBy' object has no attribute 'index'

I assume that the condition should be narrowing to DataFrame/Series from the error message.

):
raise ValueError("Function did not transform")

Expand Down Expand Up @@ -639,7 +643,11 @@ class NDFrameApply(Apply):

@property
def index(self) -> Index:
return self.obj.index
# error: Argument 1 to "__get__" of "AxisProperty" has incompatible type
# "Union[Series, DataFrame, GroupBy[Any], SeriesGroupBy,
# DataFrameGroupBy, BaseWindow, Resampler]"; expected "Union[DataFrame,
# Series]"
return self.obj.index # type:ignore[arg-type]

@property
def agg_axis(self) -> Index:
Expand Down
6 changes: 2 additions & 4 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -11039,12 +11039,10 @@ def isin(self, values) -> DataFrame:
_info_axis_number = 1
_info_axis_name = "columns"

index: Index = properties.AxisProperty(
index = properties.AxisProperty(
axis=1, doc="The index (row labels) of the DataFrame."
)
columns: Index = properties.AxisProperty(
axis=0, doc="The column labels of the DataFrame."
)
columns = properties.AxisProperty(axis=0, doc="The column labels of the DataFrame.")

@property
def _AXIS_NUMBERS(self) -> dict[str, int]:
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
to_offset,
)
from pandas._typing import (
AnyArrayLike,
ArrayLike,
Axis,
CompressionOptions,
Expand Down Expand Up @@ -754,7 +755,7 @@ def _set_axis_nocheck(self, labels, axis: Axis, inplace: bool_t):
obj.set_axis(labels, axis=axis, inplace=True)
return obj

def _set_axis(self, axis: int, labels: Index) -> None:
def _set_axis(self, axis: int, labels: AnyArrayLike | Sequence) -> None:
labels = ensure_index(labels)
self._mgr.set_axis(axis, labels)
self._clear_item_cache()
Expand Down
6 changes: 3 additions & 3 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -270,9 +270,9 @@ def aggregate(self, func=None, *args, engine=None, engine_kwargs=None, **kwargs)
func = maybe_mangle_lambdas(func)
ret = self._aggregate_multiple_funcs(func)
if relabeling:
# error: Incompatible types in assignment (expression has type
# "Optional[List[str]]", variable has type "Index")
ret.columns = columns # type: ignore[assignment]
# columns is not narrowed by mypy from relabeling flag
assert columns is not None # for mypy
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet ran the tests locally to check we don't get failures but it appears relabeling and columns are not independent variables, which mypy cannot understand and we should try to avoid. Not planning to refactor at this time.

ret.columns = columns
return ret

else:
Expand Down
5 changes: 3 additions & 2 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
from pandas._libs.lib import no_default
from pandas._typing import (
AggFuncType,
AnyArrayLike,
ArrayLike,
Axis,
Dtype,
Expand Down Expand Up @@ -552,7 +553,7 @@ def _constructor_expanddim(self) -> Callable[..., DataFrame]:
def _can_hold_na(self) -> bool:
return self._mgr._can_hold_na

def _set_axis(self, axis: int, labels) -> None:
def _set_axis(self, axis: int, labels: AnyArrayLike | Sequence) -> None:
"""
Override generic, we want to set the _typ here.

Expand Down Expand Up @@ -5727,7 +5728,7 @@ def mask(
_info_axis_number = 0
_info_axis_name = "index"

index: Index = properties.AxisProperty(
index = properties.AxisProperty(
axis=0, doc="The index (axis labels) of the Series."
)

Expand Down
7 changes: 1 addition & 6 deletions pandas/io/parsers/arrow_parser_wrapper.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,12 +105,7 @@ def _finalize_output(self, frame: DataFrame) -> DataFrame:
multi_index_named = False
frame.columns = self.names
# we only need the frame not the names
# error: Incompatible types in assignment (expression has type
# "Union[List[Union[Union[str, int, float, bool], Union[Period, Timestamp,
# Timedelta, Any]]], Index]", variable has type "Index") [assignment]
frame.columns, frame = self._do_date_conversions( # type: ignore[assignment]
frame.columns, frame
)
frame.columns, frame = self._do_date_conversions(frame.columns, frame)
if self.index_col is not None:
for i, item in enumerate(self.index_col):
if is_integer(item):
Expand Down