TYP: misc typing in core\indexes\base.py #35991

Merged (3 commits) on Sep 2, 2020

6 changes: 3 additions & 3 deletions pandas/core/frame.py
@@ -1772,13 +1772,13 @@ def from_records(
                 arrays = [data[k] for k in columns]
             else:
                 arrays = []
-                arr_columns = []
+                arr_columns_list = []
                 for k, v in data.items():
                     if k in columns:
-                        arr_columns.append(k)
+                        arr_columns_list.append(k)
                         arrays.append(v)

-                arrays, arr_columns = reorder_arrays(arrays, arr_columns, columns)
+                arrays, arr_columns = reorder_arrays(arrays, arr_columns_list, columns)

         elif isinstance(data, (np.ndarray, DataFrame)):
             arrays, columns = to_arrays(data, columns)
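
The `arr_columns_list` rename (like the `result_arr` rename in `_format_with_header` further down, whose removed `# error: "List[str]" has no attribute "tolist"` comment shows the symptom) appears to work around mypy fixing a variable's type at its first assignment: a later value of a different type is either rejected as an incompatible assignment or, if it is untyped, silently treated as the original type. A minimal stdlib-only sketch of the pattern; `select_columns` is a hypothetical helper, not pandas API, and it skips the reordering that `reorder_arrays` performs.

```python
from typing import Dict, List, Sequence, Tuple


def select_columns(
    data: Dict[str, List[int]], columns: Sequence[str]
) -> Tuple[List[List[int]], Tuple[str, ...]]:
    """Hypothetical helper: keep the entries of `data` whose key is in `columns`."""
    arrays: List[List[int]] = []
    # The mutable list gets its own name. Declaring `arr_columns: List[str] = []`
    # here and assigning `tuple(...)` to it below would be reported by mypy as
    # an incompatible assignment.
    arr_columns_list: List[str] = []
    for k, v in data.items():
        if k in columns:
            arr_columns_list.append(k)
            arrays.append(v)
    arr_columns: Tuple[str, ...] = tuple(arr_columns_list)
    return arrays, arr_columns


arrays, arr_columns = select_columns({"a": [1], "b": [2], "c": [3]}, ["c", "a"])
print(arrays, arr_columns)
```
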
51 changes: 40 additions & 11 deletions pandas/core/indexes/base.py
@@ -10,6 +10,8 @@
     Hashable,
     List,
     Optional,
+    Sequence,
+    TypeVar,
     Union,
 )
 import warnings
@@ -22,7 +24,7 @@
 from pandas._libs.tslibs import OutOfBoundsDatetime, Timestamp
 from pandas._libs.tslibs.period import IncompatibleFrequency
 from pandas._libs.tslibs.timezones import tz_compare
-from pandas._typing import DtypeObj, Label
+from pandas._typing import AnyArrayLike, Dtype, DtypeObj, Label
 from pandas.compat import set_function_name
 from pandas.compat.numpy import function as nv
 from pandas.errors import InvalidIndexError
@@ -98,7 +100,7 @@
 )

 if TYPE_CHECKING:
-    from pandas import Series
+    from pandas import RangeIndex, Series


 __all__ = ["Index"]
@@ -188,6 +190,9 @@ def _new_Index(cls, d):
     return cls.__new__(cls, **d)


+_IndexT = TypeVar("_IndexT", bound="Index")

Contributor:

can you add a comment before this (and in other cases where we use TypeVar like this); also pls confirm that the ref is consistent (e.g. FrameOrSeries is different)

Member Author:

TL;DR: lmk explicitly what you want to name the TypeVar; otherwise I'll leave it as is for now, pending further discussion.


> also pls confirm that the ref is consistent

we are far from consistent; the uses of TypeVar outside of pandas._typing are:

  • _T = TypeVar("_T", bound="NDArrayBackedExtensionArray") added in #33660
  • DatetimeLikeArrayT = TypeVar("DatetimeLikeArrayT", bound="DatetimeLikeArrayMixin") added in #33706
  • BaseMaskedArrayT = TypeVar("BaseMaskedArrayT", bound="BaseMaskedArray") added in #31728
  • _T = TypeVar("_T", bound="BaseExprVisitor") added in #31365
  • ScalarResult = TypeVar("ScalarResult")
  • OutputFrameOrSeries = TypeVar("OutputFrameOrSeries", bound=NDFrame) added in #33286
  • _T = TypeVar("_T", bound="DatetimeIndexOpsMixin") added in #33839
  • T = TypeVar("T", bound="BlockManager") added in #32421
  • DatetimeScalar = TypeVar("DatetimeScalar", Scalar, datetime)

and a couple of uses of:

  • _KT = TypeVar("_KT")
  • _VT = TypeVar("_VT")

my preference is _<classname>T, i.e. a leading underscore, followed by the class name passed as the bound argument, followed by an uppercase T to indicate a TypeVar (of course, where a union is used instead of a bound, this allows for more imaginative naming).

In pandas._typing, the TypeVars are imported by other modules, so we don't use leading underscores there.

see also https://github.com/numpy/numpy/blob/3fbc84a5662ffd985a071b0bbdcd59e655041ad3/numpy/__init__.pyi for other ideas on naming.

we could add a Self suffix instead of T for TypeVars used to preserve return types, or we could drop the T altogether. Using that naming convention, we would change the _IndexT added here to _IndexSelf, since this TypeVar is used to maintain the return type of .copy().

for abstract/base classes we could add a SubClass suffix (so FrameOrSeries could become NDFrameSubClass).
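
A minimal sketch of what a bound TypeVar on `self` buys for a method like `copy`, using hypothetical `Base`/`Child` classes rather than the real `Index`: annotating `self` with the TypeVar and returning the same TypeVar lets a type checker infer that `Child().copy()` is a `Child`, not just a `Base`. The name `_BaseT` follows the `_<classname>T` convention proposed above; none of this is pandas code.

```python
from typing import Optional, TypeVar

# Leading underscore + bound class name + T, per the proposed convention.
_BaseT = TypeVar("_BaseT", bound="Base")


class Base:
    """Hypothetical stand-in for Index (not the pandas class)."""

    def __init__(self, name: Optional[str] = None) -> None:
        self.name = name

    def copy(self: _BaseT, name: Optional[str] = None) -> _BaseT:
        # type(self) constructs the caller's own class, so the runtime result
        # matches the annotation: Child().copy() really is a Child.
        return type(self)(name=self.name if name is None else name)


class Child(Base):
    """Hypothetical subclass with an extra method."""

    def only_on_child(self) -> str:
        return "child"


copied = Child(name="a").copy()
# A type checker infers `copied` as Child; with `-> "Base"` instead, this
# call would be flagged because Base has no `only_on_child`.
print(copied.only_on_child(), copied.name)
```
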

> e.g. FrameOrSeries is different

FrameOrSeries was originally

FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")

before being changed in #28173 to

FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
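
The switch from the constrained form to the bound form changes what a type checker infers. A constrained TypeVar is solved to exactly one of its listed types, so a subclass argument is widened to the matching constraint; a bound TypeVar may be solved to the bound or to any subclass of it. A minimal sketch with hypothetical stand-in classes (not pandas' own `NDFrame`, `Series` or `DataFrame`):

```python
from typing import TypeVar


class NDFrameLike:
    """Hypothetical stand-in for NDFrame."""


class SeriesLike(NDFrameLike):
    """Hypothetical stand-in for Series."""


class DataFrameLike(NDFrameLike):
    """Hypothetical stand-in for DataFrame."""


class SubSeries(SeriesLike):
    """A user subclass of the Series stand-in."""


# Constrained form: must be solved to exactly one of the listed types.
ConstrainedT = TypeVar("ConstrainedT", SeriesLike, DataFrameLike)
# Bound form: may be solved to the bound or to any subclass of it.
BoundT = TypeVar("BoundT", bound=NDFrameLike)


def passthrough_constrained(obj: ConstrainedT) -> ConstrainedT:
    return obj


def passthrough_bound(obj: BoundT) -> BoundT:
    return obj


# At runtime both simply return their argument. Statically, a checker types
# the first result as SeriesLike (the matching constraint) but the second as
# SubSeries, so the subclass survives only with the bound form.
a = passthrough_constrained(SubSeries())
b = passthrough_bound(SubSeries())
print(ConstrainedT.__constraints__, BoundT.__bound__)
```
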

> can you add a comment before this (and in other cases where we use TypeVar like this)

TypeVar is a fundamental building block[1] of typing, and if we are consistent with the naming, additional comments explaining fundamental uses of typing shouldn't be necessary.

[1] from https://www.python.org/dev/peps/pep-0484/

Fundamental building blocks:

  • Any, used as def get(key: str) -> Any: ...
  • Union, used as Union[Type1, Type2, Type3]
  • Callable, used as Callable[[Arg1Type, Arg2Type], ReturnType]
  • Tuple, used by listing the element types, for example Tuple[int, int, str]. The empty tuple can be typed as Tuple[()]. Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...]. (The ... here are part of the syntax, a literal ellipsis.)
  • TypeVar, used as X = TypeVar('X', Type1, Type2, Type3) or simply Y = TypeVar('Y') (see above for more details)
  • Generic, used to create user-defined generic classes
  • Type, used to annotate class objects

Contributor:

yeah, as long as it's consistent it doesn't matter much. IndexT looks good to me: importable (no leading _), not too crazy.

Contributor:

obviously this can be done in a dedicated PR.

+
+
 class Index(IndexOpsMixin, PandasObject):
     """
     Immutable ndarray implementing an ordered, sliceable set. The basic object
@@ -787,7 +792,13 @@ def repeat(self, repeats, axis=None):
     # --------------------------------------------------------------------
     # Copying Methods

-    def copy(self, name=None, deep=False, dtype=None, names=None):
+    def copy(
+        self: _IndexT,
+        name: Optional[Label] = None,
+        deep: bool = False,
+        dtype: Optional[Dtype] = None,
+        names: Optional[Sequence[Label]] = None,
+    ) -> _IndexT:
         """
         Make a copy of this object.

@@ -949,10 +960,9 @@ def _format_with_header(
             # could have nans
             mask = isna(values)
             if mask.any():
-                result = np.array(result)
-                result[mask] = na_rep
-                # error: "List[str]" has no attribute "tolist"
-                result = result.tolist()  # type: ignore[attr-defined]
+                result_arr = np.array(result)
+                result_arr[mask] = na_rep
+                result = result_arr.tolist()
         else:
             result = trim_front(format_array(values, None, justify="left"))
         return header + result
@@ -4913,7 +4923,13 @@ def _get_string_slice(self, key: str_t, use_lhs: bool = True, use_rhs: bool = Tr
         # overridden in DatetimeIndex, TimedeltaIndex and PeriodIndex
         raise NotImplementedError

-    def slice_indexer(self, start=None, end=None, step=None, kind=None):
+    def slice_indexer(
+        self,
+        start: Optional[Label] = None,
+        end: Optional[Label] = None,

Member:

isn't Optional redundant here?

Member Author:

we've got no_implicit_optional=True in our setup.cfg (though I'm not sure why).

it would work, but only because we have Label = Optional[Hashable] to allow None to be a Label. Label would be easier to grok if it were Union[Hashable, None] (or if Hashable included None), but for consistency we use Optional.

Here, for consistency of keyword parameter annotations, I don't really want to remove the Optional, even though it's already accounted for in Label.

but I'll change it if it's a blocker.
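
A quick runtime check of the point about `Label`: with `Label = Optional[Hashable]`, wrapping it in `Optional[...]` again yields an equal type, because `typing` flattens nested unions and drops duplicate members. So the explicit `Optional` on `start`/`end` is redundant for the checker but harmless. `no_implicit_optional` is a mypy option; with it set, a `None` default no longer turns `step: int = None` into `Optional[int]` implicitly. The `slice_step` helper below is hypothetical.

```python
from typing import Hashable, Optional, Union

# The alias as quoted in the thread: None is allowed as a label.
Label = Optional[Hashable]

# Nested unions are flattened and duplicates dropped, so these compare equal.
print(Optional[Label] == Label)  # True
print(Label == Union[Hashable, None])  # True


# Under mypy's no_implicit_optional, `step: int = None` is rejected; the
# None default has to be spelled out in the annotation, as in slice_indexer.
def slice_step(step: Optional[int] = None) -> int:
    # hypothetical helper: treat a missing step as 1
    return 1 if step is None else step
```
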

+        step: Optional[int] = None,
+        kind: Optional[str_t] = None,
+    ) -> slice:
         """
         Compute the slice indexer for input labels and step.

@@ -5513,7 +5529,9 @@ def ensure_index_from_sequences(sequences, names=None):
     return MultiIndex.from_arrays(sequences, names=names)


-def ensure_index(index_like, copy: bool = False):
+def ensure_index(
+    index_like: Union[AnyArrayLike, Sequence], copy: bool = False

Member:

does Sequence not subsume ArrayLike?

Member Author:

see #28770
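
Background on the Sequence question (independent of what #28770 discusses): `collections.abc.Sequence` is a nominal ABC, so an object counts as a Sequence only if its class inherits from it or is registered with it; defining `__len__` and `__getitem__` is not enough, and type checkers treat `typing.Sequence` nominally as well. NumPy's `ndarray`, for example, is not registered as a Sequence, which is one reason an array-like type may need to be listed next to `Sequence`. A minimal sketch with a hypothetical `ArrayLikeStub` class:

```python
from collections.abc import Sequence
from typing import Any, Iterator


class ArrayLikeStub:
    """Hypothetical array-like: sized and indexable, but not a registered Sequence."""

    def __init__(self, *values: Any) -> None:
        self._values = list(values)

    def __len__(self) -> int:
        return len(self._values)

    def __getitem__(self, i: int) -> Any:
        return self._values[i]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._values)


arr = ArrayLikeStub(1, 2, 3)
# Sequence has no structural subclass hook: the methods alone do not qualify.
print(isinstance(arr, Sequence))  # False
print(isinstance([1, 2, 3], Sequence))  # True
```
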

+) -> Index:
     """
     Ensure that we have an index from some index-like object.

@@ -5549,7 +5567,18 @@ def ensure_index(index_like, copy: bool = False):
             index_like = index_like.copy()
         return index_like
     if hasattr(index_like, "name"):
-        return Index(index_like, name=index_like.name, copy=copy)
+        # https://github.com/python/mypy/issues/1424
+        # error: Item "ExtensionArray" of "Union[ExtensionArray,
+        # Sequence[Any]]" has no attribute "name" [union-attr]
+        # error: Item "Sequence[Any]" of "Union[ExtensionArray, Sequence[Any]]"
+        # has no attribute "name" [union-attr]
+        # error: "Sequence[Any]" has no attribute "name" [attr-defined]
+        # error: Item "Sequence[Any]" of "Union[Series, Sequence[Any]]" has no
+        # attribute "name" [union-attr]
+        # error: Item "Sequence[Any]" of "Union[Any, Sequence[Any]]" has no
+        # attribute "name" [union-attr]
+        name = index_like.name  # type: ignore[union-attr, attr-defined]
+        return Index(index_like, name=name, copy=copy)

     if is_iterator(index_like):
         index_like = list(index_like)
@@ -5604,7 +5633,7 @@ def _validate_join_method(method: str):
         raise ValueError(f"do not recognize join method {method}")


-def default_index(n):
+def default_index(n: int) -> "RangeIndex":
     from pandas.core.indexes.range import RangeIndex

     return RangeIndex(0, n, name=None)

6 changes: 5 additions & 1 deletion pandas/core/indexes/interval.py
@@ -1,7 +1,7 @@
 """ define the IntervalIndex """
 from operator import le, lt
 import textwrap
-from typing import Any, List, Optional, Tuple, Union
+from typing import TYPE_CHECKING, Any, List, Optional, Tuple, Union, cast

 import numpy as np

@@ -56,6 +56,9 @@
 from pandas.core.indexes.timedeltas import TimedeltaIndex, timedelta_range
 from pandas.core.ops import get_op_result_name

+if TYPE_CHECKING:
+    from pandas import CategoricalIndex
+
 _VALID_CLOSED = {"left", "right", "both", "neither"}
 _index_doc_kwargs = dict(ibase._index_doc_kwargs)

@@ -786,6 +789,7 @@ def get_indexer(
             right_indexer = self.right.get_indexer(target_as_index.right)
             indexer = np.where(left_indexer == right_indexer, left_indexer, -1)
         elif is_categorical_dtype(target_as_index.dtype):
+            target_as_index = cast("CategoricalIndex", target_as_index)
             # get an indexer for unique categories then propagate to codes via take_1d
             categories_indexer = self.get_indexer(target_as_index.categories)
             indexer = take_1d(categories_indexer, target_as_index.codes, fill_value=-1)
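
The `cast` in `get_indexer` is presumably needed because `is_categorical_dtype(...)` is an ordinary bool-returning call, so a type checker cannot narrow `target_as_index` from it the way it would from `isinstance`. `typing.cast` performs no runtime check or conversion; it returns its argument unchanged, and the string form `"CategoricalIndex"` only has to resolve for the checker, which is what the `if TYPE_CHECKING:` import supplies (`default_index` in base.py uses the same import trick for its `"RangeIndex"` return annotation). A minimal sketch with hypothetical `Container`/`CodedContainer` classes standing in for `Index`/`CategoricalIndex`:

```python
from typing import List, cast


class Container:
    """Hypothetical base type; `kind` is a runtime tag the checker cannot narrow on."""

    kind = "plain"


class CodedContainer(Container):
    """Hypothetical subclass that carries integer codes."""

    kind = "coded"

    def __init__(self, codes: List[int]) -> None:
        self.codes = codes


def get_codes(obj: Container) -> List[int]:
    if obj.kind == "coded":
        # An attribute comparison does not narrow `obj`, so tell the checker.
        # cast() is a no-op at runtime: it returns `obj` itself, unchanged.
        obj = cast("CodedContainer", obj)
        return obj.codes
    return []


print(get_codes(CodedContainer([0, 2, 1])), get_codes(Container()))
```
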