Removed ABCs from pandas._typing #27424

Merged: 14 commits, Jul 24, 2019
47 changes: 25 additions & 22 deletions pandas/_typing.py
@@ -1,34 +1,37 @@
 from pathlib import Path
-from typing import IO, AnyStr, TypeVar, Union
+from typing import IO, TYPE_CHECKING, AnyStr, TypeVar, Union

 import numpy as np

-from pandas._libs import Timestamp
-from pandas._libs.tslibs.period import Period
-from pandas._libs.tslibs.timedeltas import Timedelta
+# To prevent import cycles place any internal imports in the branch below
+# and use a string literal forward reference to it in subsequent types
+# https://mypy.readthedocs.io/en/latest/common_issues.html#import-cycles
+if TYPE_CHECKING:
+    from pandas._libs import Period, Timedelta, Timestamp

-from pandas.core.dtypes.dtypes import ExtensionDtype
-from pandas.core.dtypes.generic import (
-    ABCDataFrame,
-    ABCExtensionArray,
-    ABCIndexClass,
-    ABCSeries,
-    ABCSparseSeries,
-)
+    from pandas.core.arrays.base import ExtensionArray
+    from pandas.core.dtypes.dtypes import ExtensionDtype
+    from pandas.core.dtypes.generic import (
+        ABCDataFrame,
+        ABCExtensionArray,
+        ABCIndexClass,
+        ABCSeries,
+        ABCSparseSeries,
+    )
+    from pandas.core.indexes.base import Index
+    from pandas.core.frame import DataFrame
+    from pandas.core.series import Series
+    from pandas.core.sparse.series import SparseSeries

 AnyArrayLike = TypeVar(
-    "AnyArrayLike",
-    ABCExtensionArray,
-    ABCIndexClass,
-    ABCSeries,
-    ABCSparseSeries,
-    np.ndarray,
+    "AnyArrayLike", "ExtensionArray", "Index", "Series", "SparseSeries", np.ndarray
 )
-ArrayLike = TypeVar("ArrayLike", ABCExtensionArray, np.ndarray)
-DatetimeLikeScalar = TypeVar("DatetimeLikeScalar", Period, Timestamp, Timedelta)
-Dtype = Union[str, np.dtype, ExtensionDtype]
+ArrayLike = TypeVar("ArrayLike", "ExtensionArray", np.ndarray)
+DatetimeLikeScalar = TypeVar("DatetimeLikeScalar", "Period", "Timestamp", "Timedelta")
+Dtype = Union[str, np.dtype, "ExtensionDtype"]
 FilePathOrBuffer = Union[str, Path, IO[AnyStr]]

-FrameOrSeries = TypeVar("FrameOrSeries", ABCSeries, ABCDataFrame)
+FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")
Member:

Suggested change:
-FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")
+FrameOrSeries = Union["Series", "DataFrame"]

Quote from https://mypy.readthedocs.io/en/latest/generics.html: "User-defined generics are a moderately advanced feature and you can get far without ever using them..."

Member Author:

Thanks, but this just loosens the type system rather than actually fixing anything. TypeVar is going to be generally more useful for checking functions that can be fully generic in nature.

Might just change the return type of this one and see how many others require Union in the future.
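A rough sketch of the trade-off being discussed (not code from this PR; the helper names are made up):

from typing import TypeVar, Union

from pandas import DataFrame, Series

FrameOrSeriesVar = TypeVar("FrameOrSeriesVar", Series, DataFrame)
FrameOrSeriesUnion = Union[Series, DataFrame]


def head_generic(obj: FrameOrSeriesVar, n: int = 5) -> FrameOrSeriesVar:
    # with a TypeVar, mypy ties the return type to the argument type:
    # Series in -> Series out, DataFrame in -> DataFrame out
    return obj.head(n)


def head_union(obj: FrameOrSeriesUnion, n: int = 5) -> FrameOrSeriesUnion:
    # with a Union, mypy only knows "Series or DataFrame" on the way out,
    # so callers lose the specific type and may need isinstance or cast
    return obj.head(n)


s = head_generic(Series(range(3)))  # mypy infers Series
u = head_union(Series(range(3)))    # mypy infers Union[Series, DataFrame]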

Member:

makes sense. Union[Series, DataFrame] might be better written as NDFrame anyway?

Member Author:

Also, the "user-defined generics" you are referring to are more applicable to containers, not TypeVars. Right now we just use a blanket Series as a return object, though in the future we could do something like Series[int] and Series[str], etc.; the Series would be the user-defined generic in that case.

The TypeVar in the docs you linked is just a way of parametrizing that user-defined generic, so that a Series[int] would have to stay a Series[int] through its lifecycle; without that parametrization we allow a Series[int] to become a Series[str] without any complaints from mypy today.

We are probably a ways off from doing user-defined generics, but it's great that you looked into it. Certainly open to ideas on that front if you think of a good way to implement this as we get more familiar with these annotations.
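For reference, a "user-defined generic" in that sense is a container class parametrized by a TypeVar. Series is not generic today, so the toy class below is purely illustrative of what a hypothetical Series[int] would buy:

from typing import Generic, List, TypeVar

T = TypeVar("T")


class Column(Generic[T]):
    """Toy stand-in for a hypothetical Series[T]."""

    def __init__(self, values: List[T]) -> None:
        self.values = values

    def first(self) -> T:
        # T is bound per instance, so Column[int].first() is an int to mypy
        return self.values[0]


ints = Column([1, 2, 3])      # mypy sees Column[int]
names = Column(["a", "b"])    # mypy sees Column[str]

x: int = ints.first()         # OK
# y: int = names.first()      # mypy would reject this: str is not int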

Member Author:

> makes sense. Union[Series, DataFrame] might be better written as NDFrame anyway?

Hmm, that would work, though we don't typically import NDFrame anywhere, so I don't think we want to start here.

Contributor:

I would leave this as FrameOrSeries as it's more descriptive.

 Scalar = Union[str, int, float]
 Axis = Union[str, int]
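The TYPE_CHECKING pattern used above generalizes: the guarded imports are only evaluated by the type checker, and the string annotations are forward references that mypy resolves lazily, so there is no runtime import cycle. A minimal standalone sketch (the mypackage modules are hypothetical):

from typing import TYPE_CHECKING, TypeVar

if TYPE_CHECKING:
    # never executed at runtime, so a module that imports this one
    # back cannot create a circular import
    from mypackage.frame import DataFrame
    from mypackage.series import Series

# string literals are forward references resolved by the type checker
FrameOrSeries = TypeVar("FrameOrSeries", "Series", "DataFrame")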
5 changes: 3 additions & 2 deletions pandas/core/dtypes/common.py
@@ -167,12 +167,13 @@ def ensure_int_or_float(arr: ArrayLike, copy=False) -> np.array:
     If the array is explicitly of type uint64 the type
     will remain unchanged.
     """
+    # TODO: GH27506 potential bug with ExtensionArrays
     try:
-        return arr.astype("int64", copy=copy, casting="safe")
+        return arr.astype("int64", copy=copy, casting="safe")  # type: ignore
     except TypeError:
         pass
     try:
-        return arr.astype("uint64", copy=copy, casting="safe")
+        return arr.astype("uint64", copy=copy, casting="safe")  # type: ignore
     except TypeError:
         return arr.astype("float64", copy=copy)

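The try/except ladder above relies on NumPy's casting rules: astype(..., casting="safe") raises TypeError whenever the conversion could lose information, which is exactly what lets a uint64 array fall through to the uint64 branch unchanged. A small illustration with plain NumPy (not pandas internals):

import numpy as np

arr_i32 = np.array([1, 2, 3], dtype="int32")
arr_u64 = np.array([2 ** 63], dtype="uint64")

# int32 -> int64 is lossless, so the "safe" cast succeeds
print(arr_i32.astype("int64", casting="safe"))

# uint64 -> int64 could overflow, so NumPy refuses under the "safe" rule
try:
    arr_u64.astype("int64", casting="safe")
except TypeError as err:
    print("falls through to the uint64 branch:", err)

# uint64 -> uint64 is trivially safe, so the dtype is left unchanged
print(arr_u64.astype("uint64", casting="safe").dtype)  # uint64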
55 changes: 31 additions & 24 deletions pandas/core/indexes/interval.py
@@ -906,35 +906,35 @@ def get_indexer(
             )
             raise InvalidIndexError(msg)

-        target = ensure_index(target)
+        target_as_index = ensure_index(target)
Member:

If you also make AnyArrayLike an alias instead of a TypeVar, then this module doesn't need to be changed at all.

Member Author:

Same thing though - this just loosens the type checking, which isn't desired. Actually moved towards TypeVars for the reasons described in #26453 (comment).

Might update #27050 to include some of that info.

Member:

> Actually moved towards TypeVars for the reasons described in #26453 (comment)

I don't think that applies here.

target does not need to be a TypeVar; the type of target does not need to be maintained.

target can be any of ["ExtensionArray", "Index", "Series", "SparseSeries", np.ndarray].

ensure_index is not yet typed, but presumably it will have a return type of Index.

So reassigning an Index to a variable whose type is a Union of ["ExtensionArray", "Index", "Series", "SparseSeries", np.ndarray] is perfectly valid.
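A sketch of the alternative being proposed, with AnyArrayLike as a plain Union alias and an assumed Index return type on ensure_index (neither is part of this PR); the in-place reassignment then type-checks:

from typing import Union

import numpy as np

from pandas import Index, Series
from pandas.core.indexes.base import ensure_index

# alias version of AnyArrayLike, simplified to the types used here
AnyArrayLikeAlias = Union[np.ndarray, Index, Series]


def get_indexer_union_sketch(target: AnyArrayLikeAlias) -> np.ndarray:
    # Index is a member of the Union, so rebinding target is accepted by mypy
    target = ensure_index(target)
    return np.asarray(target)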

Member Author:

We could add Union alternatives for each TypeVar in the central module, but I think that confounds the point of generic programming and/or makes our type system weaker. Another option would be to allow redefinition of variables, which mypy supplies a setting for:

https://mypy.readthedocs.io/en/latest/command_line.html?highlight=allow-redefinition#miscellaneous-strictness-flags

But I also think that makes for a weaker type system, and generally there's not a lot of downside to creating a separate variable here instead of allowing its type to be implicitly altered by the return of ensure_index.
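For reference, this is roughly what that flag permits, and why it is being ruled out just below (a sketch, not project configuration; enabled via --allow-redefinition on the command line or allow_redefinition = True in the config file):

def join_items(items: list) -> None:
    # items: list
    items = ",".join(str(x) for x in items)
    # rebinding to str is an error by default; accepted with the flag,
    # mirroring how "target = ensure_index(target)" changes target's type
    print(items)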

Member:

> Another option would be to allow redefinition of variables, which mypy supplies a setting for:
> https://mypy.readthedocs.io/en/latest/command_line.html?highlight=allow-redefinition#miscellaneous-strictness-flags
> But I also think that makes for a weaker type system

Definitely. We should rule this out.

> generally there's not a lot of downside to creating a separate variable here instead of allowing its type to be implicitly altered by the return of ensure_index

Disagree. The return type of ensure_index would need to conform to the type of target.

Is creating new variables not weakening the type system?

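In miniature, the pattern the diff below settles on: keep the TypeVar annotation on the parameter and bind the result of ensure_index to a new name, so neither type has to be loosened (names simplified; ensure_index's Index return type is assumed):

from typing import TypeVar

import numpy as np

from pandas import Index, Series
from pandas.core.indexes.base import ensure_index

AnyArrayLikeVar = TypeVar("AnyArrayLikeVar", np.ndarray, Index, Series)


def get_indexer_typevar_sketch(target: AnyArrayLikeVar) -> np.ndarray:
    # "target = ensure_index(target)" would rebind a TypeVar-typed variable
    # to a concrete Index, which mypy rejects; a new name keeps both types
    target_as_index = ensure_index(target)
    return np.asarray(target_as_index)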

-        if isinstance(target, IntervalIndex):
+        if isinstance(target_as_index, IntervalIndex):
             # equal indexes -> 1:1 positional match
-            if self.equals(target):
+            if self.equals(target_as_index):
                 return np.arange(len(self), dtype="intp")

             # different closed or incompatible subtype -> no matches
             common_subtype = find_common_type(
-                [self.dtype.subtype, target.dtype.subtype]
+                [self.dtype.subtype, target_as_index.dtype.subtype]
             )
-            if self.closed != target.closed or is_object_dtype(common_subtype):
-                return np.repeat(np.intp(-1), len(target))
+            if self.closed != target_as_index.closed or is_object_dtype(common_subtype):
+                return np.repeat(np.intp(-1), len(target_as_index))

-            # non-overlapping -> at most one match per interval in target
+            # non-overlapping -> at most one match per interval in target_as_index
             # want exact matches -> need both left/right to match, so defer to
             # left/right get_indexer, compare elementwise, equality -> match
-            left_indexer = self.left.get_indexer(target.left)
-            right_indexer = self.right.get_indexer(target.right)
+            left_indexer = self.left.get_indexer(target_as_index.left)
+            right_indexer = self.right.get_indexer(target_as_index.right)
             indexer = np.where(left_indexer == right_indexer, left_indexer, -1)
-        elif not is_object_dtype(target):
+        elif not is_object_dtype(target_as_index):
             # homogeneous scalar index: use IntervalTree
-            target = self._maybe_convert_i8(target)
-            indexer = self._engine.get_indexer(target.values)
+            target_as_index = self._maybe_convert_i8(target_as_index)
+            indexer = self._engine.get_indexer(target_as_index.values)
         else:
             # heterogeneous scalar index: defer elementwise to get_loc
             # (non-overlapping so get_loc guarantees scalar of KeyError)
             indexer = []
-            for key in target:
+            for key in target_as_index:
                 try:
                     loc = self.get_loc(key)
                 except KeyError:
@@ -947,21 +947,26 @@ def get_indexer_non_unique(
     def get_indexer_non_unique(
         self, target: AnyArrayLike
     ) -> Tuple[np.ndarray, np.ndarray]:
-        target = ensure_index(target)
+        target_as_index = ensure_index(target)

-        # check that target IntervalIndex is compatible
-        if isinstance(target, IntervalIndex):
+        # check that target_as_index IntervalIndex is compatible
+        if isinstance(target_as_index, IntervalIndex):
             common_subtype = find_common_type(
-                [self.dtype.subtype, target.dtype.subtype]
+                [self.dtype.subtype, target_as_index.dtype.subtype]
             )
-            if self.closed != target.closed or is_object_dtype(common_subtype):
+            if self.closed != target_as_index.closed or is_object_dtype(common_subtype):
                 # different closed or incompatible subtype -> no matches
-                return np.repeat(-1, len(target)), np.arange(len(target))
+                return (
+                    np.repeat(-1, len(target_as_index)),
+                    np.arange(len(target_as_index)),
+                )

-        if is_object_dtype(target) or isinstance(target, IntervalIndex):
-            # target might contain intervals: defer elementwise to get_loc
+        if is_object_dtype(target_as_index) or isinstance(
+            target_as_index, IntervalIndex
+        ):
+            # target_as_index might contain intervals: defer elementwise to get_loc
             indexer, missing = [], []
-            for i, key in enumerate(target):
+            for i, key in enumerate(target_as_index):
                 try:
                     locs = self.get_loc(key)
                     if isinstance(locs, slice):
@@ -973,8 +978,10 @@ def get_indexer_non_unique(
                 indexer.append(locs)
             indexer = np.concatenate(indexer)
         else:
-            target = self._maybe_convert_i8(target)
-            indexer, missing = self._engine.get_indexer_non_unique(target.values)
+            target_as_index = self._maybe_convert_i8(target_as_index)
+            indexer, missing = self._engine.get_indexer_non_unique(
+                target_as_index.values
+            )

         return ensure_platform_int(indexer), ensure_platform_int(missing)

5 changes: 4 additions & 1 deletion setup.cfg
@@ -30,6 +30,7 @@ exclude =
     .eggs/*.py,
     versioneer.py,
     env # exclude asv benchmark environments from linting
+    pandas/_typing.py

 [flake8-rst]
 bootstrap =
@@ -77,7 +78,9 @@ filterwarnings =

 [coverage:run]
 branch = False
-omit = */tests/*
+omit =
+    */tests/*
+    pandas/_typing.py
 plugins = Cython.Coverage

 [coverage:report]