Skip to content

TYP: Arraylike #31574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from
Closed
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
54f2c52
TYP: Arraylike
jbrockmendel Jan 31, 2020
b339215
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Feb 2, 2020
60567d2
mypy fixup
simonjayhawkins Feb 2, 2020
1ec8da0
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Mar 19, 2020
f01f4b1
mypy/isort fixup
simonjayhawkins Mar 19, 2020
7212b5a
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 10, 2020
5f5231d
mypy fixup
simonjayhawkins Apr 10, 2020
42a178f
revert add itemsize to EA interface
simonjayhawkins Apr 10, 2020
a6cc70c
revert changes to pytables
simonjayhawkins Apr 10, 2020
ead6793
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 10, 2020
9f03c51
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 10, 2020
3db3dca
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 11, 2020
f7fca6e
unsed classes following merge upsteam
simonjayhawkins Apr 11, 2020
c987bc2
whitespace
simonjayhawkins Apr 11, 2020
384ac32
troubleshoot test failure
simonjayhawkins Apr 11, 2020
6984eaf
restore replace lambda
simonjayhawkins Apr 11, 2020
c0772a2
AssertionError -> TypeError
simonjayhawkins Apr 11, 2020
074bbd4
more test changes
simonjayhawkins Apr 11, 2020
f6db38f
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 22, 2020
0b66334
revert changes to 'debugging' asserts that were tested
simonjayhawkins Apr 22, 2020
cdb39f2
fixup tests
simonjayhawkins Apr 22, 2020
1ba670d
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 24, 2020
ff2ff9c
Merge remote-tracking branch 'upstream/master' into arraylike
simonjayhawkins Apr 27, 2020
3d6f7a6
fix doc failures
simonjayhawkins Apr 27, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ class ExtensionArray:
nbytes
ndim
shape
size

Methods
-------
Expand Down Expand Up @@ -439,7 +440,7 @@ def isna(self) -> ArrayLike:

Returns
-------
na_values : Union[np.ndarray, ExtensionArray]
na_values : np.ndarray or ExtensionArray
In most cases, this should return a NumPy ndarray. For
exceptional cases like ``SparseArray``, where returning
an ndarray would be expensive, an ExtensionArray may be
Expand Down
12 changes: 8 additions & 4 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,12 @@

import builtins
import textwrap
from typing import Any, Dict, FrozenSet, List, Optional, Union
from typing import Any, Dict, FrozenSet, Generic, List, Optional

import numpy as np

import pandas._libs.lib as lib
from pandas._typing import ArrayLike
from pandas.compat import PYPY
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
Expand Down Expand Up @@ -584,7 +585,7 @@ def _shallow_copy(self, obj, **kwargs):
return self._constructor(obj, **kwargs)


class IndexOpsMixin:
class IndexOpsMixin(Generic[ArrayLike]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I wouldn't really consider a Mixin to be a Generic class - have you seen this pattern used elsewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I wouldn't really consider a Mixin to be a Generic class

see errors in OP of #31518

pandas/core/indexes/base.py:246: error: Type variable "pandas._typing.ArrayLike" is unbound
pandas/core/indexes/base.py:246: note: (Hint: Use "Generic[ArrayLike]" or "Protocol[ArrayLike]" base class to bind "ArrayLike" inside a class)
pandas/core/indexes/base.py:246: note: (Hint: Use "ArrayLike" in function signature to bind "ArrayLike" inside a function)

The goal here is to use the ArrayLike alias from pandas._typing to replace uses of Union[ExtensionArray, np.ndarray]. ArrayLike is a typevar and hence mypy reports errors if the typevar is not bound.

In Index we want the return type of _values to be the same type as self._data. The way to bind typevars within a class is to use Generic.

More value would come from having the correct type available in calling code instead of a Union type to reduce casts, asserts and type ignores to perform narrowing.

I'll mark this as draft till this is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. So my question was more around the general approach of dealing with Mixins here. The mypy suggested approach I think is using Protocol:

https://mypy.readthedocs.io/en/stable/more_types.html#mixin-classes

So just curious if you found this approach elsewhere in the docs or are using another project as a precedent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mypy suggested approach I think is using Protocol:

https://mypy.readthedocs.io/en/stable/more_types.html#mixin-classes

from those docs

one can define a protocol that defines common functionality for host classes instead of adding required abstract methods to every mixin

At the moment _values has already been added (not in this PR) as an abstract method on L680 so changing to Protocol is orthogonal to this PR.

PEP 544 states

Programmers are free to not use them even if they use type annotations.
There is no intent to make protocols non-optional in the future.

we can have the discussion regarding the use of protocol and the migration path in #33274

This PR is about binding Typvars.

In common with the Generic abc, Protocol is generic and would still need the typevar in the (Protocol) definition class...
(example from PEP 544)

class Box(Protocol[T_co]):
def content(self) -> T_co:
...

so if we adopt Protocols in the future we would use SomeClassWithJustTheAbstractMethods[Protocol[ArrayLike]] instead of IndexOpsMixin(Generic[ArrayLike]) with the abstract methods included.

So my question was more around the general approach of dealing with Mixins here

hopefully answered as not relevant.

So just curious if you found this approach elsewhere in the docs or are using another project as a precedent

following advice from mypy pandas/core/indexes/base.py:246: note: (Hint: Use "Generic[ArrayLike]" or "Protocol[ArrayLike]" base class to bind "ArrayLike" inside a class)

"""
Common ops mixin to support a unified interface / docs for Series / Index
"""
Expand All @@ -596,7 +597,7 @@ class IndexOpsMixin:
)

@property
def _values(self) -> Union[ExtensionArray, np.ndarray]:
def _values(self) -> ArrayLike:
# must be defined here as a property for mypy
raise AbstractMethodError(self)

Expand Down Expand Up @@ -1141,7 +1142,10 @@ def _map_values(self, mapper, na_action=None):
values = self._values
if na_action is not None:
raise NotImplementedError
map_f = lambda values, f: values.map(f)

def map_f(values, f):
return values.map(f)

Comment on lines +1145 to +1148
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is changed to silence pandas/core/base.py:1151: note: Internal mypy error checking function redefinition

else:
values = self.astype(object)
values = getattr(values, "values", values)
Expand Down
8 changes: 4 additions & 4 deletions pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
from pandas._libs.tslibs import OutOfBoundsDatetime, Timestamp
from pandas._libs.tslibs.period import IncompatibleFrequency
from pandas._libs.tslibs.timezones import tz_compare
from pandas._typing import Label
from pandas._typing import ArrayLike, Label
from pandas.compat import set_function_name
from pandas.compat.numpy import function as nv
from pandas.util._decorators import Appender, Substitution, cache_readonly, doc
Expand Down Expand Up @@ -183,7 +183,7 @@ def _new_Index(cls, d):
return cls.__new__(cls, **d)


class Index(IndexOpsMixin, PandasObject):
class Index(IndexOpsMixin[ArrayLike], PandasObject):
"""
Immutable ndarray implementing an ordered, sliceable set. The basic object
storing axis labels for all pandas objects.
Expand Down Expand Up @@ -255,7 +255,7 @@ def _outer_indexer(self, left, right):
return libjoin.outer_join_indexer(left, right)

_typ = "index"
_data: Union[ExtensionArray, np.ndarray]
_data: ArrayLike
_id = None
_name: Label = None
# MultiIndex.levels previously allowed setting the index name. We
Expand Down Expand Up @@ -3855,7 +3855,7 @@ def array(self) -> ExtensionArray:
return array

@property
def _values(self) -> Union[ExtensionArray, np.ndarray]:
def _values(self) -> ArrayLike:
"""
The best array representation.

Expand Down
5 changes: 4 additions & 1 deletion pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,10 @@ def _simple_new(cls, values: PeriodArray, name: Label = None):
Values that can be converted to a PeriodArray without inference
or coercion.
"""
assert isinstance(values, PeriodArray), type(values)
if not isinstance(values, PeriodArray):
raise TypeError(
f"_simple_new expects PeriodArray, got {type(values).__name__}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an assertion is correct here since this is internal and annotated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think an assertion is correct here since this is internal and annotated

we have tests to test these.


result = object.__new__(cls)
result._data = values
Expand Down
18 changes: 13 additions & 5 deletions pandas/tests/indexes/period/test_constructors.py
Original file line number Diff line number Diff line change
Expand Up @@ -321,7 +321,9 @@ def test_constructor_mixed(self):
def test_constructor_simple_new(self):
idx = period_range("2007-01", name="p", periods=2, freq="M")

with pytest.raises(AssertionError, match="<class .*PeriodIndex'>"):
with pytest.raises(
TypeError, match="_simple_new expects PeriodArray, got PeriodIndex"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: i prefer msg = "_simple_new ..." on the previous line so the with pytest.raises ` can be a 1-liner. not a big deal

):
idx._simple_new(idx, name="p")

result = idx._simple_new(idx._data, name="p")
Expand All @@ -339,15 +341,21 @@ def test_constructor_simple_new(self):
def test_constructor_simple_new_empty(self):
# GH13079
idx = PeriodIndex([], freq="M", name="p")
with pytest.raises(AssertionError, match="<class .*PeriodIndex'>"):
with pytest.raises(
TypeError, match="_simple_new expects PeriodArray, got PeriodIndex"
):
idx._simple_new(idx, name="p")

result = idx._simple_new(idx._data, name="p")
tm.assert_index_equal(result, idx)

@pytest.mark.parametrize("floats", [[1.1, 2.1], np.array([1.1, 2.1])])
def test_constructor_floats(self, floats):
with pytest.raises(AssertionError, match="<class "):
@pytest.mark.parametrize(
"floats,box", [([1.1, 2.1], "list"), (np.array([1.1, 2.1]), "ndarray")]
)
def test_constructor_floats(self, floats, box):
with pytest.raises(
TypeError, match=f"_simple_new expects PeriodArray, got {box}"
):
PeriodIndex._simple_new(floats)

msg = "PeriodIndex does not allow floating point in construction"
Expand Down
8 changes: 6 additions & 2 deletions pandas/tests/indexes/period/test_period.py
Original file line number Diff line number Diff line change
Expand Up @@ -92,12 +92,16 @@ def test_shallow_copy_empty(self):
def test_shallow_copy_disallow_i8(self):
# GH-24391
pi = period_range("2018-01-01", periods=3, freq="2D")
with pytest.raises(AssertionError, match="ndarray"):
with pytest.raises(
TypeError, match="_simple_new expects PeriodArray, got ndarray"
):
pi._shallow_copy(pi.asi8)

def test_shallow_copy_requires_disallow_period_index(self):
pi = period_range("2018-01-01", periods=3, freq="2D")
with pytest.raises(AssertionError, match="PeriodIndex"):
with pytest.raises(
TypeError, match="_simple_new expects PeriodArray, got PeriodIndex"
):
pi._shallow_copy(pi)

def test_view_asi8(self):
Expand Down