Skip to content

REF: share _simple_new #37872

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 17, 2020
Merged

Conversation

jbrockmendel
Copy link
Member

@simonjayhawkins two questions: 1) is there a nice way to annotate the values arg (maybe a Protocol thing?) and 2) is the use of cls.__annotations__["_data"] an egregious antipattern?

@jreback jreback added Index Related to the Index class or subclasses Refactor Internal refactoring of code labels Nov 15, 2020
@jreback jreback added this to the 1.2 milestone Nov 15, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@simonjayhawkins
Copy link
Member

  1. is the use of cls.__annotations__["_data"] an egregious antipattern?

to begin with a quote from Fluent Python, p8

"The first thing to know about special methods is that they are meant to be called by the Python interpreter, and not by you"

using type annotations in a runtime context is not something we've discussed.

I think in general it could be fragile. we can only use with certain types so cannot now change cls._data for the subclasses without potentially breaking the code. So we wouldn't, for example, be able to makes Indexes generic, e.g. Categorical[str] if we used this pattern in other Indexes.

>>> from pandas.core.indexes.datetimelike import DatetimeIndexOpsMixin
>>>
>>> DatetimeIndexOpsMixin.__annotations__
{'_data': typing.Union[pandas.core.arrays.datetimes.DatetimeArray, pandas.core.a
rrays.timedeltas.TimedeltaArray, pandas.core.arrays.period.PeriodArray], 'freq':
 typing.Union[pandas._libs.tslibs.offsets.BaseOffset, NoneType], 'freqstr': typi
ng.Union[str, NoneType], '_resolution_obj': <enum 'Resolution'>, '_bool_ops': ty
ping.List[str], '_field_ops': typing.List[str]}
>>>
>>> isinstance(42, DatetimeIndexOpsMixin.__annotations__["_data"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\typing.py", line 769, in __
instancecheck__
    return self.__subclasscheck__(type(obj))
  File "C:\Users\simon\anaconda3\envs\pandas-dev\lib\typing.py", line 777, in __
subclasscheck__
    raise TypeError("Subscripted generics cannot be used with"
TypeError: Subscripted generics cannot be used with class and instance checks
>>>

assuming that the assert has been added for runtime debugging and not to appease mypy:

can you not add something akin to construct_array_type from ExtensionDtype as a class attribute to Index to store the backing array type?

  1. is there a nice way to annotate the values arg (maybe a Protocol thing?)

is the reason type annotations for values has not been added related to the above?

@jbrockmendel
Copy link
Member Author

can you not add something akin to construct_array_type from ExtensionDtype as a class attribute to Index to store the backing array type?

will do

is there a nice way to annotate the values arg (maybe a Protocol thing?)

is the reason type annotations for values has not been added related to the above?

I can add an annotation with Union[DatetimeArray, TimedeltaArray, PeriodArray], was wondering if there is a way to make it isolate just the one of those three (like the annotations in the current code with three separate simple_new implementations)

@simonjayhawkins
Copy link
Member

is there a nice way to annotate the values arg (maybe a Protocol thing?)

is the reason type annotations for values has not been added related to the above?

I can add an annotation with Union[DatetimeArray, TimedeltaArray, PeriodArray], was wondering if there is a way to make it isolate just the one of those three (like the annotations in the current code with three separate simple_new implementations)

right. so we want to tie the type of values, a parameter of a class method, to the type of self._data, a class attribute (and maybe a construct_array_type like attribute)? I think this is where we would use Generic. the backing array type would be the type parameter.

so something like (completely untested!)

(ArrayLikeT is the same as ArrayLike but we want to change that to a Union, and we want a typevar here)

ArrayLikeT = TypeVar("ArrayLikeT", "ExtensionArray", np.ndarray)

class Index(IndexOpsMixin, PandasObject, Generic(ArrayLikeT)):

    _data : ArrayLikeT
    _construct_array_type: Type[ArrayLikeT]

    def _simple_new(cls, values: ArrayLikeT, name: Label = None):
         ...

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jbrockmendel lgtm pending green

@@ -418,7 +418,7 @@ def asi8(self):
return None

@classmethod
def _simple_new(cls, values, name: Label = None):
def _simple_new(cls, values: np.ndarray, name: Label = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because this is the base class it should be ArrayLike? (or we can leave this fix for #36092)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, its specifically np.ndarray

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I guess you want to make Index np.ndarray specific and have ExtensionIndex handle EA backed indexes but in that case some other type annotations in base would need to be changed e,g, _data: Union[ExtensionArray, np.ndarray] so that methods in the base class that currently handle EA can be simplified and overridden.

In #37898, im getting error: Unsupported left operand type for - ("ExtensionArray") in the base class since def _values(self) -> Union[ExtensionArray, np.ndarray] so atm the base class is not EA agnostic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to get revert this annotation

def _simple_new(
cls,
values: Union[DatetimeArray, TimedeltaArray, PeriodArray],
name: Label = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can now just use Hashable, see #37134

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will edit on subsequent pass

@jbrockmendel
Copy link
Member Author

updated + green

@jreback jreback merged commit 8724737 into pandas-dev:master Nov 17, 2020
@jbrockmendel jbrockmendel deleted the ref-share-simple_new branch November 17, 2020 03:14
@MarcoGorelli MarcoGorelli mentioned this pull request Nov 18, 2020
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants