Skip to content

TYP: Add cast to ABC classes. #37902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 24, 2020
Merged

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Nov 16, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Implements the idea of @simonjayhawkins from #27353 (comment) to use cast on ABC classes. Currently only implements ABCDataFrame, ABCNDFrame, and ABCSeries. Previously, mypy would resolve e.g. ABCSeries as any, now it gets resolved as Series.

One issue remains: I don't understand why the code below generates this complaint.

pandas/core/common.py:472: error: Incompatible return value type (got "Union[Any, List[Any], Series]", expected "Union[List[Any], ExtensionArray]") [return-value]

def convert_to_list_like(
    values: Union[Scalar, Iterable, AnyArrayLike]
) -> Union[List, AnyArrayLike]:
    if isinstance(values, (list, np.ndarray, ABCIndex, ABCSeries, ABCExtensionArray)):
        return values
    ...

One can add Series to the return Union to resolve this, but it doesn't seem like that should be necessary.

I'm also getting:

pandas/core/base.py:923: note: Internal mypy error checking function redefinition

on the lines

def map_f(values, f):
    return lib.map_infer_mask(values, f, isna(values).view(np.uint8))

where map_f is defined in various ways in if-else blocks. I'm not sure if this is a mypy bug.

@rhshadrach rhshadrach added the Typing type annotations, mypy/pyright type checking label Nov 16, 2020
def extract_array(obj: AnyArrayLike, extract_numpy: bool = False) -> ArrayLike:
def extract_array(
obj: AnyArrayLike, extract_numpy: bool = False
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think we want AnyArrayLike in the return type, since we should never return Series or Index

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will probably need to use an overload here eventually but not so easy when np.ndarray resolves to Any as we normally get messages about overlapping overloads.

Suggested change
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]:
) -> Union[Any, ExtensionArray, np.ndarray]:

for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looking at this more closely, the call to array always returns an EA. The possible types for input are EA, Index, Series, or np.ndarray, so that means the return can only be EA or np.ndarray. Let me know if this wrong, for now I've simply removed the AnyArrayLike.

def extract_array(obj: AnyArrayLike, extract_numpy: bool = False) -> ArrayLike:
def extract_array(
obj: AnyArrayLike, extract_numpy: bool = False
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will probably need to use an overload here eventually but not so easy when np.ndarray resolves to Any as we normally get messages about overlapping overloads.

Suggested change
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]:
) -> Union[Any, ExtensionArray, np.ndarray]:

for now.

Comment on lines 399 to 406
result = obj.array
else:
result = obj

if extract_numpy and isinstance(obj, ABCPandasArray):
obj = obj.to_numpy()
if extract_numpy and isinstance(result, ABCPandasArray):
result = result.to_numpy()

# error: Incompatible return value type (got "Index", expected "ExtensionArray")
# error: Incompatible return value type (got "Series", expected "ExtensionArray")
return obj # type: ignore[return-value]
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem here is that AnyArrayLike is a typevar and also the type of obj is wrong anyway and should be object I think could also change obj to object/Any for now until we overload and keep the implementation unchanged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert the changes to the implementation.

Copy link
Member Author

@rhshadrach rhshadrach Nov 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but would require adding obj = obj.array # type: ignore[assignment] or possibly changing typing on arguments (not return type)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this is a similar issue to #37902 (comment) where we changed FrameOrSeries to FrameOrSeriesUnion.

We don't yet have an AnyArrayLikeUnion to do the same here.

because AnyArrayLike is a typevar, the type annotations imply that the type doesn't change.

This is not true. Only for np.ndarray and ExtensionArray if extract_array=False.

so maybe the signature should be

def extract_array(
    obj: Union[object, ArrayLike], extract_numpy: bool = False
) -> Union[object, ArrayLike]:

but we also need to overload (to account for Series and Index array extraction) but while numpy types resolve to Any we get mypy errors about overlapping overloads.

so for now, i think ok to do...

def extract_array(
    obj: object, extract_numpy: bool = False
) -> Union[Any, ArrayLike]:

It's the use of the typevar that causes the issue, so I'm not too concerned what type annotations are chosen at this stage, so long as we are not changing the implementation because we are using a TypeVar incorrectly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Changes made.

@simonjayhawkins
Copy link
Member

One issue remains: I don't understand why the code below generates this complaint.

pandas/core/common.py:472: error: Incompatible return value type (got "Union[Any, List[Any], Series]", expected "Union[List[Any], ExtensionArray]") [return-value]

def convert_to_list_like(
    values: Union[Scalar, Iterable, AnyArrayLike]
) -> Union[List, AnyArrayLike]:
    if isinstance(values, (list, np.ndarray, ABCIndex, ABCSeries, ABCExtensionArray)):
        return values
    ...

One can add Series to the return Union to resolve this, but it doesn't seem like that should be necessary.

AnyArrayLike is a TypeVar so that a type of AnyArrayLike is returned as is. I'm guessing this is a False positive that may be down to either np.ndarray resolving to Any or down to ABCIndex and ABCExtensionArray not yet updated to not resolve to Any. I would just ignore these errors for now.

@simonjayhawkins
Copy link
Member

I'm also getting:

pandas/core/base.py:923: note: Internal mypy error checking function redefinition

on the lines

def map_f(values, f):
    return lib.map_infer_mask(values, f, isna(values).view(np.uint8))

where map_f is defined in various ways in if-else blocks. I'm not sure if this is a mypy bug.

changing the def to a lambda fixes. probably a mypy issue.

@rhshadrach
Copy link
Member Author

Thanks for all the feedback @jbrockmendel and @simonjayhawkins; I believe I've address everything thus far.


cellText = data.values
cellText = frame.values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the changes in this module are not required. The mypy issue is trying to assign the result of data._to_frame back to data which is FrameOrSeries, a typevar.

just need to change FrameOrSeries to FrameOrSeriesUnion in the function signature.

I think the defaults in pandas._typing should be changed to Unions. see also #36092 (comment) about ArrayLike and AnyArrayLike

@rhshadrach
Copy link
Member Author

Thanks @simonjayhawkins, change made. I also used your lambda to work around the internal mypy error for a lack of any other solutions. Using # type: ignore worked to silence mypy, but there is no error type and so it doesn't pass the pre-commit style checks. On a whim I attempted # type: ignore[note] but it did not work. I'll be raising an issue with mypy later tonight.

@simonjayhawkins
Copy link
Member

Using # type: ignore worked to silence mypy, but there is no error type and so it doesn't pass the pre-commit style checks. On a whim I attempted # type: ignore[note] but it did not work. I'll be raising an issue with mypy later tonight.

we could perhaps change our code check to allow, say, # noqa to be added to allow this through.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thanks @rhshadrach for working on this. generally lgtm. looking forward to the other ABC getting the same treatment.

Comment on lines 399 to 406
result = obj.array
else:
result = obj

if extract_numpy and isinstance(obj, ABCPandasArray):
obj = obj.to_numpy()
if extract_numpy and isinstance(result, ABCPandasArray):
result = result.to_numpy()

# error: Incompatible return value type (got "Index", expected "ExtensionArray")
# error: Incompatible return value type (got "Series", expected "ExtensionArray")
return obj # type: ignore[return-value]
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert the changes to the implementation.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach lgtm.

@simonjayhawkins simonjayhawkins added this to the 1.2 milestone Nov 23, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, happy to handle in a followup or here. ping either way

@@ -102,12 +104,12 @@ def clean_column_name(name: str) -> str:

Parameters
----------
name : str
name : hashable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be Label (or is this consistent with how we do elsewhere?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick search reveals 6 counts of "Label", 2 counts of "hashable" (other than the 2 added in this PR). @simonjayhawkins requested hashable over Label, and I like the idea of staying away from pandas-defined types for readability. This is an internal function, so that reasoning doesn't directly apply, but it seems best to me to not have a conditional when deciding on typing in docstrings (e.g. hashable when function is public, Label otherwise).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk that's fine, just worried that we are inconsistent generally. i am not sure this is actually easy to check, but we should try to be as consisten as possible.

@rhshadrach
Copy link
Member Author

@jreback changes made and green. Just the label vs hashable issue is outstanding.

@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

thanks @rhshadrach

@jreback jreback merged commit 99376b3 into pandas-dev:master Nov 24, 2020
@rhshadrach rhshadrach deleted the abc_typing branch November 24, 2020 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants