-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Add cast to ABC classes. #37902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/construction.py
Outdated
def extract_array(obj: AnyArrayLike, extract_numpy: bool = False) -> ArrayLike: | ||
def extract_array( | ||
obj: AnyArrayLike, extract_numpy: bool = False | ||
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think we want AnyArrayLike in the return type, since we should never return Series or Index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will probably need to use an overload here eventually but not so easy when np.ndarray resolves to Any
as we normally get messages about overlapping overloads.
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]: | |
) -> Union[Any, ExtensionArray, np.ndarray]: |
for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looking at this more closely, the call to array
always returns an EA. The possible types for input are EA, Index, Series, or np.ndarray, so that means the return can only be EA or np.ndarray. Let me know if this wrong, for now I've simply removed the AnyArrayLike
.
pandas/core/construction.py
Outdated
def extract_array(obj: AnyArrayLike, extract_numpy: bool = False) -> ArrayLike: | ||
def extract_array( | ||
obj: AnyArrayLike, extract_numpy: bool = False | ||
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we will probably need to use an overload here eventually but not so easy when np.ndarray resolves to Any
as we normally get messages about overlapping overloads.
) -> Union[AnyArrayLike, ExtensionArray, np.ndarray]: | |
) -> Union[Any, ExtensionArray, np.ndarray]: |
for now.
pandas/core/construction.py
Outdated
result = obj.array | ||
else: | ||
result = obj | ||
|
||
if extract_numpy and isinstance(obj, ABCPandasArray): | ||
obj = obj.to_numpy() | ||
if extract_numpy and isinstance(result, ABCPandasArray): | ||
result = result.to_numpy() | ||
|
||
# error: Incompatible return value type (got "Index", expected "ExtensionArray") | ||
# error: Incompatible return value type (got "Series", expected "ExtensionArray") | ||
return obj # type: ignore[return-value] | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem here is that AnyArrayLike is a typevar and also the type of obj
is wrong anyway and should be object
I think could also change obj
to object/Any for now until we overload and keep the implementation unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you revert the changes to the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but would require adding obj = obj.array # type: ignore[assignment]
or possibly changing typing on arguments (not return type)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this is a similar issue to #37902 (comment) where we changed FrameOrSeries to FrameOrSeriesUnion.
We don't yet have an AnyArrayLikeUnion to do the same here.
because AnyArrayLike is a typevar, the type annotations imply that the type doesn't change.
This is not true. Only for np.ndarray and ExtensionArray if extract_array=False.
so maybe the signature should be
def extract_array(
obj: Union[object, ArrayLike], extract_numpy: bool = False
) -> Union[object, ArrayLike]:
but we also need to overload (to account for Series and Index array extraction) but while numpy types resolve to Any
we get mypy errors about overlapping overloads.
so for now, i think ok to do...
def extract_array(
obj: object, extract_numpy: bool = False
) -> Union[Any, ArrayLike]:
It's the use of the typevar that causes the issue, so I'm not too concerned what type annotations are chosen at this stage, so long as we are not changing the implementation because we are using a TypeVar incorrectly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! Changes made.
AnyArrayLike is a TypeVar so that a type of AnyArrayLike is returned as is. I'm guessing this is a False positive that may be down to either np.ndarray resolving to |
changing the def to a lambda fixes. probably a mypy issue. |
Thanks for all the feedback @jbrockmendel and @simonjayhawkins; I believe I've address everything thus far. |
pandas/plotting/_matplotlib/tools.py
Outdated
|
||
cellText = data.values | ||
cellText = frame.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the changes in this module are not required. The mypy issue is trying to assign the result of data._to_frame back to data which is FrameOrSeries, a typevar.
just need to change FrameOrSeries to FrameOrSeriesUnion in the function signature.
I think the defaults in pandas._typing should be changed to Unions. see also #36092 (comment) about ArrayLike and AnyArrayLike
Thanks @simonjayhawkins, change made. I also used your lambda to work around the internal mypy error for a lack of any other solutions. Using |
we could perhaps change our code check to allow, say, # noqa to be added to allow this through. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Thanks @rhshadrach for working on this. generally lgtm. looking forward to the other ABC getting the same treatment.
pandas/core/construction.py
Outdated
result = obj.array | ||
else: | ||
result = obj | ||
|
||
if extract_numpy and isinstance(obj, ABCPandasArray): | ||
obj = obj.to_numpy() | ||
if extract_numpy and isinstance(result, ABCPandasArray): | ||
result = result.to_numpy() | ||
|
||
# error: Incompatible return value type (got "Index", expected "ExtensionArray") | ||
# error: Incompatible return value type (got "Series", expected "ExtensionArray") | ||
return obj # type: ignore[return-value] | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you revert the changes to the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rhshadrach lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, happy to handle in a followup or here. ping either way
@@ -102,12 +104,12 @@ def clean_column_name(name: str) -> str: | |||
|
|||
Parameters | |||
---------- | |||
name : str | |||
name : hashable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be Label (or is this consistent with how we do elsewhere?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick search reveals 6 counts of "Label", 2 counts of "hashable" (other than the 2 added in this PR). @simonjayhawkins requested hashable over Label, and I like the idea of staying away from pandas-defined types for readability. This is an internal function, so that reasoning doesn't directly apply, but it seems best to me to not have a conditional when deciding on typing in docstrings (e.g. hashable when function is public, Label otherwise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk that's fine, just worried that we are inconsistent generally. i am not sure this is actually easy to check, but we should try to be as consisten as possible.
@jreback changes made and green. Just the label vs hashable issue is outstanding. |
thanks @rhshadrach |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Implements the idea of @simonjayhawkins from #27353 (comment) to use cast on ABC classes. Currently only implements ABCDataFrame, ABCNDFrame, and ABCSeries. Previously, mypy would resolve e.g. ABCSeries as any, now it gets resolved as Series.
One issue remains: I don't understand why the code below generates this complaint.
One can add Series to the return Union to resolve this, but it doesn't seem like that should be necessary.
I'm also getting:
on the lines
where
map_f
is defined in various ways in if-else blocks. I'm not sure if this is a mypy bug.