-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TYP: remove ignores in refine_percentiles #42389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -50,7 +51,7 @@ def describe_ndframe( | |||
include: str | Sequence[str] | None, | |||
exclude: str | Sequence[str] | None, | |||
datetime_is_numeric: bool, | |||
percentiles: Sequence[float] | None, | |||
percentiles: Sequence[float] | np.ndarray | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't Sequence[float] cover the ndarray cases that we actually want? i.e. once it is supported, something like np.ndarray[float, ndim=1] | None
would be what we really want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type of refine_percentiles is a numpy array and we pass this onto the other functions, so we need to add np.ndarray to the acceptable types.
doesn't Sequence[float] cover the ndarray cases that we actually want?
numpy/numpy#2776, issue has been opened a while, doesn't look like a fix in the pipeline
something like
np.ndarray[float, ndim=1] | None
would be what we really want?
probably not float
, since int
is also allowed. but if we narrow using float
from __future__ import annotations
import numpy as np
from typing import Sequence, Any
arr = np.array([0.25, 0.5])
reveal_type(arr)
arr2: np.ndarray[Any, np.dtype[np.float64]] = np.array([0.25, 0.5])
reveal_type(arr2)
arr3: np.ndarray[Any, np.dtype[np.float32]] = np.array([0.25, 0.5], dtype="float64")
reveal_type(arr3)
arr4: np.ndarray[Any, np.dtype[np.int_]] = np.array([0.25, 0.5], dtype="float64")
reveal_type(arr4)
arr5: np.ndarray[Any, np.dtype[np.int_]] = np.array([1, 2, 3])
reveal_type(arr5)
def test(a: Sequence[float]):
pass
test(arr)
test(arr2)
test(arr3)
test(arr4)
test(arr5)
def test2(a: Sequence[float] | np.ndarray[Any, np.dtype[np.float_]]):
reveal_type(a)
test2(arr)
test2(arr2)
test2(arr3)
test2(arr4)
test2(arr5)
/home/simon/t1.py:7: note: Revealed type is "numpy.ndarray[Any, Any]"
/home/simon/t1.py:9: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._64Bit]]]"
/home/simon/t1.py:11: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._32Bit]]]"
/home/simon/t1.py:13: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]"
/home/simon/t1.py:15: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]"
/home/simon/t1.py:22: error: Argument 1 to "test" has incompatible type "ndarray[Any, Any]"; expected "Sequence[float]" [arg-type]
/home/simon/t1.py:23: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_64Bit]]]"; expected "Sequence[float]" [arg-type]
/home/simon/t1.py:24: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_32Bit]]]"; expected "Sequence[float]" [arg-type]
/home/simon/t1.py:25: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]" [arg-type]
/home/simon/t1.py:26: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]" [arg-type]
/home/simon/t1.py:30: note: Revealed type is "Union[typing.Sequence[builtins.float], numpy.ndarray[Any, numpy.dtype[numpy.floating[Any]]]]"
/home/simon/t1.py:36: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]" [arg-type]
/home/simon/t1.py:37: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]" [arg-type]
Found 7 errors in 1 file (checked 1 source file)
I've added the dtype bound to the return type of refine_percentiles since for return types we should be as precise as possible.
For the arguments of describe
we want to be as permissible as possible, so maybe best not to narrow till the numpy inference works properly to avoid false positives for end users, describe
is public.
'int' is duck type compatible with float
, see https://mypy.readthedocs.io/en/stable/duck_type_compatibility.html#duck-type-compatibility, so having Sequence[float]
is equivalent to Sequence[float | int]
on master
import numpy as np
import pandas as pd
print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0, 1]))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0.5]))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0, 1])))
print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0.5])))
gives
0
count 3.0
mean 2.0
std 1.0
min 1.0
0% 1.0
50% 2.0
100% 3.0
max 3.0
0
count 3.0
mean 2.0
std 1.0
min 1.0
50% 2.0
max 3.0
0
count 3.0
mean 2.0
std 1.0
min 1.0
0% 1.0
50% 2.0
100% 3.0
max 3.0
0
count 3.0
mean 2.0
std 1.0
min 1.0
50% 2.0
max 3.0
but on master, users don't see false positives since the percentiles
parameter of DataFrame.describe
is not yet typed
to type that, we need to pass the user passed argument onto the lower level functions, and these therefore need to be typed in a such a way to avoid false positives, so we can't wait for downstream fixes and imo best not to narrow types for numpy arrays till it's more mature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, thanks for walking me through that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, passing integers 0
and 1
is redundant anyway.
imo best not to narrow types for numpy arrays till it's more mature.
and for the internal functions, using the type parameters for np.ndarray is probably also fine. it's only the public api where we may want to be more cautious.
to be clear, while the constructors don't specialize and the dtype type parameter of np.ndarray is Any
, then they will be type compatible with any specializations we do add. If we get these wrong, then there could be a batch of errors that need to be resolved when we get a future numpy release.
in numpy 1.21 (we use for type checking)...
def array(
object: object,
dtype: DTypeLike = ...,
*,
copy: bool = ...,
order: _OrderKACF = ...,
subok: bool = ...,
ndmin: int = ...,
like: ArrayLike = ...,
) -> ndarray: ...
so all arrays created with np.array
will have an inferred type numpy.ndarray[Any, Any]
No description provided.