TYP: remove ignores in refine_percentiles #42389

simonjayhawkins · 2021-07-05T15:49:56Z

No description provided.

jbrockmendel · 2021-07-06T16:33:28Z

pandas/core/describe.py

@@ -50,7 +51,7 @@ def describe_ndframe(
    include: str | Sequence[str] | None,
    exclude: str | Sequence[str] | None,
    datetime_is_numeric: bool,
-    percentiles: Sequence[float] | None,
+    percentiles: Sequence[float] | np.ndarray | None,


doesn't Sequence[float] cover the ndarray cases that we actually want? i.e. once it is supported, something like np.ndarray[float, ndim=1] | None would be what we really want?

The return type of refine_percentiles is a numpy array and we pass this onto the other functions, so we need to add np.ndarray to the acceptable types.

doesn't Sequence[float] cover the ndarray cases that we actually want?

numpy/numpy#2776, issue has been opened a while, doesn't look like a fix in the pipeline

something like np.ndarray[float, ndim=1] | None would be what we really want?

probably not float, since int is also allowed. but if we narrow using float

from __future__ import annotations import numpy as np from typing import Sequence, Any arr = np.array([0.25, 0.5]) reveal_type(arr) arr2: np.ndarray[Any, np.dtype[np.float64]] = np.array([0.25, 0.5]) reveal_type(arr2) arr3: np.ndarray[Any, np.dtype[np.float32]] = np.array([0.25, 0.5], dtype="float64") reveal_type(arr3) arr4: np.ndarray[Any, np.dtype[np.int_]] = np.array([0.25, 0.5], dtype="float64") reveal_type(arr4) arr5: np.ndarray[Any, np.dtype[np.int_]] = np.array([1, 2, 3]) reveal_type(arr5) def test(a: Sequence[float]): pass test(arr) test(arr2) test(arr3) test(arr4) test(arr5) def test2(a: Sequence[float] | np.ndarray[Any, np.dtype[np.float_]]): reveal_type(a) test2(arr) test2(arr2) test2(arr3) test2(arr4) test2(arr5)

/home/simon/t1.py:7: note: Revealed type is "numpy.ndarray[Any, Any]" /home/simon/t1.py:9: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._64Bit]]]" /home/simon/t1.py:11: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.floating[numpy.typing._32Bit]]]" /home/simon/t1.py:13: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]" /home/simon/t1.py:15: note: Revealed type is "numpy.ndarray[Any, numpy.dtype[numpy.signedinteger[Any]]]" /home/simon/t1.py:22: error: Argument 1 to "test" has incompatible type "ndarray[Any, Any]"; expected "Sequence[float]" [arg-type] /home/simon/t1.py:23: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_64Bit]]]"; expected "Sequence[float]" [arg-type] /home/simon/t1.py:24: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[floating[_32Bit]]]"; expected "Sequence[float]" [arg-type] /home/simon/t1.py:25: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]" [arg-type] /home/simon/t1.py:26: error: Argument 1 to "test" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Sequence[float]" [arg-type] /home/simon/t1.py:30: note: Revealed type is "Union[typing.Sequence[builtins.float], numpy.ndarray[Any, numpy.dtype[numpy.floating[Any]]]]" /home/simon/t1.py:36: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]" [arg-type] /home/simon/t1.py:37: error: Argument 1 to "test2" has incompatible type "ndarray[Any, dtype[signedinteger[Any]]]"; expected "Union[Sequence[float], ndarray[Any, dtype[floating[Any]]]]" [arg-type] Found 7 errors in 1 file (checked 1 source file)

I've added the dtype bound to the return type of refine_percentiles since for return types we should be as precise as possible.

For the arguments of describe we want to be as permissible as possible, so maybe best not to narrow till the numpy inference works properly to avoid false positives for end users, describe is public.

'int' is duck type compatible with float, see https://mypy.readthedocs.io/en/stable/duck_type_compatibility.html#duck-type-compatibility, so having Sequence[float] is equivalent to Sequence[float | int]

on master

import numpy as np import pandas as pd print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0, 1])) print(pd.DataFrame(pd.Series([1, 2, 3])).describe([0.5])) print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0, 1]))) print(pd.DataFrame(pd.Series([1, 2, 3])).describe(np.array([0.5])))

gives

0 count 3.0 mean 2.0 std 1.0 min 1.0 0% 1.0 50% 2.0 100% 3.0 max 3.0 0 count 3.0 mean 2.0 std 1.0 min 1.0 50% 2.0 max 3.0 0 count 3.0 mean 2.0 std 1.0 min 1.0 0% 1.0 50% 2.0 100% 3.0 max 3.0 0 count 3.0 mean 2.0 std 1.0 min 1.0 50% 2.0 max 3.0

but on master, users don't see false positives since the percentiles parameter of DataFrame.describe is not yet typed

to type that, we need to pass the user passed argument onto the lower level functions, and these therefore need to be typed in a such a way to avoid false positives, so we can't wait for downstream fixes and imo best not to narrow types for numpy arrays till it's more mature.

makes sense, thanks for walking me through that

of course, passing integers 0 and 1 is redundant anyway.

imo best not to narrow types for numpy arrays till it's more mature.

and for the internal functions, using the type parameters for np.ndarray is probably also fine. it's only the public api where we may want to be more cautious.

to be clear, while the constructors don't specialize and the dtype type parameter of np.ndarray is Any, then they will be type compatible with any specializations we do add. If we get these wrong, then there could be a batch of errors that need to be resolved when we get a future numpy release.

in numpy 1.21 (we use for type checking)...

def array( object: object, dtype: DTypeLike = ..., *, copy: bool = ..., order: _OrderKACF = ..., subok: bool = ..., ndmin: int = ..., like: ArrayLike = ..., ) -> ndarray: ...

so all arrays created with np.array will have an inferred type numpy.ndarray[Any, Any]

TYP: remove ignores in refine_percentiles

d1cf6df

simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jul 5, 2021

jbrockmendel reviewed Jul 6, 2021

View reviewed changes

Merge remote-tracking branch 'upstream/master' into ignore-2

3e664e8

jreback added this to the 1.4 milestone Jul 7, 2021

jreback merged commit 00ceed4 into pandas-dev:master Jul 7, 2021

simonjayhawkins deleted the ignore-2 branch July 7, 2021 12:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TYP: remove ignores in refine_percentiles #42389

TYP: remove ignores in refine_percentiles #42389

simonjayhawkins commented Jul 5, 2021

jbrockmendel Jul 6, 2021

simonjayhawkins Jul 6, 2021 •

edited

Loading

jbrockmendel Jul 6, 2021

simonjayhawkins Jul 7, 2021

TYP: remove ignores in refine_percentiles #42389

TYP: remove ignores in refine_percentiles #42389

Conversation

simonjayhawkins commented Jul 5, 2021

jbrockmendel Jul 6, 2021

Choose a reason for hiding this comment

simonjayhawkins Jul 6, 2021 • edited Loading

Choose a reason for hiding this comment

jbrockmendel Jul 6, 2021

Choose a reason for hiding this comment

simonjayhawkins Jul 7, 2021

Choose a reason for hiding this comment

simonjayhawkins Jul 6, 2021 •

edited

Loading