-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Add types to top-level funcs, step 2 #30582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -651,7 +654,7 @@ def value_counts( | |||
normalize: bool = False, | |||
bins=None, | |||
dropna: bool = True, | |||
) -> ABCSeries: | |||
) -> "Series": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABCSeries doesn't work as a return type, AFAIS: Doing a x = pd.value_counts(x); reveal_type(x)
returns Any, if the return type is ABCSeries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, all ABCs used for type annotations need to be replaced. we need a custom script/clever regex to avoid this in future.
pandas/core/reshape/reshape.py
Outdated
result = concat(with_dummies, axis=1) | ||
concatted = concat(with_dummies, axis=1) | ||
assert isinstance(concatted, DataFrame) | ||
result = concatted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same annoying mypy issue as in #30565. don't know if there's a better way to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @topper-123
pandas/core/reshape/concat.py
Outdated
@@ -37,7 +37,7 @@ def concat( | |||
verify_integrity: bool = False, | |||
sort: bool = False, | |||
copy: bool = True, | |||
): | |||
) -> Union["DataFrame", "Series"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume FrameOrSeries doesn't apply here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the problem is the return type doesn't always follow a parameter type. Specifically if the input are series, and axis=1, the output is a DataFrame. We need Literals to do it better.
pandas/core/reshape/pivot.py
Outdated
@@ -58,7 +58,9 @@ def pivot_table( | |||
pieces.append(table) | |||
keys.append(getattr(func, "__name__", func)) | |||
|
|||
return concat(pieces, keys=keys, axis=1) | |||
result = concat(pieces, keys=keys, axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can concat be overloaded to avoid this?
even without Literal for axis could try List[DataFrame] -> DataFrame and List[Series} -> Union[Series, DataFrame]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that might work. I'll look into it.
Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-05 12:20:24 UTC |
I've overloaded This is quite ugly, but allows us to drop the uglyness in other locations, so...:unamused: |
the overloads do end up being ugly (and introduces duplication) but is more benefical for users of types since they don't need to include #ignores, casts or workarounds. |
Yeah, I agree. It can be cleaned a bit when we reach 3.8. hopefully mypy will drop the duplication requirement. The overload example in the python docs had no empty lines between overloaded definitions. Our Flake8 accepts that with a noqa directive, but pep8speaks doesn't. This is probaly not settled stylewise, but I like having no empty lines to signal the definitions belong together. EDIT: Ok, while flake8 acceps no empty lines, black doesn't. So two empty lines it will be.... |
88f9556
to
03375c6
Compare
xref #26766 |
pandas/core/reshape/concat.py
Outdated
def concat( | ||
objs, | ||
objs: Union[Sequence[FrameOrSeriesUnion], Mapping[str, FrameOrSeriesUnion]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the mapping key should be Optional[Hashable]
pandas/core/reshape/concat.py
Outdated
@@ -25,9 +25,28 @@ | |||
# --------------------------------------------------------------------- | |||
# Concatenate DataFrame objects | |||
|
|||
FrameOrSeriesUnion = Union["DataFrame", "Series"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should put this in _typing as FrameOrSeries and rename FrameOrSeries as FrameOrSeriesT. @WillAyd
i think that’s a great idea
…Sent from my iPhone
On Dec 31, 2019, at 10:55 AM, Simon Hawkins ***@***.***> wrote:
@simonjayhawkins commented on this pull request.
In pandas/core/reshape/concat.py:
> @@ -25,9 +25,28 @@
# ---------------------------------------------------------------------
# Concatenate DataFrame objects
+FrameOrSeriesUnion = Union["DataFrame", "Series"]
maybe we should put this in _typing as FrameOrSeries and rename FrameOrSeries as FrameOrSeriesT. @WillAyd
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
ccba86c
to
d68fe49
Compare
I’ve made the changes. |
pandas/_typing.py
Outdated
@@ -41,7 +42,8 @@ | |||
|
|||
Dtype = Union[str, np.dtype, "ExtensionDtype"] | |||
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | |||
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame") | |||
FrameOrSeries = Union["DataFrame", "Series"] | |||
FrameOrSeriesT = TypeVar("FrameOrSeriesT", bound="NDFrame") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment explaining the difference between FrameOrSeries vs FrameOrSeriesT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm after viewing the diff here I am second guess this a bit - is this really just for one-off things like concat? If so probably worth just dealing with exceptions in the individual signatures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on this myself. I can remove it here and if we want this union in _typing.py later, it can be added then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more descriptive on the name and add a comment before these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SameFrameOrSeries
? It's difficult to find a descriptive name for the TypeVar that isn't very long. It will be used in a lot of places, so shouldn't be too long IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure that sounds better, definitely add some comments / sections here in typing with liberal explanations on when to use these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find SameFrameOrSeries
is ambiguous - is that supposed to represent the type or instance of the object being the same? Should be type as far as annotations are concerned but I don't think that's clear
I would be +1 with @simonjayhawkins naming suggestion as I think the T
is a pseudo-convention seen most with TypeVars in typeshed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, but I think need to comment / explain when FrameOrSeries and FrameOrSeriesT (need a better name here as well), should be used. would comment in pandas._typing directly (generally we should have many more comments on what types of things these should cover, even if they are 'obvious'.
pandas/_typing.py
Outdated
@@ -41,7 +42,8 @@ | |||
|
|||
Dtype = Union[str, np.dtype, "ExtensionDtype"] | |||
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] | |||
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame") | |||
FrameOrSeries = Union["DataFrame", "Series"] | |||
FrameOrSeriesT = TypeVar("FrameOrSeriesT", bound="NDFrame") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be more descriptive on the name and add a comment before these
pandas/core/reshape/concat.py
Outdated
@@ -26,8 +28,25 @@ | |||
# Concatenate DataFrame objects | |||
|
|||
|
|||
@overload | |||
def concat( | |||
objs: Union[Sequence["DataFrame"], Mapping[Hashable, "DataFrame"]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC None is a valid key and None is not a subtype of Hashable.
objs: Union[Sequence["DataFrame"], Mapping[Hashable, "DataFrame"]], | |
objs: Union[Sequence["DataFrame"], Mapping[Optional[Hashable], "DataFrame"]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None is an instance of Hashable:
In [1]: from typing import Hashable
In [2]: isinstance(None, Hashable)
Out[2]: True
So Optional is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, when typechecking
from typing import Hashable
a: Hashable = None
$ mypy test.py
test.py:3: error: Incompatible types in assignment (expression has type "None", variable has type "Hashable")
Found 1 error in 1 file (checked 1 source file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but None is hashable (e.g. hash(None)
and {None: 1}
both work). So mypy is specialcasing None, which seems a bit like a bug to me.
Maybe we should have a Label = Optional[Hashable]
in _typing.py? Would seem easier to remember to use that than to define Optional[Hashable] every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added Optional
. I did not add a Label type to _typing.py. Figured it can be done in seperate PR by searching for "Optional[Hashable]"
d68fe49
to
87a9b1a
Compare
PR updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment. @simonjayhawkins @WillAyd
87a9b1a
to
ddcdb8a
Compare
I think @simonjayhawkins and I are proposing Is there a strong objection to the T standard? |
I don’t have a strong opinion either way, but I think ‘FrameOrSeries’ and ‘FrameOrSeriesT’ are too similar; it will be unnecessary difficult to glance at the types. Maybe ‘NDFrameT’ (signals better where it actually comes from)? Or a differrent name for the union type. |
prefer not. we may want to change this. see #29480 |
FrameOrSeries for the TypeVar and FrameOrSeriesUnion for the union? The union should be used sparingly, so less of a problem if that has a long name. |
sure |
ddcdb8a
to
4270e57
Compare
4270e57
to
c32eef4
Compare
c32eef4
to
84d1c37
Compare
Is this ok to take in? |
The failure looks unrelated, looks cython-related. |
thanks ! |
Next step to #30565.
Next up will be the pd.read_* functions.