Skip to content

TYP: Add types to top-level funcs, step 2 #30582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 5, 2020

Conversation

topper-123
Copy link
Contributor

Next step to #30565.

Next up will be the pd.read_* functions.

@@ -651,7 +654,7 @@ def value_counts(
normalize: bool = False,
bins=None,
dropna: bool = True,
) -> ABCSeries:
) -> "Series":
Copy link
Contributor Author

@topper-123 topper-123 Dec 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ABCSeries doesn't work as a return type, AFAIS: Doing a x = pd.value_counts(x); reveal_type(x) returns Any, if the return type is ABCSeries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, all ABCs used for type annotations need to be replaced. we need a custom script/clever regex to avoid this in future.

result = concat(with_dummies, axis=1)
concatted = concat(with_dummies, axis=1)
assert isinstance(concatted, DataFrame)
result = concatted
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same annoying mypy issue as in #30565. don't know if there's a better way to do this

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @topper-123

@@ -37,7 +37,7 @@ def concat(
verify_integrity: bool = False,
sort: bool = False,
copy: bool = True,
):
) -> Union["DataFrame", "Series"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume FrameOrSeries doesn't apply here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the problem is the return type doesn't always follow a parameter type. Specifically if the input are series, and axis=1, the output is a DataFrame. We need Literals to do it better.

@@ -58,7 +58,9 @@ def pivot_table(
pieces.append(table)
keys.append(getattr(func, "__name__", func))

return concat(pieces, keys=keys, axis=1)
result = concat(pieces, keys=keys, axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can concat be overloaded to avoid this?

even without Literal for axis could try List[DataFrame] -> DataFrame and List[Series} -> Union[Series, DataFrame]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that might work. I'll look into it.

@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Dec 31, 2019
@simonjayhawkins simonjayhawkins added this to the 1.0 milestone Dec 31, 2019
@pep8speaks
Copy link

pep8speaks commented Dec 31, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-05 12:20:24 UTC

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 31, 2019

I've overloaded concat. mypy demands at least two overloads, which is strange.

This is quite ugly, but allows us to drop the uglyness in other locations, so...:unamused:

@simonjayhawkins
Copy link
Member

I've overloaded concat. mypy demands at least two overloads, which is quite strange.

This is quite ugly, but allows us to drop the uglyness in other locations, so...😒

the overloads do end up being ugly (and introduces duplication) but is more benefical for users of types since they don't need to include #ignores, casts or workarounds.

@topper-123
Copy link
Contributor Author

topper-123 commented Dec 31, 2019

Yeah, I agree. It can be cleaned a bit when we reach 3.8. hopefully mypy will drop the duplication requirement.

The overload example in the python docs had no empty lines between overloaded definitions. Our Flake8 accepts that with a noqa directive, but pep8speaks doesn't. This is probaly not settled stylewise, but I like having no empty lines to signal the definitions belong together.

EDIT: Ok, while flake8 acceps no empty lines, black doesn't. So two empty lines it will be....

@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from 88f9556 to 03375c6 Compare December 31, 2019 15:03
@simonjayhawkins
Copy link
Member

xref #26766

def concat(
objs,
objs: Union[Sequence[FrameOrSeriesUnion], Mapping[str, FrameOrSeriesUnion]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the mapping key should be Optional[Hashable]

@@ -25,9 +25,28 @@
# ---------------------------------------------------------------------
# Concatenate DataFrame objects

FrameOrSeriesUnion = Union["DataFrame", "Series"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should put this in _typing as FrameOrSeries and rename FrameOrSeries as FrameOrSeriesT. @WillAyd

@WillAyd
Copy link
Member

WillAyd commented Dec 31, 2019 via email

@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch 2 times, most recently from ccba86c to d68fe49 Compare December 31, 2019 18:29
@topper-123
Copy link
Contributor Author

I’ve made the changes.

@@ -41,7 +42,8 @@

Dtype = Union[str, np.dtype, "ExtensionDtype"]
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
FrameOrSeries = Union["DataFrame", "Series"]
FrameOrSeriesT = TypeVar("FrameOrSeriesT", bound="NDFrame")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment explaining the difference between FrameOrSeries vs FrameOrSeriesT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm after viewing the diff here I am second guess this a bit - is this really just for one-off things like concat? If so probably worth just dealing with exceptions in the individual signatures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this myself. I can remove it here and if we want this union in _typing.py later, it can be added then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more descriptive on the name and add a comment before these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe SameFrameOrSeries? It's difficult to find a descriptive name for the TypeVar that isn't very long. It will be used in a lot of places, so shouldn't be too long IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure that sounds better, definitely add some comments / sections here in typing with liberal explanations on when to use these

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find SameFrameOrSeries is ambiguous - is that supposed to represent the type or instance of the object being the same? Should be type as far as annotations are concerned but I don't think that's clear

I would be +1 with @simonjayhawkins naming suggestion as I think the T is a pseudo-convention seen most with TypeVars in typeshed

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but I think need to comment / explain when FrameOrSeries and FrameOrSeriesT (need a better name here as well), should be used. would comment in pandas._typing directly (generally we should have many more comments on what types of things these should cover, even if they are 'obvious'.

@@ -41,7 +42,8 @@

Dtype = Union[str, np.dtype, "ExtensionDtype"]
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
FrameOrSeries = Union["DataFrame", "Series"]
FrameOrSeriesT = TypeVar("FrameOrSeriesT", bound="NDFrame")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more descriptive on the name and add a comment before these

@@ -26,8 +28,25 @@
# Concatenate DataFrame objects


@overload
def concat(
objs: Union[Sequence["DataFrame"], Mapping[Hashable, "DataFrame"]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC None is a valid key and None is not a subtype of Hashable.

Suggested change
objs: Union[Sequence["DataFrame"], Mapping[Hashable, "DataFrame"]],
objs: Union[Sequence["DataFrame"], Mapping[Optional[Hashable], "DataFrame"]],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None is an instance of Hashable:

In [1]: from typing import Hashable
In [2]: isinstance(None, Hashable)
Out[2]: True

So Optional is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, when typechecking

from typing import Hashable

a: Hashable = None
$ mypy test.py
test.py:3: error: Incompatible types in assignment (expression has type "None", variable has type "Hashable")
Found 1 error in 1 file (checked 1 source file)

Copy link
Contributor Author

@topper-123 topper-123 Jan 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but None is hashable (e.g. hash(None) and {None: 1} both work). So mypy is specialcasing None, which seems a bit like a bug to me.

Maybe we should have a Label = Optional[Hashable] in _typing.py? Would seem easier to remember to use that than to define Optional[Hashable] every time?

Copy link
Contributor Author

@topper-123 topper-123 Jan 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Optional. I did not add a Label type to _typing.py. Figured it can be done in seperate PR by searching for "Optional[Hashable]"

@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from d68fe49 to 87a9b1a Compare January 1, 2020 21:43
@topper-123
Copy link
Contributor Author

PR updated.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment. @simonjayhawkins @WillAyd

@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from 87a9b1a to ddcdb8a Compare January 2, 2020 02:55
@WillAyd
Copy link
Member

WillAyd commented Jan 2, 2020

I think @simonjayhawkins and I are proposing FrameOrSeriesT to denote a TypeVar version and go with a pseudo-standard of appending T for any TypeVars. Currently this is at SameFrameOrSeries which I am personally against as it is ambiguous what "Same" means

Is there a strong objection to the T standard?

@topper-123
Copy link
Contributor Author

topper-123 commented Jan 2, 2020

I don’t have a strong opinion either way, but I think ‘FrameOrSeries’ and ‘FrameOrSeriesT’ are too similar; it will be unnecessary difficult to glance at the types. Maybe ‘NDFrameT’ (signals better where it actually comes from)? Or a differrent name for the union type.

@simonjayhawkins
Copy link
Member

Maybe ‘NDFrameT’ (signals better where it actually comes from)?

prefer not. we may want to change this. see #29480

@topper-123
Copy link
Contributor Author

topper-123 commented Jan 2, 2020

FrameOrSeries for the TypeVar and FrameOrSeriesUnion for the union? The union should be used sparingly, so less of a problem if that has a long name.

@jreback
Copy link
Contributor

jreback commented Jan 3, 2020

FrameOrSeries for the TypeVar and FrameOrSeriesUnion for the union? The union should be used sparingly, so less of a problem if that has a long name.

sure

@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from ddcdb8a to 4270e57 Compare January 3, 2020 08:11
@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from 4270e57 to c32eef4 Compare January 3, 2020 08:19
@topper-123 topper-123 force-pushed the type_top_level_funcs_II branch from c32eef4 to 84d1c37 Compare January 5, 2020 12:20
@topper-123
Copy link
Contributor Author

Is this ok to take in?

@topper-123
Copy link
Contributor Author

The failure looks unrelated, looks cython-related.

@jreback jreback merged commit 66b4dbc into pandas-dev:master Jan 5, 2020
@jreback
Copy link
Contributor

jreback commented Jan 5, 2020

thanks !

@topper-123 topper-123 deleted the type_top_level_funcs_II branch January 5, 2020 16:27
@simonjayhawkins simonjayhawkins mentioned this pull request Sep 18, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants