Skip to content

CLN: remove util._decorators.make_signature and make related changes #26819

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 14, 2019

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jun 12, 2019

util._decorators.make_signature is only used in core.groupby.base.whitelist_method_generator.

However, in that function it is only used inside an if statement, that is conditional upon an attribute being a method. That however can not happen, as the function parameter only takes classes as an input, so that if-statement is never true and its content never executed. The net effect is that a nice chunk of code can just be removed, which is nice.

In the future, we should just use jus use inspect.signature or inspect.getfullargspecs for signature inspection.

@topper-123 topper-123 force-pushed the remove_make_signature branch from b945b74 to 3b9bcbe Compare June 12, 2019 20:56
@gfyoung gfyoung added the Clean label Jun 12, 2019
@codecov
Copy link

codecov bot commented Jun 12, 2019

Codecov Report

Merging #26819 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26819      +/-   ##
==========================================
+ Coverage   91.86%   91.86%   +<.01%     
==========================================
  Files         179      179              
  Lines       50700    50679      -21     
==========================================
- Hits        46576    46557      -19     
+ Misses       4124     4122       -2
Flag Coverage Δ
#multiple 90.45% <100%> (ø) ⬆️
#single 41.13% <100%> (-0.05%) ⬇️
Impacted Files Coverage Δ
pandas/util/_decorators.py 92.13% <ø> (+0.78%) ⬆️
pandas/core/groupby/base.py 100% <ø> (+8.16%) ⬆️
pandas/core/groupby/generic.py 89.33% <100%> (+0.16%) ⬆️
pandas/_typing.py 100% <100%> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8374a1d...bf07b51. Read the comment docs.

@topper-123 topper-123 force-pushed the remove_make_signature branch from 3b9bcbe to a0a15b4 Compare June 12, 2019 21:31
@topper-123
Copy link
Contributor Author

The failures are unrelated to this PR

["old_arg_name", "new_arg_name",
"mapping", "stacklevel"]))
])
def test_make_signature(func, expected):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be noted that validate_kwargs and deprecate_kwarg have their own test modules, so removing here it not an issue.

@WillAyd
Copy link
Member

WillAyd commented Jun 13, 2019

Makes sense to me. I think merging master should get rid of the failure.

Also annotations while you are looking at this would be great if they are easy to add

@WillAyd WillAyd added this to the 0.25.0 milestone Jun 13, 2019
@topper-123 topper-123 force-pushed the remove_make_signature branch 2 times, most recently from c3841fa to ddcf052 Compare June 13, 2019 08:07
@topper-123
Copy link
Contributor Author

I've rebased and added type hints.

Wrt. the type hints, I've added the typing specific internal imports inside a if TYPE_CHECKING statement in order to minimize the risk of circular import problems. I'm however not 100 % strong in correct typing usage, so I see that as point for discussion, not ny final view on pure type importing.


from pandas.core.dtypes.common import is_list_like, is_scalar

if TYPE_CHECKING:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this actually causing circular imports? We import DataFrame and Series elsewhere within this directory didn't think those would cause issues.

Would like to avoid the TYPE_CHECKING variable generally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this probably won't matter in light of other comments below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it wasn't. Just my nervousness that it might cause one in a future commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as an FYI you can always use the quoted form of these rather than direct imports, e.g. for 'GroupBy'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand you here. Could you be more explicit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where a type is expected, for example DataFrame, you can simply use the string 'DataFrame' rather than importing it (this only works for types :->) e.g.

def func(df: 'DataFrame) -> 'DataFrame':
   ...

is legit. Of course we prefer to import, but we need to avoid circular deps (e.g. in groupby in particular).

Copy link
Contributor Author

@topper-123 topper-123 Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's legit, but it won't be infer correctly without the import?

At least in PyCharm it needs an import to recognize the type hint types do proper code completions, even when quoted.

EDIT: Likewise, mypy will complain if the imports aren't there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked again, and actually if I remove the if TYPE_CHECKING statement, it will cause a circular import error.

So I think the if-statement must remain in order to do type checking. Also, the DataFrame rather than ABCDataFrame gives better results for PyCharm, without negativ results for mypy, so I propose keeping it that way.

Although adding type hints does feel like walking in uncharted territory as far as figuring out what best-practice is.

Copy link
Member

@WillAyd WillAyd Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think this will be required for "GroupBy" since it is not in the same module. FWIW you can do this by putting reveal_type(base) in the body of whitelist_method_generator.

Just trying Type['GroupBy'] as the annotation without this import yields the below errors:

$ mypy pandas/core/groupby/base.py
pandas/core/groupby/base.py:93: error: Name 'GroupBy' is not defined
pandas/core/groupby/base.py:124: error: Revealed type is 'Type[Any]'

Whereas Type[GroupBy] with this import for mypy infers correctly:

$ mypy pandas/core/groupby/base.py
pandas/core/groupby/base.py:127: error: Revealed type is 'Type[pandas.core.groupby.groupby.GroupBy]'

So I think this is correct for GroupBy (separate conversations still for DataFrame and Series)

@@ -93,19 +94,21 @@ def _gotitem(self, key, ndim, subset=None):
'idxmax'])


def whitelist_method_generator(base, klass, whitelist):
def whitelist_method_generator(base: 'Type[GroupBy]',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is probably just used for GroupBy probably OK to just type this as Type, i.e. it could theoretically accept any Type not just a GroupBy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty descriptive here, if the more specific types work that is good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK sure. Though I think should be Type['GroupBy'] then (note difference in quoting)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is an all-or-nothing approach to quoting. Else it's more work to check if a annotation is quoted or not.

Copy link
Member

@WillAyd WillAyd Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I totally follow your point but the intention of quoting is to provide forward references to objects before they are defined. There's no reason to do this with builtin typing objects

Here's the mypy documentation on that. You'll see a particular example where they populate a List with a forward reference, where the quoting only apply to the type variable. That's what I'm asking for here:

https://mypy.readthedocs.io/en/latest/kinds_of_types.html#class-name-forward-references

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though may not even want this at all:

String literal types must be defined (or imported) later in the same module. They cannot be used to leave cross-module references unresolved. (For dealing with import cycles, see Import cycles.)

Don't need this to be a blocker

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option here if we don't like the import machinery is to just move this to generic where it is actually used. I don't see a very strong reason why this needs to be in base

Copy link
Contributor Author

@topper-123 topper-123 Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved whitelist_method_generator to generic.py. That does make the typing simpler. I've also added the TypeVar in _typing.

However, I suggest that unless we're fixing relatively simple typing issues (no new defined types in _typing.py etc.), that typing changes in the future go in separate commits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate commits are certainly fine for typing but not a PR unless major work needs to be done; we get much more mileage out of adding typing as we going in changed code (granted sometimes this will be slightly painful at the beginning)

@@ -93,19 +94,21 @@ def _gotitem(self, key, ndim, subset=None):
'idxmax'])


def whitelist_method_generator(base, klass, whitelist):
def whitelist_method_generator(base: 'Type[GroupBy]',
klass: 'Union[Type[DataFrame], Type[Series]]',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a TypeVar instead of a Union (i.e. we parametrize on one or the other of these, but don't use interchangeably within the function).

This is going to be a common idiom - could you add FrameOrSeries = TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame) to pandas._typing and use that here instead (open to other naming suggestions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand TypeVars all too well, but my understanding is it it ensures that the same type comes out as came in. So, if a Series came in, and an object of the same TypeVar comes out, that means the returnm type is a Series (and not DataFrame). In that case, it will not matter in this case if we use Union or TypeVar, isn't that true?

Not opposed to using TypeVar, just trying to understand their use cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea you've got that right! This comment might also be helpful

#26453 (comment)

So I don't think will make a difference here - just looking to get some kind of consistency in place as our annotations are still relatively immature

Copy link
Contributor Author

@topper-123 topper-123 Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It will be actually be more concise to use that TypeVar here (fewer type-specific imports), and a TypeVar('FrameOrSeries', Series, DataFrame) would make sense for Pandas in general.

In Pycharm TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame) doesn't give the correct completions, while TypeVar('FrameOrSeries', Series, DataFrame) does give the right ones. any specific reason to use ABCSeries and ABCDataFrame over the actual classes?

I'm also thinking that the best version would be actually TypeVar('FrameOrSeries', NDFrame, Series, DataFrame), to be able to add type relevant hints to the NDFrame class. Any thoughts on this?

CORRECTION: TypeVar('FrameOrSeries', Series, DataFrame) doesn't give the right completions either, so that point doesn't matter. It still stands that ABCDataFrame does not give any completions, while using DataFrame as a type hint does give the correct completions.

CORRECTION no. 2: the above means that Union[DataFrame, Series] currently gives better completions in PyCharn than TypeVar. Is this just a limitation of PyCharm, or is my understanding of TyprVar wrong? Anyone using VSCode can say if they get the correct type hints from using TypVars?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ABCSeries and ABCDataFrame are supposed to circumvent issues with import cycles. Unfortunately I don't use PyCharm so I can't speak to their support for those w.r.t. code completion.

I also don't think NDFrame should be a part of that as it's not a type that is used directly (i.e. typically only used as a superclass)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ABCSeries and ABCDataFrame are supposed to circumvent issues with import cycles.

To clarify though, if you see a better way of doing this certainly open to suggestions. I can't claim total expertise in our comprehensive usage of the ABC classes. @jreback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd, do you use VSCode? Do you get correct type hints from TypeVars?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Haven't used annotations for code completion so any insights you can offer there would be appreciated

@topper-123
Copy link
Contributor Author

Given that a Union works as well as a TypeVar in this case, and the required TypeVar would actually be TypeVar('FrameOrSeries', Type[ABCSeries], Type[ABCDataFrame]) (notice the extra Type), and this TypeVar would be used very infrequently, I propose just accepting it like is it ATM (a Union).

@WillAyd
Copy link
Member

WillAyd commented Jun 13, 2019

Might be mixing up a few things here. The TypeVar would look as follows:

FrameOrSeries = TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame)

The annotation in question would then be Type[FrameOrSeries] as opposed to Type[Union[Series, DataFrame]]] as is now (perhaps some slight variations).

@WillAyd WillAyd added the Typing type annotations, mypy/pyright type checking label Jun 14, 2019
@topper-123 topper-123 force-pushed the remove_make_signature branch from ddcf052 to 17aa408 Compare June 14, 2019 10:37
@topper-123 topper-123 force-pushed the remove_make_signature branch from 17aa408 to bf07b51 Compare June 14, 2019 11:07
@WillAyd
Copy link
Member

WillAyd commented Jun 14, 2019

Change looks good not sure what's going on with Travis / NumPy failure

@@ -22,3 +22,5 @@
Timedelta)
Dtype = Union[str, np.dtype, ExtensionDtype]
FilePathOrBuffer = Union[str, Path, IO[AnyStr]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd we should put a line comment on when to use these / what these are if not completely obvious (and then maybe even so).

future PR of course

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I'm going to take another look at this module soon to see if there's a better way of doing things. Also wondering if we should use if TYPE_CHECKING imports in that module instead of the ABCs. Will check it out and post separately

Copy link
Contributor

@jreback jreback Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I'm going to take another look at this module soon to see if there's a better way of doing things. Also wondering if we should use if TYPE_CHECKING imports in that module instead of the ABCs. Will check it out and post separately

-1 on using TYPE_CHECCKING. I think the way you have it setup is very nice actually, unless I am missing something big.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that the ABC* classes aren't getting type checked correctly. Adding reveal_type(klass) to this PR where the TypeVar is used yields the following:

$ mypy pandas/core/groupby/generic.py
pandas/core/groupby/generic.py:83: error: Revealed type is 'Type[Any]'

That coupled with @topper-123 's comment around code completion makes me think the ABCs might all be Type[Any] since they are generated by create_pandas_abc_type, a function without any annotations itself (so implicitly Type[Any])

Going to investigate and open a separate issue accordingly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok fair. I just don't want to introduce overhead for contributors, which typing especially with 'non-standard' things (e.g. cast) can do; we really want to minimize what people have to do (nothwithstanding maybe we have magic that makes things easy).

@jreback
Copy link
Contributor

jreback commented Jun 14, 2019

lgtm. I restarted the job, should pass. merge on green.

@jreback jreback merged commit 430f0fd into pandas-dev:master Jun 14, 2019
@jreback
Copy link
Contributor

jreback commented Jun 14, 2019

thanks @topper-123

@topper-123 topper-123 deleted the remove_make_signature branch June 14, 2019 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants