-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: remove util._decorators.make_signature and make related changes #26819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b945b74
to
3b9bcbe
Compare
Codecov Report
@@ Coverage Diff @@
## master #26819 +/- ##
==========================================
+ Coverage 91.86% 91.86% +<.01%
==========================================
Files 179 179
Lines 50700 50679 -21
==========================================
- Hits 46576 46557 -19
+ Misses 4124 4122 -2
Continue to review full report at Codecov.
|
3b9bcbe
to
a0a15b4
Compare
The failures are unrelated to this PR |
["old_arg_name", "new_arg_name", | ||
"mapping", "stacklevel"])) | ||
]) | ||
def test_make_signature(func, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be noted that validate_kwargs
and deprecate_kwarg
have their own test modules, so removing here it not an issue.
Makes sense to me. I think merging master should get rid of the failure. Also annotations while you are looking at this would be great if they are easy to add |
c3841fa
to
ddcf052
Compare
I've rebased and added type hints. Wrt. the type hints, I've added the typing specific internal imports inside a |
pandas/core/groupby/base.py
Outdated
|
||
from pandas.core.dtypes.common import is_list_like, is_scalar | ||
|
||
if TYPE_CHECKING: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this actually causing circular imports? We import DataFrame and Series elsewhere within this directory didn't think those would cause issues.
Would like to avoid the TYPE_CHECKING variable generally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this probably won't matter in light of other comments below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it wasn't. Just my nervousness that it might cause one in a future commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as an FYI you can always use the quoted form of these rather than direct imports, e.g. for 'GroupBy'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand you here. Could you be more explicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where a type is expected, for example DataFrame, you can simply use the string 'DataFrame' rather than importing it (this only works for types :->) e.g.
def func(df: 'DataFrame) -> 'DataFrame':
...
is legit. Of course we prefer to import, but we need to avoid circular deps (e.g. in groupby in particular).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's legit, but it won't be infer correctly without the import?
At least in PyCharm it needs an import to recognize the type hint types do proper code completions, even when quoted.
EDIT: Likewise, mypy will complain if the imports aren't there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked again, and actually if I remove the if TYPE_CHECKING
statement, it will cause a circular import error.
So I think the if-statement must remain in order to do type checking. Also, the DataFrame rather than ABCDataFrame gives better results for PyCharm, without negativ results for mypy, so I propose keeping it that way.
Although adding type hints does feel like walking in uncharted territory as far as figuring out what best-practice is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think this will be required for "GroupBy" since it is not in the same module. FWIW you can do this by putting reveal_type(base)
in the body of whitelist_method_generator
.
Just trying Type['GroupBy']
as the annotation without this import yields the below errors:
$ mypy pandas/core/groupby/base.py
pandas/core/groupby/base.py:93: error: Name 'GroupBy' is not defined
pandas/core/groupby/base.py:124: error: Revealed type is 'Type[Any]'
Whereas Type[GroupBy]
with this import for mypy infers correctly:
$ mypy pandas/core/groupby/base.py
pandas/core/groupby/base.py:127: error: Revealed type is 'Type[pandas.core.groupby.groupby.GroupBy]'
So I think this is correct for GroupBy (separate conversations still for DataFrame and Series)
pandas/core/groupby/base.py
Outdated
@@ -93,19 +94,21 @@ def _gotitem(self, key, ndim, subset=None): | |||
'idxmax']) | |||
|
|||
|
|||
def whitelist_method_generator(base, klass, whitelist): | |||
def whitelist_method_generator(base: 'Type[GroupBy]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is probably just used for GroupBy probably OK to just type this as Type
, i.e. it could theoretically accept any Type not just a GroupBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty descriptive here, if the more specific types work that is good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK sure. Though I think should be Type['GroupBy']
then (note difference in quoting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is an all-or-nothing approach to quoting. Else it's more work to check if a annotation is quoted or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I totally follow your point but the intention of quoting is to provide forward references to objects before they are defined. There's no reason to do this with builtin typing objects
Here's the mypy documentation on that. You'll see a particular example where they populate a List with a forward reference, where the quoting only apply to the type variable. That's what I'm asking for here:
https://mypy.readthedocs.io/en/latest/kinds_of_types.html#class-name-forward-references
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though may not even want this at all:
String literal types must be defined (or imported) later in the same module. They cannot be used to leave cross-module references unresolved. (For dealing with import cycles, see Import cycles.)
Don't need this to be a blocker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option here if we don't like the import machinery is to just move this to generic where it is actually used. I don't see a very strong reason why this needs to be in base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved whitelist_method_generator
to generic.py. That does make the typing simpler. I've also added the TypeVar in _typing.
However, I suggest that unless we're fixing relatively simple typing issues (no new defined types in _typing.py
etc.), that typing changes in the future go in separate commits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate commits are certainly fine for typing but not a PR unless major work needs to be done; we get much more mileage out of adding typing as we going in changed code (granted sometimes this will be slightly painful at the beginning)
pandas/core/groupby/base.py
Outdated
@@ -93,19 +94,21 @@ def _gotitem(self, key, ndim, subset=None): | |||
'idxmax']) | |||
|
|||
|
|||
def whitelist_method_generator(base, klass, whitelist): | |||
def whitelist_method_generator(base: 'Type[GroupBy]', | |||
klass: 'Union[Type[DataFrame], Type[Series]]', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be a TypeVar instead of a Union (i.e. we parametrize on one or the other of these, but don't use interchangeably within the function).
This is going to be a common idiom - could you add FrameOrSeries = TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame)
to pandas._typing and use that here instead (open to other naming suggestions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand TypeVars all too well, but my understanding is it it ensures that the same type comes out as came in. So, if a Series came in, and an object of the same TypeVar comes out, that means the returnm type is a Series (and not DataFrame). In that case, it will not matter in this case if we use Union or TypeVar, isn't that true?
Not opposed to using TypeVar, just trying to understand their use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea you've got that right! This comment might also be helpful
So I don't think will make a difference here - just looking to get some kind of consistency in place as our annotations are still relatively immature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It will be actually be more concise to use that TypeVar here (fewer type-specific imports), and a TypeVar('FrameOrSeries', Series, DataFrame)
would make sense for Pandas in general.
In Pycharm TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame)
doesn't give the correct completions, while TypeVar('FrameOrSeries', Series, DataFrame)
does give the right ones. any specific reason to use ABCSeries and ABCDataFrame over the actual classes?
I'm also thinking that the best version would be actually TypeVar('FrameOrSeries', NDFrame, Series, DataFrame)
, to be able to add type relevant hints to the NDFrame class. Any thoughts on this?
CORRECTION: TypeVar('FrameOrSeries', Series, DataFrame)
doesn't give the right completions either, so that point doesn't matter. It still stands that ABCDataFrame
does not give any completions, while using DataFrame
as a type hint does give the correct completions.
CORRECTION no. 2: the above means that Union[DataFrame, Series]
currently gives better completions in PyCharn than TypeVar. Is this just a limitation of PyCharm, or is my understanding of TyprVar wrong? Anyone using VSCode can say if they get the correct type hints from using TypVars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABCSeries and ABCDataFrame are supposed to circumvent issues with import cycles. Unfortunately I don't use PyCharm so I can't speak to their support for those w.r.t. code completion.
I also don't think NDFrame should be a part of that as it's not a type that is used directly (i.e. typically only used as a superclass)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABCSeries and ABCDataFrame are supposed to circumvent issues with import cycles.
To clarify though, if you see a better way of doing this certainly open to suggestions. I can't claim total expertise in our comprehensive usage of the ABC classes. @jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd, do you use VSCode? Do you get correct type hints from TypeVars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Haven't used annotations for code completion so any insights you can offer there would be appreciated
Given that a Union works as well as a TypeVar in this case, and the required TypeVar would actually be |
Might be mixing up a few things here. The TypeVar would look as follows: FrameOrSeries = TypeVar('FrameOrSeries', ABCSeries, ABCDataFrame) The annotation in question would then be |
ddcf052
to
17aa408
Compare
17aa408
to
bf07b51
Compare
Change looks good not sure what's going on with Travis / NumPy failure |
@@ -22,3 +22,5 @@ | |||
Timedelta) | |||
Dtype = Union[str, np.dtype, ExtensionDtype] | |||
FilePathOrBuffer = Union[str, Path, IO[AnyStr]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd we should put a line comment on when to use these / what these are if not completely obvious (and then maybe even so).
future PR of course
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm going to take another look at this module soon to see if there's a better way of doing things. Also wondering if we should use if TYPE_CHECKING
imports in that module instead of the ABCs. Will check it out and post separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm going to take another look at this module soon to see if there's a better way of doing things. Also wondering if we should use
if TYPE_CHECKING
imports in that module instead of the ABCs. Will check it out and post separately
-1 on using TYPE_CHECCKING. I think the way you have it setup is very nice actually, unless I am missing something big.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that the ABC* classes aren't getting type checked correctly. Adding reveal_type(klass)
to this PR where the TypeVar is used yields the following:
$ mypy pandas/core/groupby/generic.py
pandas/core/groupby/generic.py:83: error: Revealed type is 'Type[Any]'
That coupled with @topper-123 's comment around code completion makes me think the ABCs might all be Type[Any]
since they are generated by create_pandas_abc_type
, a function without any annotations itself (so implicitly Type[Any]
)
Going to investigate and open a separate issue accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok fair. I just don't want to introduce overhead for contributors, which typing especially with 'non-standard' things (e.g. cast
) can do; we really want to minimize what people have to do (nothwithstanding maybe we have magic that makes things easy).
lgtm. I restarted the job, should pass. merge on green. |
thanks @topper-123 |
util._decorators.make_signature
is only used incore.groupby.base.whitelist_method_generator
.However, in that function it is only used inside an if statement, that is conditional upon an attribute being a method. That however can not happen, as the function parameter only takes classes as an input, so that if-statement is never true and its content never executed. The net effect is that a nice chunk of code can just be removed, which is nice.
In the future, we should just use jus use
inspect.signature
orinspect.getfullargspecs
for signature inspection.