-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CLN/TYP: aggregation methods in core.base #36677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1 mypy complaint |
Thanks @jbrockmendel. The offending line is:
A natural replacement would be to do:
but in general I think we prefer checking Any advice here @simonjayhawkins, @jbrockmendel, @jreback? |
i tend to use the ndim check, but no strong opinion |
I fixed by using cast; ready for review now. |
@@ -473,7 +478,11 @@ def is_any_frame() -> bool: | |||
# we have a dict of scalars | |||
|
|||
# GH 36212 use name only if self is a series | |||
name = self.name if (self.ndim == 1) else None | |||
if self.ndim == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just rewrite this to if isinstance(self, ABCSeries):
instead of checking ndim
. It would be more consistent with other checks in the code base and get rid of the need for a cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, at the expense of being a little slower. On my machine, I'm seeing 127 ns for ndim and cast together and 156 ns for isinstance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is that because ABCSeries resolves to Any
and hence mypy is no longer checking this. So I guess a cast will be needed in the future when we turn on more strict checking options for Any.
isinstance(.., ABC..) are generally slow and should normally be avoided if possible, but here it seems the ndim check is not that much more performant.
but this is OK for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A 28 ns timing difference is not worth worrying about. If we wanted to instance check to be stricter we should probably just add a stub for the ABCs (or see if there's a mypy hook to strongly type them) but that is out of scope for this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stubs for ABCs would be good.
but I see three issues with the current change.
- impacted performance just to pass mypy. no matter how small
- changing the typing to resolve to Any just to pass mypy. bad practice
- code churn. no matter how trivial.
all three together and i'm now firmly -1 on this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
|
||
# if we have a dict of any non-scalars | ||
# eg. {'A' : ['mean']}, normalize all to | ||
# be list-likes | ||
if any(is_aggregator(x) for x in arg.values()): | ||
new_arg = {} | ||
new_arg: Dict[Label, Union[AggFuncTypeBase, List[AggFuncTypeBase]]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal to this PR but seems strange that this is really a subset of AggFuncType
- any chance we have test coverage for the other parts of that Union coming through this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you're asking why this can't be typed Dict[Label, AggFuncType]
. The last part of AggFuncType not included here are where the values are a Dict. This has been removed and will raise on 338 below. That it raises is tested in frame.apply.test_agg_dict_nested_renaming_depr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rhshadrach generally lgtm.
despite my previous comment #36677 (comment) there is really no need to impact performance.
isinstance(..., ABC...) will always require a cast eventually. may as well do it now.
@simonjayhawkins Thanks for you're input. I've changed the code back to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rhshadrach lgtm
My objection was to ABCSeries. There is some history here #27353 and #34040
Particularly since it did NOT fix the mypy issue, just hid it, so was no better than a type: ignore
Series is already imported. So would not be against isinstance(self, Series)
if that is more acceptable to others.
thanks @rhshadrach |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This is in order to move _aggregate and _aggregate_multiple_funcs into core.aggregation - I think that's desirable. Type-hinting is slightly more strong there (checks untyped defs), and would involve renaming
self
toobj
. Thus the renaming ofobj
toselected_obj
here.