Skip to content

CLN/TYP: aggregation methods in core.base #36677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 2, 2020

Conversation

rhshadrach
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is in order to move _aggregate and _aggregate_multiple_funcs into core.aggregation - I think that's desirable. Type-hinting is slightly more strong there (checks untyped defs), and would involve renaming self to obj. Thus the renaming of obj to selected_obj here.

@jbrockmendel
Copy link
Member

1 mypy complaint

@jreback jreback added the Typing type annotations, mypy/pyright type checking label Sep 27, 2020
@jreback jreback added this to the 1.2 milestone Sep 27, 2020
@rhshadrach
Copy link
Member Author

rhshadrach commented Sep 28, 2020

Thanks @jbrockmendel. The offending line is:

name = self.name if (self.ndim == 1) else None

A natural replacement would be to do:

if isinstance(self, ABCSeries):
    name = self.name
else:
    name = None

but in general I think we prefer checking ndim == 1 rather than isinstance for performance when the object is known to be a Series/DataFrame. So the other options I see here are to use cast (is self = cast(self, ...) okay?) or just silence mypy with # type: ignore (along with the specific code).

Any advice here @simonjayhawkins, @jbrockmendel, @jreback?

@jbrockmendel
Copy link
Member

i tend to use the ndim check, but no strong opinion

@rhshadrach
Copy link
Member Author

I fixed by using cast; ready for review now.

@@ -473,7 +478,11 @@ def is_any_frame() -> bool:
# we have a dict of scalars

# GH 36212 use name only if self is a series
name = self.name if (self.ndim == 1) else None
if self.ndim == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just rewrite this to if isinstance(self, ABCSeries): instead of checking ndim. It would be more consistent with other checks in the code base and get rid of the need for a cast

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, at the expense of being a little slower. On my machine, I'm seeing 127 ns for ndim and cast together and 156 ns for isinstance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, is that because ABCSeries resolves to Any and hence mypy is no longer checking this. So I guess a cast will be needed in the future when we turn on more strict checking options for Any.

isinstance(.., ABC..) are generally slow and should normally be avoided if possible, but here it seems the ndim check is not that much more performant.

but this is OK for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 28 ns timing difference is not worth worrying about. If we wanted to instance check to be stricter we should probably just add a stub for the ABCs (or see if there's a mypy hook to strongly type them) but that is out of scope for this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stubs for ABCs would be good.

but I see three issues with the current change.

  1. impacted performance just to pass mypy. no matter how small
  2. changing the typing to resolve to Any just to pass mypy. bad practice
  3. code churn. no matter how trivial.

all three together and i'm now firmly -1 on this approach.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm


# if we have a dict of any non-scalars
# eg. {'A' : ['mean']}, normalize all to
# be list-likes
if any(is_aggregator(x) for x in arg.values()):
new_arg = {}
new_arg: Dict[Label, Union[AggFuncTypeBase, List[AggFuncTypeBase]]] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orthogonal to this PR but seems strange that this is really a subset of AggFuncType - any chance we have test coverage for the other parts of that Union coming through this method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're asking why this can't be typed Dict[Label, AggFuncType]. The last part of AggFuncType not included here are where the values are a Dict. This has been removed and will raise on 338 below. That it raises is tested in frame.apply.test_agg_dict_nested_renaming_depr.

@WillAyd
Copy link
Member

WillAyd commented Sep 30, 2020

@simonjayhawkins

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach generally lgtm.

despite my previous comment #36677 (comment) there is really no need to impact performance.

isinstance(..., ABC...) will always require a cast eventually. may as well do it now.

@rhshadrach
Copy link
Member Author

@simonjayhawkins Thanks for you're input. I've changed the code back to use if ndim == 1 along with cast. I think this is what you meant by "may as well [cast] now" - let me know if I misunderstood.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rhshadrach lgtm

My objection was to ABCSeries. There is some history here #27353 and #34040

Particularly since it did NOT fix the mypy issue, just hid it, so was no better than a type: ignore

Series is already imported. So would not be against isinstance(self, Series) if that is more acceptable to others.

@jreback jreback merged commit fbb490c into pandas-dev:master Oct 2, 2020
@jreback
Copy link
Contributor

jreback commented Oct 2, 2020

thanks @rhshadrach

@rhshadrach rhshadrach deleted the clean_agg branch October 11, 2020 13:22
@rhshadrach rhshadrach mentioned this pull request Oct 29, 2020
5 tasks
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants