-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: type annotations in groupby.grouper, groupby.ops #29456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good some comments
pandas/core/groupby/ops.py
Outdated
@@ -89,7 +89,7 @@ def __init__( | |||
|
|||
self._filter_empty_groups = self.compressed = len(groupings) != 1 | |||
self.axis = axis | |||
self.groupings = groupings # type: Sequence[grouper.Grouping] | |||
self.groupings = list(groupings) # type: List[grouper.Grouping] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this necessary? I think Sequence should have been allowable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was an attempt to troubleshoot a (still-failing, mentioned in OP) complaint about a mismatch in the groupings
annotations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins any thoughts on why mypy is still complaining about groupings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right about the property and the attribute.
diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py
index cbe012012..46a427434 100644
--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -90,12 +90,16 @@ class BaseGrouper:
self._filter_empty_groups = self.compressed = len(groupings) != 1
self.axis = axis
- self.groupings = list(groupings) # type: List[grouper.Grouping]
+ self._groupings = list(groupings) # type: List[grouper.Grouping]
self.sort = sort
self.group_keys = group_keys
self.mutated = mutated
self.indexer = indexer
+ @property
+ def groupings(self):
+ return self._groupings
+
@property
def shape(self):
return tuple(ping.ngroups for ping in self.groupings)
silences mypy if you're happy with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to work, updated
Any issues remaining here? |
sort : bool, default False | ||
whether the resulting grouper should be sorted | ||
""" | ||
assert obj is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for my own benefit in trying to reason about this code.
mutated = self.mutated | ||
splitter = self._get_splitter(data, axis=axis) | ||
group_keys = self._get_group_keys() | ||
result_values = None | ||
|
||
sdata = splitter._get_sorted_data() | ||
sdata = splitter._get_sorted_data() # type: FrameOrSeries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed?
shouldn't need to add a type annotation here. maybe the return type of _get_sorted_data
needs to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_sorted_data return type is annotated, but mypy complains without this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can update to py3.6 syntax in a followon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no longer needed after e6c5f5a
return self.data.take(self.sort_idx, axis=self.axis) | ||
|
||
def _chop(self, sdata, slice_obj: slice): | ||
def _chop(self, sdata, slice_obj: slice) -> NDFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is NDFrame used? is _chop not generic? should DataSplitter be a generic class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand the question. Is "generic class" meaningfully different from "base class"? NDFrame is used because one subclass returns Series and the other returns DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataSplitter.__init__
accepts FrameOrSeries
. do we need to persist this type thoughout the class. i.e. make DataSplitter a generic class. see https://mypy.readthedocs.io/en/latest/generics.html#defining-generic-classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so looking at the definition of _chop in the derived classes, i'm guessing this abstractmethod should be typed as
def _chop(self, sdata: FrameOrSeries, slice_obj: slice) -> FrameOrSeries:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using FrameOrSeries here produces complaints:
pandas/core/groupby/ops.py:879: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "FrameOrSeries"
pandas/core/groupby/ops.py:879: error: Return type "Series" of "_chop" incompatible with return type "FrameOrSeries" in supertype "DataSplitter"
pandas/core/groupby/ops.py:891: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "FrameOrSeries"
pandas/core/groupby/ops.py:891: error: Return type "DataFrame" of "_chop" incompatible with return type "FrameOrSeries" in supertype "DataSplitter"
I'm getting close to saying "screw it" when dealing with this type of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mypy won't be looking at the derived classes when it performs type checking. it'll be looking at the type hints on the base class when it checks other methods in the base class.
the abstractmethod should be generic since that is how the derived classes are typed Series -> Series and DataFrame -> DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are mixing a few different paradigms here. The subclasses should probably be annotated with the type respective to the class, rather than using the TypeVar, i.e. you would never parametrize a SeriesSplitter
with a DataFrame - it exclusively deals with Series objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would never parametrize a
SeriesSplitter
with a DataFrame - it exclusively deals with Series objects
correct. but if a method of the base class is not overridden then the Series type in the derived class will become an NDFrame type after calling that method in the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are mixing a few different paradigms here.
There are some annotations in this PR that make it easier to reason about this code while reading it. The annotations in this sub-thread are not among them, so I do not particularly care about them. Let's focus for now on a minimal change needed to get this merged, as there are more bugfix PRs waiting in the wings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e6c5f5a fixes this.
I think should use py3.6 syntax for variable annotations going forward and would also prefer to not to see changes to .format but use f-strings instead. |
3.6 annotations ill try to get in the habit of. f-strings are going to take some getting used to |
just thinking it'll reduce churn as they are likely to be updated anyway. so why makes changes twice? |
No reason at all. That doesn't change the whole "I'm an old man and change will take some getting used to" thing. |
rebased+green |
lgtm (ex one comment), @WillAyd |
Thanks @jbrockmendel |
* Annotate groupby.ops * annotations, needs debugging * whitespace * types * circular import * fix msot mypy complaints * fix mypy groupings * merge cleanup
* Annotate groupby.ops * annotations, needs debugging * whitespace * types * circular import * fix msot mypy complaints * fix mypy groupings * merge cleanup
* Annotate groupby.ops * annotations, needs debugging * whitespace * types * circular import * fix msot mypy complaints * fix mypy groupings * merge cleanup
@simonjayhawkins mypy is still giving a couple of complaints I could use your help sorting out:
For the groupings complaint, AFAICT the attribute has the same annotation, but in the subclass its a property instead of defined in
__init__
. For the other two, I annotated an argument withNDFrame
in the base class and overrode withSeries
andDataFrame
in the subclasses. What is the preferred idiom for this pattern?