CLN: type annotations in groupby.grouper, groupby.ops #29456

jbrockmendel · 2019-11-07T05:18:19Z

@simonjayhawkins mypy is still giving a couple of complaints I could use your help sorting out:

pandas/core/groupby/ops.py:791: error: Signature of "groupings" incompatible with supertype "BaseGrouper"
pandas/core/groupby/ops.py:872: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "NDFrame"
pandas/core/groupby/ops.py:884: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "NDFrame"

For the groupings complaint, AFAICT the attribute has the same annotation, but in the subclass it's a property instead of being defined in __init__. For the other two, I annotated an argument with NDFrame in the base class and overrode it with Series and DataFrame in the subclasses. What is the preferred idiom for this pattern?

WillAyd

looks good some comments

WillAyd · 2019-11-07T16:06:07Z

pandas/core/groupby/ops.py

@@ -89,7 +89,7 @@ def __init__(

        self._filter_empty_groups = self.compressed = len(groupings) != 1
        self.axis = axis
-        self.groupings = groupings  # type: Sequence[grouper.Grouping]
+        self.groupings = list(groupings)  # type: List[grouper.Grouping]


Was this necessary? I think Sequence should have been allowable

This was an attempt to troubleshoot a (still-failing, mentioned in OP) complaint about a mismatch in the groupings annotations.

@simonjayhawkins any thoughts on why mypy is still complaining about groupings?

I think you're right about the property and the attribute.

diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py
index cbe012012..46a427434 100644
--- a/pandas/core/groupby/ops.py
+++ b/pandas/core/groupby/ops.py
@@ -90,12 +90,16 @@ class BaseGrouper:
         self._filter_empty_groups = self.compressed = len(groupings) != 1
         self.axis = axis
-        self.groupings = list(groupings)  # type: List[grouper.Grouping]
+        self._groupings = list(groupings)  # type: List[grouper.Grouping]
         self.sort = sort
         self.group_keys = group_keys
         self.mutated = mutated
         self.indexer = indexer
 
+    @property
+    def groupings(self):
+        return self._groupings
+
     @property
     def shape(self):
         return tuple(ping.ngroups for ping in self.groupings)

silences mypy if you're happy with this approach.

seems to work, updated
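
For anyone reading along later, the idiom from the diff above can be sketched outside pandas. This is a minimal sketch under stated assumptions: `Grouping` here is a hypothetical stand-in for grouper.Grouping, `DerivedGrouper` is a hypothetical subclass, and the snippet uses py3.6 variable annotations rather than type comments. The idea is that the base class exposes `groupings` as a read-only property, so a subclass that also defines it as a property overrides like with like, which appears to be what mypy wants.

```python
from typing import List, Sequence, Tuple


class Grouping:
    """Hypothetical stand-in for grouper.Grouping."""

    def __init__(self, ngroups: int) -> None:
        self.ngroups = ngroups


class BaseGrouper:
    def __init__(self, groupings: Sequence[Grouping]) -> None:
        # Stored privately; the public name is only exposed via the property.
        self._groupings: List[Grouping] = list(groupings)

    @property
    def groupings(self) -> List[Grouping]:
        return self._groupings

    @property
    def shape(self) -> Tuple[int, ...]:
        return tuple(ping.ngroups for ping in self.groupings)


class DerivedGrouper(BaseGrouper):
    # Hypothetical subclass: a property overriding a property, so the
    # base and derived declarations are the same kind of member.
    @property
    def groupings(self) -> List[Grouping]:
        return [Grouping(ngroups=3)]


base = BaseGrouper([Grouping(2), Grouping(5)])
derived = DerivedGrouper([])
print(base.shape, derived.shape)
```

One side effect worth noting: with no setter defined, assigning to `groupings` from outside now raises AttributeError.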

jbrockmendel · 2019-11-10T17:47:05Z

Any issues remaining here?

simonjayhawkins · 2019-11-10T17:51:56Z

pandas/core/groupby/grouper.py

        sort : bool, default False
            whether the resulting grouper should be sorted
        """
+        assert obj is not None


why is this needed?

This is for my own benefit in trying to reason about this code.
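
As a side note, an assert like this can also help a type checker, not only a human reader. The sketch below is illustrative only: `describe` is a hypothetical function, and it assumes the argument is annotated as Optional, which may or may not match the code under review.

```python
from typing import Optional


def describe(obj: Optional[str] = None) -> str:
    # After this assert a type checker treats obj as str rather than
    # Optional[str], and a reader knows None is not expected past here.
    assert obj is not None
    return obj.upper()


print(describe("ab"))
```

Keep in mind that asserts are skipped when Python runs with -O, so this documents an expectation rather than enforcing it.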

simonjayhawkins · 2019-11-10T17:55:23Z

pandas/core/groupby/ops.py

        mutated = self.mutated
        splitter = self._get_splitter(data, axis=axis)
        group_keys = self._get_group_keys()
        result_values = None

-        sdata = splitter._get_sorted_data()
+        sdata = splitter._get_sorted_data()  # type: FrameOrSeries


why is this needed?

shouldn't need to add a type annotation here. maybe the return type of _get_sorted_data needs to be added.

_get_sorted_data return type is annotated, but mypy complains without this

can update to py3.6 syntax in a follow-on

no longer needed after e6c5f5a

simonjayhawkins · 2019-11-10T18:02:09Z

pandas/core/groupby/ops.py

        return self.data.take(self.sort_idx, axis=self.axis)

-    def _chop(self, sdata, slice_obj: slice):
+    def _chop(self, sdata, slice_obj: slice) -> NDFrame:


why is NDFrame used? is _chop not generic? should DataSplitter be a generic class?

I don't understand the question. Is "generic class" meaningfully different from "base class"? NDFrame is used because one subclass returns Series and the other returns DataFrame

DataSplitter.__init__ accepts FrameOrSeries. do we need to persist this type throughout the class, i.e. make DataSplitter a generic class? see https://mypy.readthedocs.io/en/latest/generics.html#defining-generic-classes

so looking at the definition of _chop in the derived classes, i'm guessing this abstractmethod should be typed as

def _chop(self, sdata: FrameOrSeries, slice_obj: slice) -> FrameOrSeries:

Using FrameOrSeries here produces complaints:

pandas/core/groupby/ops.py:879: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "FrameOrSeries"
pandas/core/groupby/ops.py:879: error: Return type "Series" of "_chop" incompatible with return type "FrameOrSeries" in supertype "DataSplitter"
pandas/core/groupby/ops.py:891: error: Argument 1 of "_chop" is incompatible with supertype "DataSplitter"; supertype defines the argument type as "FrameOrSeries"
pandas/core/groupby/ops.py:891: error: Return type "DataFrame" of "_chop" incompatible with return type "FrameOrSeries" in supertype "DataSplitter"

I'm getting close to saying "screw it" when dealing with this type of error.

mypy won't be looking at the derived classes when it performs type checking. it'll be looking at the type hints on the base class when it checks other methods in the base class.

the abstractmethod should be generic since that is how the derived classes are typed Series -> Series and DataFrame -> DataFrame.

I think we are mixing a few different paradigms here. The subclasses should probably be annotated with the type respective to the class, rather than using the TypeVar, i.e. you would never parametrize a SeriesSplitter with a DataFrame - it exclusively deals with Series objects

> you would never parametrize a SeriesSplitter with a DataFrame - it exclusively deals with Series objects

correct. but if a method of the base class is not overridden then the Series type in the derived class will become an NDFrame type after calling that method in the base class.
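
For reference, the generic-class idea discussed in this sub-thread might look roughly like the sketch below. It is not the change made in e6c5f5a. Assumptions: `FakeSeries` and `FakeFrame` are hypothetical stand-ins for Series and DataFrame so the snippet runs without pandas, `T` plays the role of FrameOrSeries, and the method names drop the leading underscore. Each subclass binds the type parameter, so its `chop` can be annotated with the concrete type, and a base-class method that is not overridden (`get_sorted_data`) should still be seen as returning the concrete type rather than widening to the base type.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar


class FakeSeries:
    """Hypothetical stand-in for pandas.Series."""

    def __init__(self, values: List[int]) -> None:
        self.values = values


class FakeFrame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows: List[List[int]]) -> None:
        self.rows = rows


# Plays the role of FrameOrSeries: one of the two concrete types.
T = TypeVar("T", FakeSeries, FakeFrame)


class DataSplitter(ABC, Generic[T]):
    def __init__(self, data: T) -> None:
        self.data = data

    def get_sorted_data(self) -> T:
        # Not overridden below; the return type follows the bound parameter.
        return self.data

    @abstractmethod
    def chop(self, sdata: T, slice_obj: slice) -> T:
        ...


class SeriesSplitter(DataSplitter[FakeSeries]):
    def chop(self, sdata: FakeSeries, slice_obj: slice) -> FakeSeries:
        return FakeSeries(sdata.values[slice_obj])


class FrameSplitter(DataSplitter[FakeFrame]):
    def chop(self, sdata: FakeFrame, slice_obj: slice) -> FakeFrame:
        return FakeFrame(sdata.rows[slice_obj])


splitter = SeriesSplitter(FakeSeries([3, 1, 2]))
chopped = splitter.chop(splitter.get_sorted_data(), slice(0, 2))
print(type(chopped).__name__, chopped.values)
```

The trade-off is extra machinery in the class definitions in exchange for subclass signatures that match the supertype without widening to the base type.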

> I think we are mixing a few different paradigms here.

There are some annotations in this PR that make it easier to reason about this code while reading it. The annotations in this sub-thread are not among them, so I do not particularly care about them. Let's focus for now on a minimal change needed to get this merged, as there are more bugfix PRs waiting in the wings.

e6c5f5a fixes this.

simonjayhawkins · 2019-11-10T18:08:00Z

> Any issues remaining here?

I think we should use py3.6 syntax for variable annotations going forward, and I would also prefer not to see changes to .format calls; use f-strings instead.
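
To make the two suggestions concrete, here is a small side-by-side sketch. The variable names are made up for illustration and are not taken from the PR.

```python
from typing import List

# Comment-style annotation, the form needed before Python 3.6:
names = ["a", "b"]  # type: List[str]

# Python 3.6+ variable annotation; a type checker reads it the same way:
labels: List[str] = ["a", "b"]

ngroups = 3
old_style = "found {n} groups".format(n=ngroups)
new_style = f"found {ngroups} groups"  # f-string, also Python 3.6+

print(old_style == new_style)
```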

jbrockmendel · 2019-11-10T18:12:55Z

> I think we should use py3.6 syntax for variable annotations going forward, and I would also prefer not to see changes to .format calls; use f-strings instead.

3.6 annotations I'll try to get in the habit of. f-strings are going to take some getting used to

simonjayhawkins · 2019-11-10T18:16:19Z

> f-strings are going to take some getting used to

just thinking it'll reduce churn as they are likely to be updated anyway. so why make changes twice?

jbrockmendel · 2019-11-10T18:18:25Z

> just thinking it'll reduce churn as they are likely to be updated anyway. so why make changes twice?

No reason at all. That doesn't change the whole "I'm an old man and change will take some getting used to" thing.

jbrockmendel · 2019-11-12T21:19:17Z

rebased+green

jreback · 2019-11-12T23:46:26Z

lgtm (ex one comment), @WillAyd

WillAyd · 2019-11-13T00:39:34Z

Thanks @jbrockmendel

* Annotate groupby.ops
* annotations, needs debugging
* whitespace
* types
* circular import
* fix msot mypy complaints
* fix mypy groupings
* merge cleanup

jbrockmendel added 5 commits November 6, 2019 20:41

Annotate groupby.ops

98b53d7

annotations, needs debugging

efd4a9b

whitespace

1933277

types

9b6a87a

circular import

d52add4

WillAyd requested changes Nov 7, 2019

jbrockmendel added 7 commits November 7, 2019 09:24

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

dee81f6

…n-gb

fix msot mypy complaints

a7e6ad1

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

59cdf0a

…n-gb

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

6966fba

…n-gb

fix mypy groupings

f038302

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

dc250f1

…n-gb

merge cleanup

0b28143

jbrockmendel mentioned this pull request Nov 9, 2019

Update MultiIndex checks #29494

Merged

gfyoung added Groupby Typing type annotations, mypy/pyright type checking labels Nov 9, 2019

simonjayhawkins reviewed Nov 10, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

1dfd414

…n-gb

simonjayhawkins reviewed Nov 10, 2019

Merge branch 'master' of https://github.com/pandas-dev/pandas into cl…

6d3d485

…n-gb

jreback added this to the 1.0 milestone Nov 12, 2019

WillAyd approved these changes Nov 13, 2019

WillAyd merged commit 4b3027f into pandas-dev:master Nov 13, 2019

jbrockmendel deleted the cln-gb branch November 13, 2019 00:58

simonjayhawkins mentioned this pull request Oct 6, 2020

TYP: check_untyped_defs core.groupby.ops #36921

Merged
