Allow covariance in the agg dict passed to DataFrame or Series groupby.agg() #363

gandhis1 · 2022-10-06T03:45:14Z

Closes #361 (GroupBy agg rejects a dict argument)
Tests added: Please use assert_type() to assert the type of any return value

twoertwein · 2022-10-06T14:22:12Z

tests/test_frame.py

@@ -655,21 +660,19 @@ def test_types_groupby_agg() -> None:
        assert_type(df.groupby("col1").agg(["min", "max"]), pd.DataFrame), pd.DataFrame
    )
    check(assert_type(df.groupby("col1").agg([min, max]), pd.DataFrame), pd.DataFrame)
+    agg_dict1: dict[Hashable, str] = {"col2": "min", "col3": "max", 0: "sum"}


Does it also work when the type is dict[str, str]? Might need to change the keys to HashableT (or simply Any).

Fixed
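The variance problem behind the dict[str, str] question can be sketched with plain typing, independent of pandas-stubs. This is a minimal sketch under stated assumptions: takes_dict and takes_mapping are hypothetical helpers, HashableT is assumed to mirror the TypeVar used in pandas-stubs/_typing.pyi, and the type error is reported only by a static checker such as mypy, never at runtime. dict is invariant in both its key and value types, and Mapping is covariant only in its value type, so a TypeVar on the key is what lets dict[str, str] through.

```python
from collections.abc import Hashable, Mapping
from typing import TypeVar

# Assumed stand-in for the HashableT TypeVar in pandas-stubs.
HashableT = TypeVar("HashableT", bound=Hashable)


def takes_dict(agg: dict[Hashable, str]) -> list[str]:
    # dict is invariant: a static checker rejects a dict[str, str] argument here.
    return sorted(agg.values())


def takes_mapping(agg: Mapping[HashableT, str]) -> list[str]:
    # The key type is bound per call (str, int, ...), so dict[str, str] is accepted.
    return sorted(agg.values())


str_keys: dict[str, str] = {"col2": "min", "col3": "max"}
print(takes_mapping(str_keys))  # ['max', 'min']
# takes_dict(str_keys)  # a checker such as mypy reports an arg-type error here
```

The commented-out call still runs at runtime; only the static check distinguishes the two signatures.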

I mentioned this in my other PR, but it wasn't clear to me what the consensus type was for a column or index label. I see Hashable in some places, and Scalar in others.

Large unions such as Scalar don't provide much benefit as a return value: if you require a specific return type, you will need to cast it. When using the Hashable protocol you also need to cast it, but it feels a bit cleaner to me (especially because it also covers some more unusual types that are not yet added to Scalar).

As an input argument, it could make sense to use Scalar if it helps with overlapping overloads. Using Hashable, hoping that type checkers continue to pick the first overload, and ignoring the overlapping-overload error might actually be an okay-ish solution.
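To make the coverage difference concrete, here is a small runtime sketch. SCALAR_TYPES is a hypothetical, deliberately narrow stand-in and is not the Scalar alias defined in pandas-stubs; the point is only that Hashable also admits labels such as tuples (used for MultiIndex columns), dates, and frozensets.

```python
from collections.abc import Hashable
from datetime import date

# Hypothetical narrow "scalar" set for illustration; NOT pandas-stubs' Scalar.
SCALAR_TYPES = (str, int, float, bool)


def is_scalar_label(label: object) -> bool:
    return isinstance(label, SCALAR_TYPES)


def is_hashable_label(label: object) -> bool:
    # Hashable is checked structurally: any type that defines __hash__ passes.
    return isinstance(label, Hashable)


labels = ["col2", 0, ("a", "b"), date(2022, 10, 6), frozenset({1})]
print([is_scalar_label(x) for x in labels])    # [True, True, False, False, False]
print([is_hashable_label(x) for x in labels])  # [True, True, True, True, True]
```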

Dr-Irv · 2022-10-08T01:57:55Z

tests/test_frame.py

+    check(assert_type(df.groupby("col1").agg(agg_dict1), pd.DataFrame), pd.DataFrame)
+    agg_dict2 = {"col2": min, "col3": max, 0: min}
+    check(assert_type(df.groupby("col1").agg(agg_dict2), pd.DataFrame), pd.DataFrame)
+    # Here, MyPy infers dict[object, object], so it must be explicitly annotated


I think you could do:

def func(x): return x.min()

and then use func in place of lambda x: x.min() below; then you may not need to annotate agg_dict3.

I'm of the opinion that lambdas should be replaced with named functions for the purpose of static typing.

That doesn't seem to make a difference:

tests/test_frame.py:674: error: Argument 1 has incompatible type "Dict[object, object]"; expected "Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]], Mapping[Any, Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]]]]]" [arg-type]

I changed it to a regular function, although I'd say the pattern of passing a lambda as a dict value in agg() is pretty common.
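A minimal sketch of the named-function pattern, assuming pandas is installed. The DataFrame contents and the helper name col_min are hypothetical; they only mirror the shape of the test (string keys plus an integer column label 0). The dict keeps its explicit annotation because, per the mypy error quoted above, swapping the lambda for a function did not by itself change the inferred dict type.

```python
from collections.abc import Hashable
from typing import Any, Callable, Union

import pandas as pd

# Hypothetical data: a group key, two value columns, and an integer column label.
df = pd.DataFrame(
    {"col1": ["a", "a", "b"], "col2": [1, 2, 3], "col3": [4, 5, 6], 0: [7, 8, 9]}
)


def col_min(x: pd.Series) -> Any:
    """Named replacement for `lambda x: x.min()`; applied to each group's column."""
    return x.min()


# Explicitly annotated so the checker uses this type instead of inferring one.
agg_dict3: dict[Hashable, Union[str, Callable[..., Any]]] = {
    "col2": col_min,
    "col3": "max",
    0: col_min,
}

# One output column per dict key, one row per group ("a", "b").
result = df.groupby("col1").agg(agg_dict3)
```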

Two comments here:

mypy isn't inferring the types of a dict correctly, even for something as simple as mydict = {"abc": "def", 0: "xyz"}. In this case, reveal_type(mydict) produces "builtins.dict[builtins.object, builtins.str]".

With respect to lambdas, there is not much we can do here: there is no way to annotate the parameter or return types of a lambda, independent of pandas.

You could also have written wrapped_min: Callable[[Any], Any] = lambda x: x.min() so that the type of the lambda is known. See discussion here: https://stackoverflow.com/a/33833896/1970354
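A sketch of the annotated-lambda alternative, assuming pandas is installed; the data and the agg_dict name are hypothetical. The variable annotation supplies the Callable type that lambda syntax cannot express, and the dict is annotated as well for clarity rather than because it is known to be required.

```python
from collections.abc import Hashable
from typing import Any, Callable

import pandas as pd

# The annotation on the variable gives the lambda a declared type.
wrapped_min: Callable[[Any], Any] = lambda x: x.min()

# Hypothetical data with a string and an integer column label.
df = pd.DataFrame({"col1": ["a", "a", "b"], "col2": [1, 2, 3], 0: [7, 8, 9]})

# Every value has the declared type Callable[[Any], Any].
agg_dict: dict[Hashable, Callable[[Any], Any]] = {"col2": wrapped_min, 0: wrapped_min}
result = df.groupby("col1").agg(agg_dict)
```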

twoertwein · 2022-10-08T23:10:57Z

pandas-stubs/_typing.pyi

-AggFuncTypeDictSeries: TypeAlias = dict[Hashable, AggFuncTypeBase]
-AggFuncTypeDictFrame: TypeAlias = dict[
-    Hashable, Union[AggFuncTypeBase, list[AggFuncTypeBase]]
+AggFuncTypeDictSeries: TypeAlias = Mapping[HashableT, AggFuncTypeBase]


Not for this PR, but maybe for the future: it might be good to use a mapping that does not inherit from dict in tests for functions that accept any mapping. Some pandas functions specifically check for dict/list/tuple, so it might be good to have Sequences/Mappings that do not inherit from list/tuple/dict.
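One way such a test fixture could look, as a hypothetical sketch (FrozenAggMapping is an invented name, and nothing here claims which pandas functions accept it): a read-only Mapping built on collections.abc.Mapping that passes isinstance(..., Mapping) but fails isinstance(..., dict).

```python
from collections.abc import Hashable, Iterator, Mapping
from typing import Any


class FrozenAggMapping(Mapping[Hashable, Any]):
    """Hypothetical read-only mapping that deliberately does not subclass dict."""

    def __init__(self, data: dict[Hashable, Any]) -> None:
        self._data = dict(data)  # private copy; not exposed as a dict

    def __getitem__(self, key: Hashable) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


m = FrozenAggMapping({"col2": "min", 0: "sum"})
print(isinstance(m, Mapping), isinstance(m, dict))  # True False
```

Mapping supplies keys(), items(), get() and __contains__ from the three abstract methods, so the fixture behaves like a read-only dict without inheriting from it.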

Dr-Irv

Thanks @gandhis1

gandhis1 force-pushed the groupby branch from 57b7213 to bbe4f35 on October 6, 2022 03:46

gandhis1 marked this pull request as ready for review October 6, 2022 04:05

twoertwein reviewed Oct 6, 2022

Dr-Irv requested changes Oct 6, 2022

gandhis1 added 3 commits October 7, 2022 19:16

Allow covariance in the agg dict passed to DataFrame or Series groupby.agg() (c4e37ab)

Add test cases for agg dicts with keys which are sub-types of Hashable and adjust type annotation to support this (19655c2)

Remove/adjust agg dict annotations on test (13f3144)

gandhis1 force-pushed the groupby branch from 1048c48 to 13f3144 on October 7, 2022 23:23

Dr-Irv reviewed Oct 8, 2022

twoertwein reviewed Oct 8, 2022

Change lambda to a local function (32908f3)

Dr-Irv approved these changes Oct 13, 2022

Dr-Irv merged commit f051cd7 into pandas-dev:main Oct 13, 2022
