-
-
Notifications
You must be signed in to change notification settings - Fork 141
Allow covariance in the agg dict passed to DataFrame or Series groupby.agg() #363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tests/test_frame.py
Outdated
@@ -655,21 +660,19 @@ def test_types_groupby_agg() -> None: | |||
assert_type(df.groupby("col1").agg(["min", "max"]), pd.DataFrame), pd.DataFrame | |||
) | |||
check(assert_type(df.groupby("col1").agg([min, max]), pd.DataFrame), pd.DataFrame) | |||
agg_dict1: dict[Hashable, str] = {"col2": "min", "col3": "max", 0: "sum"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it also work when the type is dict[str, str]
? Might need to change the keys to HashableT
(or simply Any
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
I mentioned this in my other PR, but it wasn't clear to me what the consensus type was for a column or index label. I see Hashable
in some places, and Scalar
in others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large unions such as Scalar
as a return value don't provide that much benefit - if you require a specific return type, you will need to cast it. When using the Hashable protocol, you also need to cast it but it feels a bit cleaner to me (especially because it also contains covers some more unusually types that are not yet added to Scalar
).
As an input argument, it could make sense to use Scalar
, if it helps with overlapping overloads. Using Hashable + hoping that type checkers continue to pick the first overload + ignoring the overlapping overload, might actually be an okay'ish solution.
…e and adjust type annotation to support this
check(assert_type(df.groupby("col1").agg(agg_dict1), pd.DataFrame), pd.DataFrame) | ||
agg_dict2 = {"col2": min, "col3": max, 0: min} | ||
check(assert_type(df.groupby("col1").agg(agg_dict2), pd.DataFrame), pd.DataFrame) | ||
# Here, MyPy infers dict[object, object], so it must be explicitly annotated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do:
def func(x):
return x.min()
and then use func
in place of lambda x: x.min()
below, and then you may not need to type agg_dict3
I'm of the position that lambda
funcs should be replaced with defined functions for the purpose of static typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't seem to make a difference:
tests/test_frame.py:674: error: Argument 1 has incompatible type "Dict[object, object]"; expected "Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]], Mapping[Any, Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]]]]]" [arg-type]
I changed it to a regular function, although the pattern of passing a lambda
as a dict
element in agg()
is pretty common, I'd say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments here:
mypy
isn't inferring the types of a dict correctly, even for something as simple asmydict = { "abc" : "def", 0: "xyz" }
. In this case, areveal_type()
produces"builtins.dict[builtins.object, builtins.str]"
- With respect to
lambda
, there is not much we can do here. There is not a way to annotate the parameter types or return types of alambda
, independent ofpandas
.
You could also have done wrapped_min: Callable[[Any], Any] = lambda x: x.min()
so that the type of the lambda
is known. See discussion here: https://stackoverflow.com/a/33833896/1970354
AggFuncTypeDictSeries: TypeAlias = dict[Hashable, AggFuncTypeBase] | ||
AggFuncTypeDictFrame: TypeAlias = dict[ | ||
Hashable, Union[AggFuncTypeBase, list[AggFuncTypeBase]] | ||
AggFuncTypeDictSeries: TypeAlias = Mapping[HashableT, AggFuncTypeBase] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR but maybe for the future: It might be good to actually use a mapping that does not inherit from dict
for tests that accept any mapping. Some pandas functions specifically check for dict/list/tuple, so it mightbe good to have Sequences/Mappings that do not inherit from list/tuple/dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gandhis1
assert_type()
to assert the type of any return value