Skip to content

Allow covariance in the agg dict passed to DataFrame or Series groupby.agg() #363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 13, 2022
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions pandas-stubs/_typing.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -127,9 +127,9 @@ F = TypeVar("F", bound=FuncType)
HashableT = TypeVar("HashableT", bound=Hashable)

AggFuncTypeBase: TypeAlias = Union[Callable, str, np.ufunc]
AggFuncTypeDictSeries: TypeAlias = dict[Hashable, AggFuncTypeBase]
AggFuncTypeDictFrame: TypeAlias = dict[
Hashable, Union[AggFuncTypeBase, list[AggFuncTypeBase]]
AggFuncTypeDictSeries: TypeAlias = Mapping[HashableT, AggFuncTypeBase]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR but maybe for the future: It might be good to actually use a mapping that does not inherit from dict for tests that accept any mapping. Some pandas functions specifically check for dict/list/tuple, so it mightbe good to have Sequences/Mappings that do not inherit from list/tuple/dict.

AggFuncTypeDictFrame: TypeAlias = Mapping[
HashableT, Union[AggFuncTypeBase, list[AggFuncTypeBase]]
]
AggFuncTypeSeriesToFrame: TypeAlias = Union[
list[AggFuncTypeBase],
Expand Down
38 changes: 23 additions & 15 deletions tests/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -643,7 +643,9 @@ def test_types_groupby_methods() -> None:


def test_types_groupby_agg() -> None:
df = pd.DataFrame(data={"col1": [1, 1, 2], "col2": [3, 4, 5], "col3": [0, 1, 0]})
df = pd.DataFrame(
data={"col1": [1, 1, 2], "col2": [3, 4, 5], "col3": [0, 1, 0], 0: [-1, -1, -1]}
)
check(assert_type(df.groupby("col1")["col3"].agg(min), pd.Series), pd.Series)
check(
assert_type(df.groupby("col1")["col3"].agg([min, max]), pd.DataFrame),
Expand All @@ -655,21 +657,24 @@ def test_types_groupby_agg() -> None:
assert_type(df.groupby("col1").agg(["min", "max"]), pd.DataFrame), pd.DataFrame
)
check(assert_type(df.groupby("col1").agg([min, max]), pd.DataFrame), pd.DataFrame)
agg_dict1 = {"col2": "min", "col3": "max", 0: "sum"}
check(assert_type(df.groupby("col1").agg(agg_dict1), pd.DataFrame), pd.DataFrame)
agg_dict2 = {"col2": min, "col3": max, 0: min}
check(assert_type(df.groupby("col1").agg(agg_dict2), pd.DataFrame), pd.DataFrame)
# Here, MyPy infers dict[object, object], so it must be explicitly annotated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could do:

def func(x):
    return x.min()

and then use func in place of lambda x: x.min() below, and then you may not need to type agg_dict3

I'm of the position that lambda funcs should be replaced with defined functions for the purpose of static typing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't seem to make a difference:

tests/test_frame.py:674: error: Argument 1 has incompatible type "Dict[object, object]"; expected "Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]], Mapping[Any, Union[Union[Callable[..., Any], str, ufunc], List[Union[Callable[..., Any], str, ufunc]]]]]" [arg-type]

I changed it to a regular function, although the pattern of passing a lambda as a dict element in agg() is pretty common, I'd say.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. mypy isn't inferring the types of a dict correctly, even for something as simple as mydict = { "abc" : "def", 0: "xyz" }. In this case, a reveal_type() produces "builtins.dict[builtins.object, builtins.str]"
  2. With respect to lambda, there is not much we can do here. There is not a way to annotate the parameter types or return types of a lambda, independent of pandas.

You could also have done wrapped_min: Callable[[Any], Any] = lambda x: x.min() so that the type of the lambda is known. See discussion here: https://stackoverflow.com/a/33833896/1970354

agg_dict3: dict[str | int, str | Callable] = {
"col2": min,
"col3": "max",
0: lambda x: x.min(),
}
check(assert_type(df.groupby("col1").agg(agg_dict3), pd.DataFrame), pd.DataFrame)
agg_dict4 = {"col2": "sum"}
check(assert_type(df.groupby("col1").agg(agg_dict4), pd.DataFrame), pd.DataFrame)
agg_dict5 = {0: "sum"}
check(assert_type(df.groupby("col1").agg(agg_dict5), pd.DataFrame), pd.DataFrame)
named_agg = pd.NamedAgg(column="col2", aggfunc="max")
check(
assert_type(
df.groupby("col1").agg({"col2": "min", "col3": "max"}), pd.DataFrame
),
pd.DataFrame,
)
check(
assert_type(df.groupby("col1").agg({"col2": min, "col3": max}), pd.DataFrame),
pd.DataFrame,
)
check(
assert_type(
df.groupby("col1").agg(new_col=pd.NamedAgg(column="col2", aggfunc="max")),
pd.DataFrame,
),
assert_type(df.groupby("col1").agg(new_col=named_agg), pd.DataFrame),
pd.DataFrame,
)
# GH#187
Expand All @@ -679,6 +684,9 @@ def test_types_groupby_agg() -> None:
cols_opt: list[str | None] = ["col1", "col2"]
check(assert_type(df.groupby(by=cols_opt).sum(), pd.DataFrame), pd.DataFrame)

cols_mixed: list[str | int] = ["col1", 0]
check(assert_type(df.groupby(by=cols_mixed).sum(), pd.DataFrame), pd.DataFrame)


# This was added in 1.1.0 https://pandas.pydata.org/docs/whatsnew/v1.1.0.html
def test_types_group_by_with_dropna_keyword() -> None:
Expand Down