-
-
Notifications
You must be signed in to change notification settings - Fork 143
GH456 First attempt GroupBy.transform improved typing #1242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I think it won't close 456, because that was also about agg
.
For
is there a convention to name the TypeAlias or should we keep the same name as in the pandas code? |
Probably shouldn't use the same name, because then the types won't match. Not that anyone should be importing that anyway. So keep the same name and append |
pandas-stubs/core/groupby/base.pyi
Outdated
|
||
@dataclasses.dataclass(order=True, frozen=True) | ||
class OutputKey: | ||
label: Hashable | ||
position: int | ||
|
||
reduction_kernels: TypeAlias = Literal[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as commented elsewhere, lets call this reduction_kernels_type
so that the types of the objects line up.
tests/test_series.py
Outdated
# type of `sum` not well inferred by mypy | ||
return sum(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use s.sum()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue was if I passed sum
the builtin one in the aggregate method directly, here is does not change anything.
tests/test_series.py
Outdated
return s.astype(float).min() | ||
|
||
s = pd.Series([1, 2, 3, 4]) | ||
s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't you want a check(assert_type(...
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct my mistake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the inference on the fly of lambdas is not super clear so you need to define the function on the side to have the right types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is an issue with lambda functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think you can have a test of
check(assert_type( s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min()), pd.Series), pd.Series)
which would be worthwhile
tests/test_series.py
Outdated
# type of `sum` not well inferred by mypy | ||
return sum(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use s.sum()
check( | ||
assert_type(s.groupby(level=0).agg([min, sum]), pd.DataFrame), pd.DataFrame | ||
) | ||
|
||
|
||
def test_types_groupby_transform() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should add tests for two of the string transform arguments (e.g., "mean", "first")
def aggregate( | ||
self, | ||
func: AggFuncTypeBase | None = ..., | ||
/, | ||
*args, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this overload, you could add this overload:
@overload
def aggregate(
self,
func: Callable[[Series], S2],
*args,
engine: WindowingEngine = ...,
engine_kwargs: WindowingEngineKwargs = ...,
**kwargs,
) -> Series[S2]: ...
Then you know that if you start with a Series
with a known type, then the return type would be inferred from the callable. And it works with a lambda function, e.g.:
s = pd.Series([1, 2, 3, 4])
q = s.groupby([1, 1, 2, 2]).agg(lambda x: x.astype(float).min())
In this case, q
would have type Series[float]
, which is what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's because the type of new_func
isn't clear.
But I think it would work if you did check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), "pd.Series[int]"), pd.Series, int)
Because then it can know that x
is a Series[int]
and that the lambda
becomes Series[int]
Can you try that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that for the last push, see
pandas-stubs/tests/test_series.py
Line 1167 in f9863d0
check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), "pd.Series[float]"), pd.Series, int) |
It fails in all CI:
===========================================
Beginning: 'Run mypy on 'tests' (using the local stubs) and on the local stubs'
===========================================
tests/test_series.py:1167: error: Expression is of type "Series[Any]", not "Series[float]" [assert-type]
Found 1 error in 1 file (checked 224 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I look with how mypy
reads the type of the lambda, it has no idea about the type of x
:
tests/test_series.py:1168: note: Revealed type is "def (x: Any) -> Any"
so that may explain why it fails on lambda expressions whatsoever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - so we can leave the lambda
test in, but just have it assert_type()
against Series
instead of Series[float]
tests/test_series.py
Outdated
@@ -1147,6 +1165,9 @@ def func(s: pd.Series[int]) -> float: | |||
np.floating, | |||
) | |||
|
|||
# test below passes with mypy but pyright correctly sees it as pd.Series[float] | |||
# check(assert_type(s.groupby([1,1,2,2]).agg(lambda x: x.astype(float).min()), pd.Series), pd.Series, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the commented test in there so it is still there and executes, since it works for both type checkers, but comment out the one that is "better" that has pyright
infer it as Series[float]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am forced to comment it out because pyright
sees it as pd.Series[float]
but mypy
sees it as pd.Series
so in both versions the CI will fail either for mypy or pyright step. What would you recommend doing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you have to keep it commented out. Do you have a test like this that passes both checkers:
func: Callable[[pd.Series], float] = lambda x: x.astype(float).min()
check(assert_type(s.groupby([1,1,2,2]).agg(func), "pd.Series[float]"), pd.Series, float)
So you can have the "preferred" version in there commented out, but I think the above test would pass both type checkers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that also fails to pass with mypy (pyright is fine with it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a bunch of ideas and couldn't get it to work. It's probably a mypy bug, but I couldn't come up with a simple example that illustrates the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have to adjust that comment, then we are good to go.
tests/test_series.py
Outdated
np.floating, | ||
) | ||
|
||
# test below passes with mypy but pyright correctly sees it as pd.Series[float] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just have to change the comment to say "fails with mypy"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @loicdiridollou
assert_type()
to assert the type of any return value