-
-
Notifications
You must be signed in to change notification settings - Fork 141
DataFrame.groupby
should return _DataFrameGroupByScalar
when grouping by a PeriodIndex
#674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm not sure this can be done without a lot of work to make the internal The main issue here is that there are so many different types that can be passed to I'll keep this open in case anyone wants to give this a try, but I think it will be very difficult to get this to work. For now, you'd have to do a cast on We could consider adding
to return Iterator[tuple[tuple | Scalar, DataFrame]] instead, or maybe it's more safe to not add Period to Scalar and use Iterator[tuple[tuple | Scalar | Period, DataFrame]] but that might break some existing type checking in other people's codes (including some of my staff...)
|
One other note - right now we have set up >>> import pandas as pd
>>> df = pd.DataFrame({0: range(2), 1: [i*i for i in range(2)]})
>>> df
0 1
0 0 0
1 1 1
>>> for a,b in df.groupby([0, 1]):
... print(a, b)
...
(0, 0) 0 1
0 0 0
(1, 1) 0 1
1 1 1
>>> df.columns=["x", "y"]
>>> for a,b in df.groupby([0, 1]):
... print(a, b)
...
0 x y
0 0 0
1 x y
1 1 1 Just changing the type of the column names changes the type of the result of iterating on the |
The quickest and least intrusive solution to me seems to be to extend the scalar overload on And then also add one new overload to handle This would not be a complete solution, but would at least improve the situation a little bit. |
That may work. As long as it doesn't break existing tests, I'm open to a PR with this proposed change (while adding appropriate tests). |
I'm happy to take a stab at it. I should have some time tomorrow. |
It looks like the inheritance pattern of One possible way to solve this would be to make This will however mean that we will have to |
I opted for ignoring the unsafe overlapping overload error for now, since it means fewer code changes. |
Describe the bug
When performing a groupby on a dataframe with a
PeriodIndex
the group identifier should be aPeriod
, i.e. a scalar (On that notePeriod
appears to be missing from the definition ofScalar
in_typing.pyi
)To Reproduce
Additional comments
This should probably extend to most other specialized Index types apart from the base
Index
andMultiIndex
, for the latter it's obvious that it would be non-scalar, for the former it would depend on what Python object is referenced, since convertingMultiIndex
to and from anIndex
on abuiltins.tuple
is quite common, it's probably the right move to treatIndex
as non-scalar by default or return anAny
variant ofDataFrameGroupBy
that could either be scalar or non-scalar.The text was updated successfully, but these errors were encountered: