-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TYP: Fix typing issues, mainly Something => IndexLabel #53469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ebeb865
to
03dd4f7
Compare
I have a general request: I would like to have better names in general.
There is also It is not always clear to me which type to use where and when. |
Just a general comment here... The types used in the pandas code are meant to help with keeping the pandas code consistent. The types in pandas-stubs are meant for public consumption. There are some inconsistencies in naming between the two projects. See pandas-dev/pandas-stubs#128 and pandas-dev/pandas-stubs#161 for some discussion. I'm not going to look at the details of this PR, but if your goal is to have better naming for use in type checking user code that calls pandas, it might be a better contribution to do something in the pandas-stubs project instead of in pandas, or you could do both. |
pandas/core/frame.py
Outdated
# Force MultiIndex for single column | ||
if is_list_like(subset) and len(subset) == 1: | ||
# Force MultiIndex for a list_like subset with a single column | ||
if is_list_like(subset) and len(counts.index.names) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes because is_list_like
is not a TypeGuard
, so it isn't clear from the types that len(subset)
can be run
No, my goal is to fix wrong type hints in Pandas, specifically:
The naming is just an additional issue which makes it more difficult than it should be. I shouldn't have raised that topic here. |
pandas/core/frame.py
Outdated
# Force MultiIndex for single column | ||
if is_list_like(subset) and len(subset) == 1: | ||
# Force MultiIndex for a list_like subset with a single column | ||
if is_list_like(subset) and len(list(subset)) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line causes the error. The problem is that is_list_like
does not return a sort of TypeGuard
, so I don't know how to get len(subset)
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be easier to keep the code the same but add an ignore comment instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
pandas/core/frame.py
Outdated
@@ -7375,7 +7375,7 @@ def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame: | |||
result.columns = result.columns.swaplevel(i, j) | |||
return result | |||
|
|||
def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame: | |||
def reorder_levels(self, order: Sequence[Hashable], axis: Axis = 0) -> DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also update the doc-string: it specifically mentions only int/str right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I undid this change because I fear I would have to dig much deeper. Further functions called by reorder_levels are typed with int | str
where actually a lot of other types work as well - but I am not sure whether this is accidentally supported or not. It seems you can have a float label, for example - not that I've come across a use case.
In order not to dig deeper here, I got rid of this change and am limiting this commit to value_counts, stack and unstack where the changes should hopefully be uncontroversial. (lets hope I did the ignore comment correctly)
Wrong type hints for pandas users? Or for pandas developers? If it is the former, then you should also look at If you are using the pandas source to type check your user code, then it will be incomplete as compared to using |
Thanks @behrenhoff! |
Several typing annotations in Pandas are not correct. This patch tries to fix some of them.
The internal names in Pandas are not straight forward. An
IndexLabel
is used for one or multiple columns.value_counts
also works for a single columnstack
/unstack
can work with multiple levels at the same timeI have the feeling that there might be more such problems in the code. There are also some incorrect annotations in pandas-stubs mainly related to one/multiple index levels (refer to pandas-dev/pandas-stubs#703 - even though I haven't had time for a PR over there)