TYP: Fix typing issues, mainly Something => IndexLabel #53469

behrenhoff · 2023-05-31T14:19:20Z

Several typing annotations in Pandas are not correct. This patch tries to fix some of them.

The internal names in Pandas are not straight forward. An IndexLabel is used for one or multiple columns.

value_counts also works for a single column
stack/unstack can work with multiple levels at the same time

I have the feeling that there might be more such problems in the code. There are also some incorrect annotations in pandas-stubs mainly related to one/multiple index levels (refer to pandas-dev/pandas-stubs#703 - even though I haven't had time for a PR over there)

behrenhoff · 2023-05-31T16:41:19Z

I have a general request: I would like to have better names in general.

Axis (singular) is int or a Literal['columns', ...]. So far so good.
Axes (plural) is a ListLike and is NOT used for multiple axis but instead for columns: Axes - a list of columns (or rows)
IndexLabel seems to be used for one or multiple columns (or rows) - it is a Sequence[Hashable]|Hashable. So that means it does not support a pd.Index?

There is also Level = Hashable. When functions take one or multiple levels, that makes it Level | Sequence[Level] - or IndexLabel again. Is that a good name or should there be a LevelOrLevels alias? (I hate "Or" in names but cannot come up with something better that is as clear)

It is not always clear to me which type to use where and when.

Dr-Irv · 2023-06-01T02:05:43Z

Just a general comment here...

The types used in the pandas code are meant to help with keeping the pandas code consistent.

The types in pandas-stubs are meant for public consumption.

There are some inconsistencies in naming between the two projects. See pandas-dev/pandas-stubs#128 and pandas-dev/pandas-stubs#161 for some discussion.

I'm not going to look at the details of this PR, but if your goal is to have better naming for use in type checking user code that calls pandas, it might be a better contribution to do something in the pandas-stubs project instead of in pandas, or you could do both.

behrenhoff · 2023-06-01T07:58:10Z

pandas/core/frame.py

-        # Force MultiIndex for single column
-        if is_list_like(subset) and len(subset) == 1:
+        # Force MultiIndex for a list_like subset with a single column
+        if is_list_like(subset) and len(counts.index.names) == 1:


changes because is_list_like is not a TypeGuard, so it isn't clear from the types that len(subset) can be run

behrenhoff · 2023-06-01T07:59:31Z

I'm not going to look at the details of this PR, but if your goal is to have better naming for use in type checking user code that calls pandas

No, my goal is to fix wrong type hints in Pandas, specifically:

value_counts also works for a single column but the current type hints don't cover that
stack/unstack can work with multiple levels at the same time but the current type hints don't cover that

The naming is just an additional issue which makes it more difficult than it should be. I shouldn't have raised that topic here.

behrenhoff · 2023-06-01T12:18:10Z

pandas/core/frame.py

-        # Force MultiIndex for single column
-        if is_list_like(subset) and len(subset) == 1:
+        # Force MultiIndex for a list_like subset with a single column
+        if is_list_like(subset) and len(list(subset)) == 1:


This line causes the error. The problem is that is_list_like does not return a sort of TypeGuard, so I don't know how to get len(subset) here.

It might be easier to keep the code the same but add an ignore comment instead.

twoertwein · 2023-06-01T13:29:20Z

pandas/core/frame.py

@@ -7375,7 +7375,7 @@ def swaplevel(self, i: Axis = -2, j: Axis = -1, axis: Axis = 0) -> DataFrame:
            result.columns = result.columns.swaplevel(i, j)
        return result

-    def reorder_levels(self, order: Sequence[int | str], axis: Axis = 0) -> DataFrame:
+    def reorder_levels(self, order: Sequence[Hashable], axis: Axis = 0) -> DataFrame:


Can you please also update the doc-string: it specifically mentions only int/str right now.

I undid this change because I fear I would have to dig much deeper. Further functions called by reorder_levels are typed with int | str where actually a lot of other types work as well - but I am not sure whether this is accidentally supported or not. It seems you can have a float label, for example - not that I've come across a use case.

In order not to dig deeper here, I got rid of this change and am limiting this commit to value_counts, stack and unstack where the changes should hopefully be uncontroversial. (lets hope I did the ignore comment correctly)

Dr-Irv · 2023-06-05T17:25:18Z

No, my goal is to fix wrong type hints in Pandas, specifically:

Wrong type hints for pandas users? Or for pandas developers?

If it is the former, then you should also look at pandas-stubs . If it is the latter, then that's fine.

If you are using the pandas source to type check your user code, then it will be incomplete as compared to using pandas-stubs .

twoertwein · 2023-06-10T15:55:44Z

Thanks @behrenhoff!

) TYP: Fix typing issues, Label => IndexLabel

behrenhoff force-pushed the fix-types branch 2 times, most recently from ebeb865 to 03dd4f7 Compare May 31, 2023 16:26

behrenhoff force-pushed the fix-types branch from 03dd4f7 to 33d7ed6 Compare June 1, 2023 07:56

behrenhoff commented Jun 1, 2023

View reviewed changes

behrenhoff force-pushed the fix-types branch from 33d7ed6 to 936b1aa Compare June 1, 2023 09:13

behrenhoff commented Jun 1, 2023

View reviewed changes

twoertwein reviewed Jun 1, 2023

View reviewed changes

TYP: Fix typing issues, Label => IndexLabel

eab9f9f

behrenhoff force-pushed the fix-types branch from 936b1aa to eab9f9f Compare June 9, 2023 11:43

twoertwein approved these changes Jun 10, 2023

View reviewed changes

twoertwein merged commit 5cf5c73 into pandas-dev:main Jun 10, 2023

mroeschke added this to the 2.1 milestone Jun 10, 2023

Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023

TYP: Fix typing issues, mainly Something => IndexLabel (pandas-dev#53469

dd52206

) TYP: Fix typing issues, Label => IndexLabel

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TYP: Fix typing issues, mainly Something => IndexLabel #53469

TYP: Fix typing issues, mainly Something => IndexLabel #53469

behrenhoff commented May 31, 2023 •

edited

Loading

behrenhoff commented May 31, 2023 •

edited

Loading

Dr-Irv commented Jun 1, 2023

behrenhoff Jun 1, 2023

behrenhoff commented Jun 1, 2023

behrenhoff Jun 1, 2023

twoertwein Jun 1, 2023

behrenhoff Jun 1, 2023

twoertwein Jun 1, 2023

behrenhoff Jun 9, 2023

Dr-Irv commented Jun 5, 2023

twoertwein commented Jun 10, 2023

TYP: Fix typing issues, mainly Something => IndexLabel #53469

TYP: Fix typing issues, mainly Something => IndexLabel #53469

Conversation

behrenhoff commented May 31, 2023 • edited Loading

behrenhoff commented May 31, 2023 • edited Loading

Dr-Irv commented Jun 1, 2023

behrenhoff Jun 1, 2023

Choose a reason for hiding this comment

behrenhoff commented Jun 1, 2023

behrenhoff Jun 1, 2023

Choose a reason for hiding this comment

twoertwein Jun 1, 2023

Choose a reason for hiding this comment

behrenhoff Jun 1, 2023

Choose a reason for hiding this comment

twoertwein Jun 1, 2023

Choose a reason for hiding this comment

behrenhoff Jun 9, 2023

Choose a reason for hiding this comment

Dr-Irv commented Jun 5, 2023

twoertwein commented Jun 10, 2023

behrenhoff commented May 31, 2023 •

edited

Loading

behrenhoff commented May 31, 2023 •

edited

Loading