Skip to content

TYP: Update type results for Groupby.count() and Groupby.size() #45904

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Mar 7, 2022

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Feb 9, 2022

Here's a test using mypy that reveals the issue.

import pandas as pd

ind = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["ab", "xy"])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
s = pd.Series([1, 2, 3, 4], index=ind)

cnt_df = df.groupby("x").count()

siz_df = df.groupby("x").size()

app_df = df.groupby("x").apply(sum)

cnt_s = s.groupby("ab").count()

siz_s = s.groupby("xy").size()

app_s = s.groupby("xy").apply(sum)

reveal_type(cnt_df)
reveal_type(siz_df)
reveal_type(app_df)
reveal_type(cnt_s)
reveal_type(siz_s)
reveal_type(app_s)

Before this PR, the result of mypy on the reveal_type lines is:

issue45875.py:17: note: Revealed type is "Union[pandas.core.series.Series, pandas.core.frame.DataFrame]"
issue45875.py:18: note: Revealed type is "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]"
issue45875.py:19: note: Revealed type is "Any"
issue45875.py:20: note: Revealed type is "Union[pandas.core.series.Series, pandas.core.frame.DataFrame]"
issue45875.py:21: note: Revealed type is "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]"
issue45875.py:22: note: Revealed type is "Any"

After this PR, the result is:

issue45875.py:19: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:20: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:21: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:22: note: Revealed type is "pandas.core.series.Series"
issue45875.py:23: note: Revealed type is "pandas.core.series.Series"
issue45875.py:24: note: Revealed type is "pandas.core.series.Series"

This will be needed as we work on the general typing issues.

@Dr-Irv Dr-Irv requested a review from jbrockmendel February 9, 2022 18:08
@Dr-Irv Dr-Irv added the Typing type annotations, mypy/pyright type checking label Feb 9, 2022
Copy link
Member

@twoertwein twoertwein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me from a typing perspective.

@simonjayhawkins
Copy link
Member

Here's a test using mypy that reveals the issue.

This code sample raises ValueError: No axis named 1 for object type Series in the python interpreter

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 10, 2022

Here's a test using mypy that reveals the issue.

This code sample raises ValueError: No axis named 1 for object type Series in the python interpreter

I fixed it. Edited above.

@simonjayhawkins @jbrockmendel just did a newer commit that takes advantage of the generic aspect of the GroupBy class. Requires a couple of casts, but I think this is cleaner.

@simonjayhawkins
Copy link
Member

After this PR, the result is:

issue45875.py:19: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:20: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:21: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:22: note: Revealed type is "pandas.core.series.Series"
issue45875.py:23: note: Revealed type is "pandas.core.series.Series"
issue45875.py:24: note: Revealed type is "pandas.core.series.Series"

running code sample through interpreter gives...

   y
x   
1  1
2  1
3  1
4  1
<class 'pandas.core.frame.DataFrame'>
x
1    1
2    1
3    1
4    1
dtype: int64
<class 'pandas.core.series.Series'>
   x  y
x      
1  1  4
2  2  5
3  3  6
4  4  7
<class 'pandas.core.frame.DataFrame'>
ab
a    2
b    2
dtype: int64
<class 'pandas.core.series.Series'>
xy
x    2
y    2
dtype: int64
<class 'pandas.core.series.Series'>
xy
x    4
y    6
dtype: int64
<class 'pandas.core.series.Series'>

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dr-Irv lgtm can you update the PR title to be more descriptive TYP: ...

@Dr-Irv Dr-Irv changed the title Issue45875 TYP: Update type results for Groupby.count() and Groupby.size() Feb 10, 2022
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 10, 2022

For the record, here is what I used for testing:

import pandas as pd
from typing import TYPE_CHECKING

if not TYPE_CHECKING:

    def reveal_type(*args, **kwargs):
        pass


ind = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["ab", "xy"])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
s = pd.Series([1, 2, 3, 4], index=ind)

cnt_df = df.groupby("x").count()
print(type(cnt_df))

cnt_df_s = df.groupby("x").y.count()
print(type(cnt_df_s))

# siz_df = df.groupby("x").size()
# print(type(siz_df))

app_df = df.groupby("x").apply(sum)
print(type(app_df))

cnt_df_a = df.groupby("x", as_index=False).count()
print(type(cnt_df_a))

cnt_df_s_a = df.groupby("x", as_index=False).y.count()
print(type(cnt_df_s_a))

# siz_df_a = df.groupby("x", as_index=False).size()
# print(type(siz_df_a))

app_df_a = df.groupby("x", as_index=False).apply(sum)
print(type(app_df_a))

cnt_s = s.groupby("ab").count()
print(type(cnt_s))

# siz_s = s.groupby("xy").size()
# print(type(siz_s))

app_s = s.groupby("xy").apply(sum)
print(type(app_s))

reveal_type(cnt_df)
reveal_type(cnt_df_s)
# reveal_type(siz_df)
reveal_type(app_df)
reveal_type(cnt_df_a)
reveal_type(cnt_df_s_a)
# reveal_type(siz_df_a)
reveal_type(app_df_a)
reveal_type(cnt_s)
# reveal_type(siz_s)
reveal_type(app_s)

I took out the size() test because it is dynamically typed based on as_index. Running this program yields:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

mypy reports:

issue45875.py:47: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:48: note: Revealed type is "Any"
issue45875.py:50: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:51: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:52: note: Revealed type is "Any"
issue45875.py:54: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:55: note: Revealed type is "pandas.core.series.Series*"
issue45875.py:57: note: Revealed type is "pandas.core.series.Series"

The two Any values from mypy are because we can't type df.groupby("x").y or df.groupby("x", as_index=False).y

@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

@Dr-Irv status here

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 7, 2022

@jreback all green now - merged with latest main

@jreback jreback added this to the 1.5 milestone Mar 7, 2022
@jreback jreback merged commit 5ab67d7 into pandas-dev:main Mar 7, 2022
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
@Dr-Irv Dr-Irv deleted the issue45875 branch February 13, 2023 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: groupby().count() should be better typed
5 participants