Skip to content

BUG: groupby().count() should be better typed #45875

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
Dr-Irv opened this issue Feb 8, 2022 · 3 comments · Fixed by #45904
Closed
3 tasks done

BUG: groupby().count() should be better typed #45875

Dr-Irv opened this issue Feb 8, 2022 · 3 comments · Fixed by #45904
Labels
Groupby Typing type annotations, mypy/pyright type checking
Milestone

Comments

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Feb 8, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1,2,3], "y":[4,5,6], "z":[7,8,9]})
>>> df.groupby("x").count()
   y  z
x
1  1  1
2  1  1
3  1  1
>>> df.groupby("x").x.count()
x
1    1
2    1
3    1
Name: x, dtype: int64

Issue Description

In #43519 @jbrockmendel removed the method count() from SeriesGroupBy and DataFrameGroupBy classes and merged it into generic.py. Then in the Groupby class, he typed count() as:

def count(self) -> Series | DataFrame:

The problem here is that previously, you know that SeriesGroupBy.count() returns a Series and DataFrameGroupBy returns a DataFrame, so now we have lost that ability to more strongly know the type of the result.

Same thing is also true for size()

Expected Behavior

SeriesGroupBy.count() returns a Series and DataFrameGroupBy.count() returns a DataFrame so we need to type them that way, maybe by just putting the method in groupby.py in each class and having it call super().count()

Installed Versions

INSTALLED VERSIONS

commit : 24ecfe2
python : 3.8.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.5.0.dev0+313.g24ecfe2c72
numpy : 1.22.0
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 60.5.0
Cython : 0.29.26
pytest : 6.2.5
hypothesis : 6.35.0
sphinx : 4.3.2
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.7.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fastparquet : 0.7.2
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.5.1
numba : 0.53.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyreadstat : 1.1.4
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.3
sqlalchemy : 1.4.29
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None

@Dr-Irv Dr-Irv added Bug Needs Triage Issue that has not been reviewed by a pandas team member Typing type annotations, mypy/pyright type checking labels Feb 8, 2022
@jbrockmendel
Copy link
Member

IIRC making GroupBy a Generic was somehow going to address this. If not, then i guess just having a super() call is fine

@simonjayhawkins
Copy link
Member

The problem here is that previously, you know that SeriesGroupBy.count() returns a Series and DataFrameGroupBy returns a DataFrame

>>> type(df.groupby("x").x.count())
<class 'pandas.core.series.Series'>
>>> 
>>> type(df.groupby("x", as_index=False).x.count())
<class 'pandas.core.frame.DataFrame'>

>>> type(df.groupby("x"))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>
>>> 
>>> type(df.groupby("x", as_index=False))
<class 'pandas.core.groupby.generic.DataFrameGroupBy'>

so the union return type looks ok to me, unless it is possible to refine the return type of the class method based on the class attributes.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 10, 2022

You have to look at the type of df.groupby("x").x :

>>> df.groupby("x").x
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000012DFFD5B5E0>
>>> df.groupby("x", as_index=False).x
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000012D803EAE20>

So for count(), it will do the right thing because count() returns a Series for SeriesGroupBy and DataFrame for DataFrameGroupBy, which is what you expect.

The issue with size() is that the type of the result depends more directly on the value of as_index , which I think is different than the other GroupBy methods.

@mroeschke mroeschke added Groupby and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Feb 11, 2022
@jreback jreback added this to the 1.5 milestone Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants