ENH:AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis' #40139

Closed
rsuhada opened this issue Mar 1, 2021 · 11 comments · Fixed by #60433
Comments

rsuhada commented Mar 1, 2021

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame({'birb': ['Falcon', 'Falcon', 'Falcon', 'Falcon',
                            'Parrot', 'Parrot', 'Parrot', 'Parrot'],
                   'value': [390., 350., 390., 350.,
                             30., 20., 30., 20.]})
df.groupby('birb').aggregate({'value': ['count', 'mean', 'kurtosis']})

Problem description

The code raises AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis' instead of returning the group-wise kurtosis.
Using 'skew' in the same position works fine.

Expected Output

       value
       count   mean kurtosis
birb
Falcon     4  370.0     -6.0
Parrot     4   25.0     -6.0
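
The -6.0 values in the expected output can be checked by hand. The sketch below is a plain-Python version of the bias-corrected excess kurtosis (Fisher definition, normal == 0.0), which is the definition the pandas documentation gives for Series.kurt. The function name sample_excess_kurtosis is made up for illustration and is not part of pandas.

```python
def sample_excess_kurtosis(values):
    """Bias-corrected excess kurtosis (Fisher definition, normal == 0.0).

    Hypothetical helper for illustration only; not a pandas function.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")

    mean = sum(values) / n
    # Sums of squared and fourth-power deviations from the mean.
    m2 = sum((x - mean) ** 2 for x in values)
    m4 = sum((x - mean) ** 4 for x in values)
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant data")

    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - correction


falcon = [390.0, 350.0, 390.0, 350.0]
parrot = [30.0, 20.0, 30.0, 20.0]

print(sample_excess_kurtosis(falcon))  # -6.0
print(sample_excess_kurtosis(parrot))  # -6.0
```

Both groups are symmetric two-point samples of size 4, which is why they share the same value.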

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-66-generic
Version : #74~18.04.2-Ubuntu SMP Fri Feb 5 11:17:31 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_IE.UTF-8
LOCALE : en_IE.UTF-8

pandas : 1.2.2
numpy : 1.17.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 49.2.1
Cython : None
pytest : 5.3.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.20.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0

rsuhada added the Bug and Needs Triage labels Mar 1, 2021
RebwarBajallan95 pushed a commit to johbess/pandas that referenced this issue Mar 2, 2021
@rhshadrach
Member

I don't believe SeriesGroupBy is documented as having a kurtosis method. Are you finding something different?

rhshadrach changed the title from "BUG:AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis'" to "ENH:AttributeError: 'SeriesGroupBy' object has no attribute 'kurtosis'" Mar 3, 2021
@rsuhada
Author

rsuhada commented Mar 3, 2021

You are right... it just tripped me up that the first three moments are implemented, but not the fourth one.
One can work around it with the scipy implementation of kurtosis, but it would be nice to have it directly here.

@rhshadrach
Member

rhshadrach commented Mar 4, 2021

+1 on adding; I think this would just mean adding it to the common_apply_allowlist. In the meantime, here is a workaround from https://stackoverflow.com/questions/37345493/kurtosis-on-groupby-of-pandas-dataframe-doesnt-work

df.groupby('a').apply(pd.DataFrame.kurt)

This is essentially what is occurring under the hood for skew. Might want to use pd.Series instead of pd.DataFrame here.
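
A minimal sketch of that workaround applied to the data from the original report, assuming a pandas version without a built-in group-wise kurtosis. It uses pd.Series.kurt on the selected column, plus named aggregation with a lambda to get count, mean and kurtosis in one table. The variable names kurt_by_group and summary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "birb": ["Falcon", "Falcon", "Falcon", "Falcon",
             "Parrot", "Parrot", "Parrot", "Parrot"],
    "value": [390.0, 350.0, 390.0, 350.0,
              30.0, 20.0, 30.0, 20.0],
})

# Selecting one column gives a SeriesGroupBy, so each group is passed
# to pd.Series.kurt as a Series and a scalar comes back per group.
kurt_by_group = df.groupby("birb")["value"].apply(pd.Series.kurt)

# Named aggregation: built-in reducers by name, kurtosis via a lambda.
summary = df.groupby("birb")["value"].agg(
    count="count",
    mean="mean",
    kurtosis=lambda s: s.kurt(),
)

print(kurt_by_group)
print(summary)
```

The lambda goes through the generic Python aggregation path rather than an optimized one, so it may be noticeably slower than 'mean' or 'count' when there are many groups.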

lithomas1 removed the Needs Triage label Mar 4, 2021
rhshadrach added this to the Contributions Welcome milestone Mar 5, 2021
mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@jaepil-choi

Can't believe this is still a thing. mean, std, skew, count, min, max, median and everything else work, but not kurtosis? I mean... why?

@rhshadrach
Member

@jaepil-choi - PRs are welcome!

@jaepil-choi

@rhshadrach

I found that the pd.DataFrame.kurt workaround works. Would a very basic temporary fix, such as simply mapping 'kurt' to pd.DataFrame.kurt, be accepted and merged?

@rhshadrach
Member

I believe you can implement this similar to corr:

SeriesGroupBy:

@doc(Series.corr.__doc__)
def corr(
    self,
    other: Series,
    method: CorrelationMethod = "pearson",
    min_periods: int | None = None,
) -> Series:
    result = self._op_via_apply(
        "corr", other=other, method=method, min_periods=min_periods
    )
    return result

DataFrameGroupBy:

@doc(DataFrame.corr.__doc__)
def corr(
    self,
    method: str | Callable[[np.ndarray, np.ndarray], float] = "pearson",
    min_periods: int = 1,
    numeric_only: bool = False,
) -> DataFrame:
    result = self._op_via_apply(
        "corr", method=method, min_periods=min_periods, numeric_only=numeric_only
    )
    return result

A PR would need the right arguments and tests.

@saldanhad
Contributor

take

@Shubhank-Gyawali
Contributor

take

saldanhad removed their assignment Nov 16, 2024
@snitish
Member

snitish commented Nov 27, 2024

take

@snitish
Member

snitish commented Dec 1, 2024

@rhshadrach I raised a PR for this - #60433. Could you please review? Thanks.
