Skip to content

BUG: "min_period" and "min_periods" problem in df.groupby().rolling #34037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
1 task
ZiyueWang25 opened this issue May 6, 2020 · 5 comments · Fixed by #42940
Closed
1 task

BUG: "min_period" and "min_periods" problem in df.groupby().rolling #34037

ZiyueWang25 opened this issue May 6, 2020 · 5 comments · Fixed by #42940
Assignees
Labels
Error Reporting Incorrect or improved errors from pandas good first issue Needs Tests Unit test(s) needed to prevent regressions
Milestone

Comments

@ZiyueWang25
Copy link

  • [y] I have checked that this issue has not already been reported.

  • [y] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

# Your code here

import pandas as pd
name_l = ["Alice"] * 5 + ["Bob"] * 5 
val_l = [np.nan,np.nan,1,2,3]+[np.nan,1,2,3,4]
test_df = pd.DataFrame([name_l,val_l]).T
test_df.columns = ["name","val"]

# correct one
test_df.groupby("name")["val"].rolling(window=2,min_periods=1).sum()

# wrong one with "min_period” parameter 
test_df.groupby("name")["val"].rolling(window=2,min_period=1).sum() 

Problem description

The one with min_period does not work the same as min_periods and should not be allowed.

[this should explain why the current behaviour is a problem and why the expected output is a better solution]

Expected Output

The one with min_period as parameter should produce an error.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-7629-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.3
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.7.3
pip : 20.1
setuptools : 46.1.3
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.12
tables : None
tabulate : None
xarray : 0.15.1
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.46.0

@ZiyueWang25 ZiyueWang25 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 6, 2020
@charlesdong1991 charlesdong1991 added Error Reporting Incorrect or improved errors from pandas and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 17, 2020
@gabrielvf1
Copy link
Contributor

gabrielvf1 commented May 27, 2020

I was looking for a fix on this issue and i realize that doesnt matter if you put arguments that are invalid inside the rolling() that wont be checked, is that normal? or should this be checked? Like:
The output of this code:
rolling(window=2,teste="teste",min_periods=1)
Is the same as this:
rolling(window=2,min_periods=1).
I think that the best way is to check if all arguments are valid, no? Is there a reason to dont check it?
Im working on that one!

@ZiyueWang25
Copy link
Author

This should be checked because people will input 'min_period' intended to achieve the functionality of 'min_periods', but it cannot be achieved. If we don't have 'groupby()' method before the rolling, then it will output "TypeError: rolling() got an unexpected keyword argument 'min_period'". I think this should apply to the case here as well. In short, we need a TypeError if we input min_period instead of min_periods.

@gabrielvf1
Copy link
Contributor

I agree with you but lets see what @charlesdong1991 thinks just to check

@mroeschke
Copy link
Member

This looks to raise an error correctly on master now. Could use a test

In [9]: test_df.groupby("name")["val"].rolling(window=2,min_period=1).sum()
TypeError: __init__() got an unexpected keyword argument 'min_period'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Aug 7, 2021
@horaceklai
Copy link
Contributor

take

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas good first issue Needs Tests Unit test(s) needed to prevent regressions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants