ENH: Change default behavior of rolling.count to be consistent with others #31302

Closed
fujiaxiang opened this issue Jan 25, 2020 · 0 comments · Fixed by #36649
Labels
API - Consistency Internal Consistency of API/Behavior Deprecate Functionality to remove in pandas Window rolling, ewma, expanding

Following up on the discussion in #30923, we may want to change the default behavior of rolling.count with regard to its min_periods parameter, so that it is consistent with similar APIs such as rolling.mean and rolling.sum.

Code Sample

With the updates from #30923

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 1, 1, np.nan, 1, 1, 1])
>>> s
0    1.0
1    1.0
2    1.0
3    NaN
4    1.0
5    1.0
6    1.0
dtype: float64

# rolling.mean and rolling.sum default min_periods to the window size (3 in this case)
# note that they require not only the window to span at least 3 rows, but also at least 3 valid (non-NaN) entries within it
>>> s.rolling(3).mean()
0    NaN
1    NaN
2    1.0
3    NaN
4    NaN
5    NaN
6    1.0
dtype: float64

>>> s.rolling(3).sum()
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    NaN
6    3.0
dtype: float64

# the default value of min_periods for rolling.count is 0
# we may want to change this behavior so it's consistent with other APIs
>>> s.rolling(3).count()
0    1.0
1    2.0
2    3.0
3    2.0
4    2.0
5    2.0
6    3.0
dtype: float64

# note that rolling.count only requires the window to span at least min_periods rows to give a result
# the number of valid (non-NaN) entries does not affect whether it outputs NaN
# we should retain this behavior because this function is meant to count the number of valid entries
>>> s.rolling(3, min_periods=3).count()
0    NaN
1    NaN
2    3.0
3    2.0
4    2.0
5    2.0
6    3.0
dtype: float64
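
The distinction drawn in the comments above can be modelled in plain Python. This is only an illustrative sketch, not pandas' implementation: `rolling_count` and `rolling_sum` are hypothetical helpers written for this example. The count model compares `min_periods` against the number of rows in the window, while the sum model compares it against the number of non-NaN rows.

```python
import math


def rolling_count(values, window, min_periods):
    """Hypothetical model of rolling.count as described above (not pandas code)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < min_periods:
            # Too few rows in the window, valid or not: emit NaN.
            out.append(math.nan)
        else:
            # NaN rows still count toward min_periods; only the result skips them.
            out.append(float(sum(1 for v in chunk if not math.isnan(v))))
    return out


def rolling_sum(values, window, min_periods):
    """Hypothetical model of rolling.sum: min_periods applies to valid rows only."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        out.append(float(sum(valid)) if len(valid) >= min_periods else math.nan)
    return out


data = [1.0, 1.0, 1.0, math.nan, 1.0, 1.0, 1.0]
current = rolling_count(data, window=3, min_periods=0)   # today's default
proposed = rolling_count(data, window=3, min_periods=3)  # proposed default
sums = rolling_sum(data, window=3, min_periods=3)

print(current)
print(proposed)
print(sums)
```

With the sample series, `current` reproduces the `s.rolling(3).count()` output above, `proposed` reproduces the `min_periods=3` output, and `sums` is non-NaN only at positions 2 and 6, matching `s.rolling(3).sum()`.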

Expected Output

>>> s.rolling(3).count()
0    NaN
1    NaN
2    3.0
3    2.0
4    2.0
5    2.0
6    3.0
dtype: float64

Problem description

With the updates from #30923, the min_periods argument of rolling.count is now respected (it used to be completely ignored). However, its default value remains 0 for backward compatibility. In a future release we probably want to change this default so that it is consistent with other similar APIs.

@mroeschke previously mentioned that we should start with a DeprecationWarning to inform users of the upcoming change, and then make the actual change in a later release, probably the following one.
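
One way the warning-first step could look is sketched below. This is an assumption for discussion rather than existing pandas code: `resolve_count_min_periods` and the `_UNSET` sentinel are hypothetical names, and the message wording is illustrative. The idea is to warn only when the caller relies on the default, and to keep returning the old default of 0 until the change is made.

```python
import warnings

# Hypothetical sentinel: distinguishes "argument not passed" from an explicit value.
_UNSET = object()


def resolve_count_min_periods(window, min_periods=_UNSET):
    """Hypothetical helper: choose min_periods for count during the deprecation period."""
    if min_periods is _UNSET:
        warnings.warn(
            "min_periods currently defaults to 0 for rolling.count; a future "
            f"version will default it to the window size ({window}). Pass "
            "min_periods explicitly to keep the current behavior.",
            DeprecationWarning,
            stacklevel=2,
        )
        return 0  # old default stays in effect until the change is made
    return min_periods
```

Callers who pass `min_periods` explicitly get no warning and see no change in behavior in either release; only code that relies on the default is notified.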

@jreback @WillAyd
Let me know what you guys think!
