DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed" #44687

Closed
2 of 3 tasks
Iqigai opened this issue Nov 30, 2021 · 6 comments
Labels
Docs, Window (rolling, ewma, expanding)

Comments

@Iqigai

Iqigai commented Nov 30, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

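import pandas as pd

# Five ones followed by five zeros, so each rolling sum counts the ones in its window
df = pd.DataFrame({'A': [1] * 5 + [0] * 5})
win_size = 4
# Same window size, one column per value of `closed`
df['left'] = df['A'].rolling(win_size, closed='left').sum()
df['right'] = df['A'].rolling(win_size, closed='right').sum()
df['both'] = df['A'].rolling(win_size, closed='both').sum()
df['neither'] = df['A'].rolling(win_size, closed='neither').sum()
df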
Out[6]: 
   A  left  right  both  neither
0  1   NaN    NaN   NaN      NaN
1  1   NaN    NaN   NaN      NaN
2  1   NaN    NaN   NaN      NaN
3  1   NaN    4.0   4.0      NaN
4  1   4.0    4.0   5.0      NaN
5  0   4.0    3.0   4.0      NaN
6  0   3.0    2.0   3.0      NaN
7  0   2.0    1.0   2.0      NaN
8  0   1.0    0.0   1.0      NaN
9  0   0.0    0.0   0.0      NaN

Issue Description

There seem to be some inconsistencies in the behavior of the rolling function related to the parameter 'closed'. Each option was tested and assigned to its own column in the toy example above.
First, closed='neither' returns only NaNs, as the 'neither' column in the output shows.
Second, with 'right' or 'left' the sum reaches the full window size (4), which seems inconsistent: if one of the endpoints is excluded, the maximum should be win_size - 1. What is more, with closed='both' the sum even exceeds the window size (5 vs. 4). It seems that the closed parameter affects the window's position and length rather than just whether the endpoints are included. If this is not a bug, the exact behavior should be described more clearly.

Expected Behavior

1- Using 'neither' should yield the same result as 'left' minus 1. In fact this parameter value is redundant, since the same result can be obtained with a window one unit smaller and closed set to 'left'.
2- Using 'left' or 'right' should exclude one of the endpoints from the calculation and never use the whole window.
3- Using 'both' should use a window of exactly the size given by the parameter 'window', inclusive of the current row.
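
The redundancy mentioned in item 1 can be checked directly. This is only a minimal sketch, assuming pandas 1.2 or later (where fixed windows accept every value of closed). min_periods is passed explicitly in the 'neither' call as an assumption of this sketch, so that both calls require the same number of observations.

```python
import pandas as pd

s = pd.Series([1] * 5 + [0] * 5)
win_size = 4

# closed='neither' with the full window size; min_periods lowered by hand
# (assumption of this sketch) so rows are not masked as NaN
neither = s.rolling(win_size, closed="neither", min_periods=win_size - 1).sum()

# closed='left' with a window one unit smaller
left_smaller = s.rolling(win_size - 1, closed="left").sum()

# Raises AssertionError if the two results differ
pd.testing.assert_series_equal(neither, left_smaller)
```

Both calls look at the three rows immediately before the current row, which is why the results match.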

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1

@Iqigai Iqigai added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Nov 30, 2021
@jbrockmendel
Member

pls add an informative title to the issue

@Iqigai
Author

Iqigai commented Dec 2, 2021

I hope this title is good enough.

@MarcoGorelli
Member

I hope this title is good enough.

Not really, can you add a descriptive title please?

And also fix up the example so it runs (df[A'] is a syntax error); the formatting is off too

(Yes, I know I could fix all of these for you, but that doesn't scale)

@Iqigai Iqigai changed the title from "BUG:" to "BUG: Incoherent results for pandas.DataFrame.rolling() when using different values for the parameter "closed"" Dec 2, 2021
@Iqigai
Author

Iqigai commented Dec 2, 2021

Description and title updated.

@mroeschke mroeschke added the Window (rolling, ewma, expanding) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Dec 27, 2021
@dicristina
Contributor

This is not a bug but the documentation could be clearer, particularly the documentation for Series.rolling and DataFrame.rolling. The user guide states:

The windows are comprised by looking back the length of the window from the current observation.

I like to think that a fixed window always consists of the current row plus window rows before the current row and that the first or the last (current) row might be excluded according to the closed parameter. If closed is set to 'both' then no rows will be excluded which means that window+1 rows will be used for the result, window rows are used for 'left' and 'right', and window-1 rows are used when closed is set to 'neither'.
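
The mental model above can be sketched in a few lines. This is only an illustration, assuming pandas 1.2 or later. window_rows is a hypothetical helper written for this comment, not part of pandas, and min_periods=1 is set so that short windows are not masked as NaN.

```python
import pandas as pd


def window_rows(i, window, closed):
    """Hypothetical helper: row positions used for row i under the model above."""
    start, end = i - window, i
    rows = list(range(start + 1, end))  # interior rows are always included
    if closed in ("left", "both"):
        rows.insert(0, start)  # keep the first (oldest) row
    if closed in ("right", "both"):
        rows.append(end)  # keep the last (current) row
    return [r for r in rows if r >= 0]  # drop positions before the data starts


ones = pd.Series([1.0] * 10)
window = 4
sizes = {}
for closed in ("right", "left", "both", "neither"):
    # Summing ones counts how many rows fall inside each window
    counts = ones.rolling(window, closed=closed, min_periods=1).sum()
    sizes[closed] = int(counts.iloc[-1])
    # The pandas result should agree with the helper for the last row
    assert sizes[closed] == len(window_rows(len(ones) - 1, window, closed))

print(sizes)
```

For the last row this gives 4 rows for 'right' and 'left', 5 for 'both', and 3 for 'neither', i.e. window, window+1, and window-1 rows respectively.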

The 'neither' column in the example resulted in all NaN values because the min_periods parameter is automatically set to window for fixed windows. To avoid this problem min_periods has to be set to a value lower than window whenever closed='neither'.
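
A minimal sketch of that workaround, reusing the data from the original example and assuming pandas 1.2 or later. The choice min_periods = win_size - 1 is just one possible value below window.

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5 + [0] * 5})
win_size = 4

# Default min_periods (equal to the window size for fixed windows):
# a 'neither' window only ever holds win_size - 1 rows, so every result is NaN
default = df["A"].rolling(win_size, closed="neither").sum()

# Lowering min_periods lets the three-row windows produce a value
relaxed = df["A"].rolling(
    win_size, closed="neither", min_periods=win_size - 1
).sum()

print(pd.DataFrame({"A": df["A"], "default": default, "relaxed": relaxed}))
```

With the relaxed setting, the first three rows are still NaN because fewer than three earlier rows exist for them. From row 3 onward the sum covers the three rows before the current one.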

@mroeschke mroeschke added the Docs label and removed the Bug label Jul 11, 2022
@mroeschke mroeschke changed the title from "BUG: Incoherent results for pandas.DataFrame.rolling() when using different values for the parameter "closed"" to "DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed"" Jul 11, 2022
@rhshadrach
Member

Closing in favor of #60485

6 participants