DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed" #44687

Iqigai · 2021-11-30T09:23:14Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({'A': [1] * 5 + [0] * 5})
win_size = 4
df['left'] = df['A'].rolling(win_size, closed='left').sum()
df['right'] = df['A'].rolling(win_size, closed='right').sum()
df['both'] = df['A'].rolling(win_size, closed='both').sum()
df['neither'] = df['A'].rolling(win_size, closed='neither').sum()
df
Out[6]: 
   A  left  right  both  neither
0  1   NaN    NaN   NaN      NaN
1  1   NaN    NaN   NaN      NaN
2  1   NaN    NaN   NaN      NaN
3  1   NaN    4.0   4.0      NaN
4  1   4.0    4.0   5.0      NaN
5  0   4.0    3.0   4.0      NaN
6  0   3.0    2.0   3.0      NaN
7  0   2.0    1.0   2.0      NaN
8  0   1.0    0.0   1.0      NaN
9  0   0.0    0.0   0.0      NaN

Issue Description

There seem to be some inconsistencies in the behavior of the rolling function related to the parameter 'closed'. Each option was tested and its result assigned to a separate column in the toy example above.
First, 'neither' returns only NaNs, as the 'neither' column in the output shows.
Second, with 'right' or 'left' the count (the sum over the ones in column A) reaches the full window size, which is not coherent: if one of the endpoints is excluded, the maximum should be win_size - 1. What's more, with closed='both' the count even exceeds the window size (5 vs. 4). It seems that the closed parameter affects the position and size of the window rather than just whether the endpoints are included. If this is not a bug, the exact behavior should be better documented.
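A quick way to see how many rows each value of closed actually puts in a window, independent of the data in column A, is to roll over a series of ones, so that each rolling sum equals the number of rows in that window. This is a minimal sketch; min_periods=1 is an assumption added here so that the partially filled windows at the start also produce values.

```python
import pandas as pd

# A series of ones: each rolling sum equals the number of rows in that window.
ones = pd.Series([1] * 10)
win_size = 4

max_rows = {}
for closed in ["right", "left", "both", "neither"]:
    # min_periods=1 (added for this sketch) lets partially filled windows report a size too.
    sizes = ones.rolling(win_size, closed=closed, min_periods=1).sum()
    max_rows[closed] = int(sizes.max())

print(max_rows)  # {'right': 4, 'left': 4, 'both': 5, 'neither': 3}
```

The 5 for 'both' is the same value that shows up in the 'both' column above.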

Expected Behavior

1- Using 'neither' should include one row fewer than 'left'. In fact, this parameter value is redundant, since the same result could be obtained with a window size one unit smaller and closed set to 'left'.
2- Using 'left' or 'right' should exclude one of the endpoints from the calculation and never use the whole window.
3- Using 'both' should use a window of exactly the size given by the parameter 'window', including the current row.

Installed Versions

INSTALLED VERSIONS

commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1


jbrockmendel · 2021-12-01T21:01:27Z

pls add an informative title to the issue

Iqigai · 2021-12-02T07:00:20Z

I hope this title is good enough.

MarcoGorelli · 2021-12-02T08:36:26Z

> I hope this title is good enough.

Not really, can you add a descriptive title please?

And also fix up the example so it runs (df[A'] is a syntax error); the formatting is off too

(Yes, I know I could fix all of these for you, but that doesn't scale)

Iqigai · 2021-12-02T15:43:42Z

Description and title updated.

dicristina · 2022-07-11T03:36:43Z

This is not a bug, but the documentation could be clearer, particularly the documentation for Series.rolling and DataFrame.rolling. The user guide states:

The windows are comprised by looking back the length of the window from the current observation.

I like to think that a fixed window always consists of the current row plus the window rows before it, and that the first or the last (current) row may be excluded depending on the closed parameter. If closed is set to 'both', no rows are excluded, which means that window+1 rows are used for the result; window rows are used for 'left' and 'right', and window-1 rows are used when closed is set to 'neither'.
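This mental model can be written down as a small reference function and compared with pandas on the data from the original example. In this sketch, window_rows and model_sum are hypothetical helpers (not pandas API), and min_periods=1 is used on the pandas side so that windows with fewer rows are compared too; an empty window gives NaN, as pandas does when min_periods is not met.

```python
import math

import pandas as pd


def window_rows(i, window, closed):
    """Hypothetical helper: row positions used at row i under the model above.

    Start from rows i - window .. i (window + 1 rows), drop the first one
    unless closed includes the left endpoint, and drop the current row
    unless closed includes the right endpoint.
    """
    first, last = i - window, i
    if closed in ("right", "neither"):
        first += 1  # left endpoint excluded
    if closed in ("left", "neither"):
        last -= 1  # right endpoint (the current row) excluded
    return range(max(first, 0), last + 1)  # clip to rows that exist


def model_sum(s, window, closed):
    """Hypothetical helper: rolling sum computed from window_rows."""
    values = []
    for i in range(len(s)):
        rows = window_rows(i, window, closed)
        # Mirror min_periods=1: an empty window gives NaN.
        values.append(float(s.iloc[list(rows)].sum()) if len(rows) else math.nan)
    return pd.Series(values, index=s.index)


s = pd.Series([1] * 5 + [0] * 5)
win_size = 4

matches = {}
for closed in ["right", "left", "both", "neither"]:
    actual = s.rolling(win_size, closed=closed, min_periods=1).sum()
    matches[closed] = actual.equals(model_sum(s, win_size, closed))

print(matches)
```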

The 'neither' column in the example contains only NaN values because the min_periods parameter defaults to window for fixed windows, while a 'neither' window holds at most window-1 rows. To avoid this problem, min_periods has to be set to a value lower than window whenever closed='neither'.
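A minimal sketch of that workaround on the original data: min_periods is lowered to win_size - 1 (the most rows a 'neither' window can hold), and the result is compared with closed='left' and a window one row shorter, which covers the same rows.

```python
import pandas as pd

df = pd.DataFrame({"A": [1] * 5 + [0] * 5})
win_size = 4

# With closed='neither' a fixed window holds at most win_size - 1 rows, so the
# default min_periods (= win_size) can never be met; lower it explicitly.
neither = df["A"].rolling(win_size, closed="neither", min_periods=win_size - 1).sum()

# closed='left' with a window one row shorter covers the same rows
# (and its default min_periods is then win_size - 1 as well).
left_shorter = df["A"].rolling(win_size - 1, closed="left").sum()

print(neither.tolist())  # [nan, nan, nan, 3.0, 3.0, 3.0, 2.0, 1.0, 0.0, 0.0]
print(neither.equals(left_shorter))  # True
```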

rhshadrach · 2024-12-03T23:03:51Z

Closing in favor of #60485

Iqigai added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Nov 30, 2021

Iqigai changed the title from 'BUG:' to 'BUG: Incoherent results for pandas.DataFrame.rolling() when using different values for the parameter "closed"' Dec 2, 2021

mroeschke added the Window (rolling, ewma, expanding) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Dec 27, 2021

mroeschke added Docs and removed Bug labels Jul 11, 2022

mroeschke changed the title from 'BUG: Incoherent results for pandas.DataFrame.rolling() when using different values for the parameter "closed"' to 'DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed"' Jul 11, 2022

rhshadrach closed this as completed Dec 3, 2024
