DOC: Clarify pandas.DataFrame.rolling() when using different values for the parameter "closed" #44687
Comments
Please add an informative title to the issue.
I hope this title is good enough.
Not really, can you add a descriptive title please? Also, please fix up the example so it runs. (Yes, I know I could fix all of these for you, but that doesn't scale.)
Description and title updated.
This is not a bug but the documentation could be clearer, particularly the documentation for
I like to think that a fixed window always consists of the current row plus The
Closing in favor of #60485
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
There seem to be some inconsistencies in the behavior of the rolling function related to the parameter 'closed'. Different options were tested, each assigned to a different column in the toy example above.
First, when using 'neither', it returns NaNs, as shown in the 'neither' column of the output.
Second, when we use 'right' or 'left', the count reaches the full window size, which seems inconsistent: if one of the endpoints is excluded, the maximum should be win_size - 1. What is more, with closed='both' we even get a count of elements greater than the window size (5 versus 4 in this case). It seems that the closed parameter affects the window's position and shape rather than just the inclusion of the endpoints. If this is not a bug, the exact behavior should be described more clearly.
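The original reproducible example is not shown here, so the following is only a minimal sketch of the kind of comparison described above, not the author's code. It assumes pandas 1.2 or later (where fixed windows accept `closed`), a hypothetical column of ones named `x`, and `min_periods=1` so that short windows are reported instead of being masked as NaN. Because every value is 1, the rolling sum equals the number of rows inside each window.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: eight rows of ones, so a rolling sum equals
# the number of rows that fall inside each window.
df = pd.DataFrame({"x": np.ones(8)})
win_size = 4

# One column per value of `closed`. min_periods=1 is an assumption made
# here so that windows with fewer than win_size rows still return a value.
counts = pd.DataFrame(
    {
        closed: df["x"].rolling(win_size, min_periods=1, closed=closed).sum()
        for closed in ("right", "left", "both", "neither")
    }
)

print(counts)
print(counts.max())
```

Under these assumptions the per-column maxima should come out as 4 for 'right' and 'left', 5 for 'both', and 3 for 'neither', which matches the counts reported above. It also suggests why 'neither' showed only NaNs in the report: with the default `min_periods` (equal to the window size for an integer window), a window that never holds more than 3 rows can never reach the required 4 observations.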
Expected Behavior
1- Using 'neither' should yield the same result as 'left' minus 1. In fact, this parameter value is redundant, since the same result could be obtained with a window one unit smaller and closed set to 'left'.
2- Using 'left' or 'right' should exclude one of the endpoints from the calculation and never use the whole window.
3- Using 'both' should use a window of exactly the size given by the parameter 'window', inclusive of the current row.
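The redundancy claim in point 1 can be checked directly. The sketch below is an illustration under assumptions (pandas 1.2 or later, a hypothetical float series, and `min_periods=1` chosen for the comparison); it compares `closed='neither'` at the full window size against `closed='left'` with a window one row smaller.

```python
import numpy as np
import pandas as pd

# Hypothetical data: distinct values make it visible which rows are summed.
s = pd.Series(np.arange(10, dtype="float64"))
win_size = 4

# 'neither' with the full window size ...
neither = s.rolling(win_size, min_periods=1, closed="neither").sum()
# ... versus 'left' with a window one row smaller.
left_smaller = s.rolling(win_size - 1, min_periods=1, closed="left").sum()

print(pd.DataFrame({"neither": neither, "left_smaller": left_smaller}))
print(neither.equals(left_smaller))
```

If the two columns match row for row, then for integer windows 'neither' behaves like 'left' with a window one row shorter: both cover the rows strictly before the current one.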
Installed Versions
INSTALLED VERSIONS
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Canada.1252
pandas : 1.3.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 21.3.1
setuptools : 58.2.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.23.2
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.29.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.1