-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash #21704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The bug exists on |
Thanks for the report - that does look strange. Investigation and PRs are always welcome! |
Same thing happen with closed='neither', but it works for closed='right' |
[Disclaimer : I don't know much about Cython] |
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes all three of the aforementioned bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is still consistent with roll_sum behavior.
I just hit this as well. @JacquotLeHaricot you're intuition was spot on. Definitely an off-by-one. |
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes all three of the aforementioned bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is still consistent with roll_sum behavior.
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes all three of the aforementioned bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is still consistent with roll_sum behavior.
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes these three bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is consistent with roll_sum behavior.
…21704) User reported that `df.rolling(to_offset('3D'), closed='left').max()` segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268 `i` is initialized to `endi[0]`, which is 0 when `closed=left`. So in the next line when it tries to set `output[i-1]` it goes out of bounds. In addition, there are 2 more bugs in the `roll_min_max` code. The second bug is that for variable size windows, the `nobs` is never updated when elements leave the window. The third bug is at the end of the fixed window where all output elements up to `minp` are initialized to 0 if the input is not float. This PR fixes these three bugs, at the cost of casting the output array to floating point even if the input is integer. This is less than ideal if the output has no NaNs, but is consistent with roll_sum behavior.
Code Sample, a copy-pastable example if possible
Problem description
With this rolling and aggregation function, python just crashes. It does too with
.min()
or.agg(np.max)
as aggregation steps. It does not whenclosed='right'
or withmean
as an aggregation function.Expected Output
or, if
closed='left'
raises complicated problems (as it is not implemented for fixed windows), disable it too for the offset-based windows.Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.1
pytest: 3.2.1
pip: 10.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.5.3
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: