Skip to content

BUG: datetime rolling min/max segfaults when closed=left (#21704) #21853

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 20, 2018

Conversation

changhiskhan
Copy link
Contributor

@changhiskhan changhiskhan commented Jul 11, 2018

User reported that df.rolling(to_offset('3D'), closed='left').max()
segfaults when df has a datetime index. The bug was in PR #19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
i is initialized to endi[0], which is 0 when closed=left.
So in the next line when it tries to set output[i-1] it goes out of bounds.
In addition, there are 2 more bugs in the roll_min_max code.
The second bug is that for variable size windows, the nobs is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to minp are initialized to 0 if
the input is not float.

This PR fixes all three of the aforementioned bugs, at the cost of casting the
output array to floating point even if the input is integer. This is less
than ideal if the output has no NaNs, but is still consistent with roll_sum
behavior.

@pep8speaks
Copy link

pep8speaks commented Jul 11, 2018

Hello @changhiskhan! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 13, 2018 at 05:02 Hours UTC

@@ -3772,7 +3816,7 @@ def test_ragged_apply(self, raw):

df = self.ragged

f = lambda x: 1
def f(x): return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter doesn’t like this. You need to either revert this change or put the return on the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like i accidentally had autopep8 or something on. i'll revert the change

index=pd.date_range('2000', periods=10))
ser[ser.index[-3:]] = np.nan
result = ser.rolling('3D', min_periods=2, closed='left').min()
print(ser)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this print needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i'll take it out

W.push_back(i)
output[0] = calc_mm(minp, nobs, input[Q.front()])
close_offset = 0
else: # if right is open then the first output is always NaN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two spaces before comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Addressed in update.

@codecov
Copy link

codecov bot commented Jul 11, 2018

Codecov Report

Merging #21853 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #21853   +/-   ##
=======================================
  Coverage   91.91%   91.91%           
=======================================
  Files         164      164           
  Lines       49992    49992           
=======================================
  Hits        45951    45951           
  Misses       4041     4041
Flag Coverage Δ
#multiple 90.3% <ø> (ø) ⬆️
#single 42.16% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 365eac4...7b839e5. Read the comment docs.

@changhiskhan
Copy link
Contributor Author

Not sure why the linter is failing on the travis-ci build. I updated my feature branch with E704 issues, and runnning ci/lint.sh locally passes just fine...

@changhiskhan
Copy link
Contributor Author

ack, now it's reporting the travis-ci has passed. I guess it just didn't update yet after i pushed

@jbrockmendel
Copy link
Member

@jreback this looks about ready, but I don't know this part of the code that well.

@changhiskhan
Copy link
Contributor Author

Piggy-backing off this PR for a pandas-dev question.
When I run the unit tests locally, I get a few locale related failures that I don't see on CI. e.g., pandas/tests/series/test_datetime_values.py:305:
Shows:

../../anaconda3/lib/python3.6/locale.py:581: in getlocale
    return _parse_localename(localename)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

**localename = 'en_IL'**

I'm not even sure where the "en_IL" is coming from:

(base) chang@changhispad ~/code/pandas (gh21704)$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

@@ -1241,49 +1242,69 @@ cdef _roll_min_max(ndarray[numeric] input, int64_t win, int64_t minp,
input, win,
minp, index, closed)

output = np.empty(N, dtype=input.dtype)
output = np.empty(N, dtype=float)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this coercing integer input to float?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for two reasons:

  1. because when closed=left the first output should be NaN.
  2. because output elements up to minp should also be NaN.

I believe this behavior is consistent with rolling sum (coerced to float).

ndarray[int64_t] starti, endi
ndarray[numeric, ndim=1] output
ndarray[double_t, ndim=1] output
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for changing the type here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -464,6 +464,49 @@ def test_closed(self):
with pytest.raises(ValueError):
df.rolling(window=3, closed='neither')

def test_closed_min_max_datetime(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you parametrize this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in update. Waiting for CI

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. if you can parameterize that one test by the closed parameter & test a few differernt dtypes would be great. ping on green.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Numeric Operations Arithmetic, Comparison, and Logical operations labels Jul 12, 2018
@changhiskhan
Copy link
Contributor Author

actually, hold on, i think I found a problem with my fix. I'll ping again once it's ready. Thanks.

@changhiskhan
Copy link
Contributor Author

sorry this was a little bit more involved than I thought. I ended up adding more test cases as well and separated the variable vs fixed window code into two different functions so it's a bit easier to read.

…21704)

User reported that `df.rolling(to_offset('3D'), closed='left').max()`
segfaults when df has a datetime index. The bug was in PR pandas-dev#19549. In
that PR, in https://github.com/pandas-dev/pandas/blame/master/pandas/_libs/window.pyx#L1268
`i` is initialized to `endi[0]`, which is 0 when `closed=left`.
So in the next line when it tries to set `output[i-1]` it goes out of bounds.
In addition, there are 2 more bugs in the `roll_min_max` code.
The second bug is that for variable size windows, the `nobs` is never updated
when elements leave the window. The third bug is at the end of the fixed
window where all output elements up to `minp` are initialized to 0 if
the input is not float.

This PR fixes these three bugs, at the cost of casting the output array to
floating point even if the input is integer. This is less than ideal if the
output has no NaNs, but is consistent with roll_sum behavior.
@changhiskhan
Copy link
Contributor Author

@jreback I think this is ready for you to take another look now.

@jreback jreback added this to the 0.24.0 milestone Jul 14, 2018
@jreback
Copy link
Contributor

jreback commented Jul 14, 2018

looks good @changhiskhan if you can just run the asvs for min/max window ops to make sure nothing has significatnly changed.

@WillAyd if any comments, merge on good asv results.

@jreback
Copy link
Contributor

jreback commented Jul 20, 2018

@WillAyd if you are ok, pls approve & merge

@WillAyd WillAyd merged commit 0a0b2b9 into pandas-dev:master Jul 20, 2018
@WillAyd WillAyd added the Window rolling, ewma, expanding label Sep 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Reshaping Concat, Merge/Join, Stack/Unstack, Explode Window rolling, ewma, expanding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Offset-based rolling window, with closed='left' and max as aggregation function, make python crash
5 participants