-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG #10228: segfault due to out-of-bounds in binning #10337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
of course you need a test, otherwise how can you tell if your fix works? The test can be as simple as asserting that it runs. |
@@ -9110,6 +9110,8 @@ def group_count_bin_float64(ndarray[float64_t, ndim=2] out, | |||
ndarray[int64_t, ndim=2] nobs = np.zeros((out.shape[0], out.shape[1]), | |||
dtype=np.int64) | |||
|
|||
if len(bins) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is by definition generated code, edits here will be lost. rather make the fix in src/generate_code.py
, thenpython generate_code.py
creates the generated.pyx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh crap, my bad! I did see the name, but then I looked at git history and saw what appeared to be people editing it (but I get it now).
Do you think it's worth adding a warning message at the top for new devs like NumPy does? Or should it be obvious from the name?
@jreback, sure I'll write a test. I have a piece of code that reliably segfaults for me:
That being said, this doesn't mean it'll reliably segfault for other machines since out-of-bounds access are undefined behavior. One way would be to re-enable the bounds-checking on these Cython functions just for the unit tests. (I actually posed this question on Programmers Stack Exchange and they seemed surprised that the disabling of bounds-checking was hard-coded rather than something that can be switched on and off) |
@Garrett-R then that is a reasonable test. Even if it doesn't segfault on some platforms that is ok. Note that you can change the directive at run-time, see here, e.g. you would have a flag that is passed in to re-enable this. But to be honest it is overkill, and not necessary to tests with that. Since you have an example that fails, that can serve as a smoke tests (e.g. the test is simply running that code, if it doesn't segfault then the test passes). |
@Garrett-R No objection to a separate PR for adding a warning to |
# These were sometimes causing a segfault (for the functions with | ||
# bounds-checking disabled) or an IndexError. We just run them to | ||
# ensure they no longer do. (#10228) | ||
index = pd.DatetimeIndex([]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would actually make a couple of loops here, iterate over all of the index types, then all of the possible dtypes (object,float,datetime if you are adventurous - only works for some of the hows), then all of the 'hows' and exhaustivly test the possibilities
I would actually define these functions in utils.testing
(right below the make index) functions
def all_index_factory():
return [ tm.makeIntIndex, tm.makeFloatIndex,
tm.makeStringIndex, tm.makeUnicodeIndex,
tm.makeDateIndex, tm.makePeriodIndex,
tm.makeTimedeltaIndex, tm.makeBoolIndex,
tm.makeCategoricalIndex]
def datetimelike_index_factory():
return [ tm.makeDateIndex, tm.makePeriodIndex,
tm.makeTimedeltaIndex ]
obviously use the 2nd one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I ended up making these generators that returned the index instances, which I thought would keep the test cleaner. Let me know if you prefer to just return a list of functions though and I can change it.
@jreback, good call about adding more comprehensive testing. This helped me find some more out-of-bounds bugs (which were in I'll send a PR soon to supply a "DO NOT EDIT THIS GENERATED FILE...." warning for other fellow noobs. |
merged via 25fc49d thanks! nice fix. btw if you'd like to update the docs w.r.t. the |
Closes #10228. I also deleted some duplicated code while I was at it.
So, I wasn't sure if I should be including a unit test for this. This issue was a segfault, which was happening when you do:
so I suppose I could add this code into a test, and just verify it doesn't segfault, but that seemed like a bad idea. It would be testing for something that's undefined behavior and therefore be a non-deterministic test.