BUG #10228: segfault due to out-of-bounds in binning #10337

Garrett-R · 2015-06-12T08:13:02Z

Closes #10228. I also deleted some duplicated code while I was at it.

So, I wasn't sure if I should be including a unit test for this. This issue was a segfault, which was happening when you do:

s = pd.Series([], index=pd.DatetimeIndex([]), dtype=np.object)
s.resample('d', how='count')

so I suppose I could add this code into a test, and just verify it doesn't segfault, but that seemed like a bad idea. It would be testing for something that's undefined behavior, and would therefore be a non-deterministic test.

jreback · 2015-06-12T12:06:45Z

of course you need a test, otherwise how can you tell if your fix works? The test can be as simple as asserting that it runs.

jreback · 2015-06-12T12:07:42Z

pandas/src/generated.pyx

@@ -9110,6 +9110,8 @@ def group_count_bin_float64(ndarray[float64_t, ndim=2] out,
        ndarray[int64_t, ndim=2] nobs = np.zeros((out.shape[0], out.shape[1]),
                                                 dtype=np.int64)

+    if len(bins) == 0:


this is by definition generated code, so edits here will be lost. rather, make the fix in src/generate_code.py; running python generate_code.py then creates generated.pyx

Oh crap, my bad! I did see the name, but then I looked at git history and saw what appeared to be people editing it (but I get it now).

Do you think it's worth adding a warning message at the top for new devs like NumPy does? Or should it be obvious from the name?
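As a rough illustration of the generated-file workflow and the warning being proposed, here is a minimal sketch of a template-based generator that writes a "do not edit" banner at the top of its output. It is not the contents of generate_code.py: the names WARNING_HEADER, TEMPLATE and generate, the wording of the banner, and the second dtype name are all hypothetical.

```python
# Hypothetical banner text; the real wording would be up to the project.
WARNING_HEADER = (
    "# DO NOT EDIT THIS GENERATED FILE.\n"
    "# Change the templates in the generator script and regenerate instead.\n"
)

# Hypothetical template: one function per dtype name, body left out.
TEMPLATE = '''def group_count_bin_%(name)s(out, counts, values, bins):
    pass  # body omitted in this sketch
'''


def generate(dtype_names):
    """Return generated source text: the banner, then one function per dtype."""
    parts = [WARNING_HEADER]
    for name in dtype_names:
        parts.append(TEMPLATE % {"name": name})
    return "\n".join(parts)


generated_source = generate(["float64", "object"])
print(generated_source.splitlines()[0])
```

Because the banner is emitted by the generator itself, it survives every regeneration, whereas a hand-written comment in the generated file would be overwritten like any other manual edit.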

Garrett-R · 2015-06-14T06:37:41Z

@jreback, sure I'll write a test. I have a piece of code that reliably segfaults for me:

ss = pd.Series([], index=pd.DatetimeIndex([]), dtype=np.object)
ss.resample('d', how='count')

That being said, this doesn't mean it'll reliably segfault on other machines, since out-of-bounds accesses are undefined behavior. One way would be to re-enable the bounds-checking on these Cython functions just for the unit tests. (I actually posed this question on Programmers Stack Exchange and they seemed surprised that the disabling of bounds-checking was hard-coded rather than something that can be switched on and off.)
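To make the failure mode concrete, here is a pure-Python sketch of a count-per-bin loop. It is not the pandas implementation: the function name count_per_bin, the interpretation of bins as the exclusive end position of each bin, and the early return for empty bins are assumptions made for illustration (the diff in this PR shows only the `if len(bins) == 0:` line, not what follows it). In bounds-checked Python, reading the last element of an empty sequence raises IndexError; in compiled code with bounds-checking switched off, the equivalent read is what becomes undefined behavior.

```python
def count_per_bin(values, bins):
    """Count non-NaN values per bin.

    bins[k] is assumed to be the exclusive end position of bin k,
    in increasing order. Values past the last edge go in one extra bin.
    """
    # Guard for the empty case; without it, bins[-1] below raises
    # IndexError on an empty list (assumed behavior: return no counts).
    if len(bins) == 0:
        return []

    n = len(values)
    # One extra group if the last edge does not cover all values.
    ngroups = len(bins) + (bins[-1] != n)
    counts = [0] * ngroups

    b = 0
    for i, val in enumerate(values):
        # Advance to the bin that contains position i.
        while b < ngroups - 1 and i >= bins[b]:
            b += 1
        if val == val:  # NaN is the only value not equal to itself
            counts[b] += 1
    return counts


print(count_per_bin([1.0, 2.0, float("nan"), 4.0], [2, 4]))
print(count_per_bin([], []))
```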

jreback · 2015-06-14T12:38:54Z

@Garrett-R then that is a reasonable test. Even if it doesn't segfault on some platforms, that is ok. Note that you can change the directive at run-time, see here, e.g. you would have a flag that is passed in to re-enable this. But to be honest it is overkill, and not necessary to test with that. Since you have an example that fails, that can serve as a smoke test (i.e. the test is simply running that code; if it doesn't segfault then the test passes).
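A possible variation on the smoke test, not what this PR does: run the snippet in a child interpreter, so that a crash shows up as a failed check instead of taking down the whole test run. The helper name runs_without_crashing and the timeout value are hypothetical; the sketch uses only the standard library.

```python
import subprocess
import sys


def runs_without_crashing(snippet, timeout=60):
    """Run a code snippet in a fresh interpreter.

    Returns True if the child exits with status 0. On POSIX, a child
    killed by a signal (such as a segfault) has a negative return code,
    so it is reported as a failure here.
    """
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        timeout=timeout,
    )
    return completed.returncode == 0


# Harmless stand-in snippets; a real test would pass the reproducing code.
print(runs_without_crashing("x = []; len(x)"))
print(runs_without_crashing("import sys; sys.exit(3)"))
```

The in-process version suggested above is simpler and is enough to catch a regression; the subprocess form only buys a cleaner failure report at the cost of starting a new interpreter per snippet.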

jreback · 2015-06-14T12:40:08Z

@Garrett-R No objection to a separate PR for adding a warning to generated.pyx (and more importantly prob a note in internals.rst about how/what to change). thanks!

jreback · 2015-06-14T12:47:33Z

pandas/tseries/tests/test_resample.py

+        # These were sometimes causing a segfault (for the functions with
+        # bounds-checking disabled) or an IndexError.  We just run them to
+        # ensure they no longer do.  (#10228)
+        index = pd.DatetimeIndex([])


I would actually make a couple of loops here: iterate over all of the index types, then all of the possible dtypes (object, float, datetime if you are adventurous - only works for some of the hows), then all of the 'hows', and exhaustively test the possibilities

I would actually define these functions in util.testing (right below the make-index functions)

def all_index_factory():
    return [tm.makeIntIndex, tm.makeFloatIndex, tm.makeStringIndex,
            tm.makeUnicodeIndex, tm.makeDateIndex, tm.makePeriodIndex,
            tm.makeTimedeltaIndex, tm.makeBoolIndex, tm.makeCategoricalIndex]

def datetimelike_index_factory():
    return [tm.makeDateIndex, tm.makePeriodIndex, tm.makeTimedeltaIndex]

obviously use the 2nd one

Cool, I ended up making these generators that returned the index instances, which I thought would keep the test cleaner. Let me know if you prefer to just return a list of functions though and I can change it.
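The shape of the approach discussed here (a generator that yields index instances, then a loop over every index, dtype and 'how') can be sketched without pandas. Everything below is illustrative: make_instances and run_all_combinations are hypothetical names, the factories are stand-ins that return tuples instead of real indexes, and "mean" is just an example value for 'how'.

```python
import itertools


def make_instances(factories, size=0):
    """Yield one index instance per factory (generator form)."""
    for make_index in factories:
        yield make_index(size)


def run_all_combinations(indexes, dtypes, hows, run_one):
    """Call run_one for every (index, dtype, how); return how many ran.

    As a smoke test, finishing without a crash is the pass condition.
    """
    ran = 0
    for index, dtype, how in itertools.product(indexes, dtypes, hows):
        run_one(index, dtype, how)
        ran += 1
    return ran


# Stand-in factories so the sketch runs on its own; real code would use
# the datetime-like index makers from the testing utilities.
stand_in_factories = [
    lambda k: ("date", k),
    lambda k: ("period", k),
    lambda k: ("timedelta", k),
]

seen = []
count = run_all_combinations(
    make_instances(stand_in_factories),
    ["object", "float"],
    ["count", "mean"],
    lambda index, dtype, how: seen.append((index[0], dtype, how)),
)
print(count)
```

Yielding instances keeps the test body to a single loop, while returning a list of factory functions leaves the caller free to choose the size of each index; either form works with the same product loop.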

Garrett-R · 2015-06-18T05:28:37Z

@jreback, good call about adding more comprehensive testing. This helped me find some more out-of-bounds bugs (which were in algos.pyx).

I'll send a PR soon to supply a "DO NOT EDIT THIS GENERATED FILE...." warning for other fellow noobs.

jreback · 2015-06-26T23:24:27Z

merged via 25fc49d

thanks!

nice fix.

btw if you'd like to update the docs w.r.t. generate_code.py, that would be great. pls put it in internals.rst

Garrett-R force-pushed the fix_10228 branch from b723329 to cf89945 on June 12, 2015 08:27

jreback reviewed Jun 12, 2015

Garrett-R force-pushed the fix_10228 branch from cf89945 to c16edd4 on June 14, 2015 09:06

jreback added the Bug and Resample labels Jun 14, 2015

jreback reviewed Jun 14, 2015

Garrett-R force-pushed the fix_10228 branch from c16edd4 to 0e4e80d on June 18, 2015 05:21

BUG: pandas-dev#10228 resampling empty Series caused segfaults

2309bbb

Garrett-R force-pushed the fix_10228 branch from 0e4e80d to 2309bbb on June 18, 2015 05:25

jreback added this to the 0.17.0 milestone Jun 26, 2015

jreback closed this Jun 26, 2015

jreback mentioned this pull request Jun 26, 2015

resample() with how=count causes Segmentation Fault #10228

Closed

Garrett-R deleted the fix_10228 branch June 27, 2015 18:29

Garrett-R mentioned this pull request Jun 27, 2015

DOC: Add warning for newbs not to edit auto-generated file #10456

Closed
