intermittent segfault after take error #2892


Closed
dsm054 opened this issue Feb 18, 2013 · 40 comments
@dsm054
Contributor

dsm054 commented Feb 18, 2013

Was playing around and managed to get a segfault, in several versions. It's a little intermittent, and it may have to do with how things are left after an error is encountered. The same code without the faulty take line doesn't crash for me.

>>> import pandas as pd, sys
>>> sys.version
'2.7.3 (default, Aug  1 2012, 05:16:07) \n[GCC 4.6.3]'
>>> pd.__version__
'0.11.0.dev-14a04dd'
>>> df = pd.DataFrame({'A': {1.0: 0.0, 2.0: 6.0, 3.0: 12.0}, 'C': {1.0: 2.0, 2.0: 8.0, 3.0: 14.0}, 'B': {1.0: 1.0, 2.0: 7.0, 3.0: 13.0}})
>>> df
    A   B   C
1   0   1   2
2   6   7   8
3  12  13  14
>>> 
>>> df.take(df.unstack())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_14a04dd-py2.7-linux-i686.egg/pandas/core/frame.py", line 2951, in take
    new_index = self.index.take(indices)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_14a04dd-py2.7-linux-i686.egg/pandas/core/index.py", line 416, in take
    taken = self.view(np.ndarray).take(indexer)
IndexError: index out of range for array
>>> df.unstack()
A  2     0
   2     6
   3    12
B  2     1
   2     7
   3    13
C  2     2
   2     8
   3    14
dtype: float64
>>> df.take(df.unstack())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_14a04dd-py2.7-linux-i686.egg/pandas/core/frame.py", line 2951, in take
    new_index = self.index.take(indices)
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.11.0.dev_14a04dd-py2.7-linux-i686.egg/pandas/core/index.py", line 416, in take
    taken = self.view(np.ndarray).take(indexer)
IndexError: index out of range for array
>>> df.unstack()
Segmentation fault (core dumped)
@stephenwlin
Contributor

Not 100% sure, but it looks like a numpy issue:

In [1]: import pandas as pd, sys

In [2]: import numpy as np

In [3]: df=pd.DataFrame({'A': {1.0: 0.0, 2.0: 6.0, 3.0: 12.0},
   ...:                  'C': {1.0: 2.0, 2.0: 8.0, 3.0: 14.0},
   ...:                  'B': {1.0: 1.0, 2.0: 7.0, 3.0: 13.0}})

In [4]: i = df.index.view(np.ndarray)

In [5]: u = df.unstack().astype(np.int_).view(np.ndarray)

In [6]: i
Out[6]: array([1.0, 2.0, 3.0], dtype=object)

In [7]: i.take(u)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-8293b97d15cc> in <module>()
----> 1 i.take(u)

IndexError: index out of range for array

In [8]: i
Segmentation fault (core dumped)

Line 7 only involves numpy types, although the underlying buffers have been touched by pandas (so it's possible it's happening because pandas violated a contract somewhere). i.take doesn't seem to have any reason to affect the value of i though, since it returns a new object instead of modifying itself: I'm guessing numpy's implementation is modifying i for some optimization purpose but it wasn't coded in an exception-safe manner...leaving i in some invalid state.
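As a stopgap, the buggy error path could be avoided entirely by validating the indices before ndarray.take ever sees them. A rough sketch (safe_take is a hypothetical name, not actual pandas code):

```python
import numpy as np

def safe_take(arr, indexer):
    """Bounds-check an indexer before delegating to ndarray.take.

    If validation happens up front, ndarray.take never raises, so the
    error path that apparently corrupts object-dtype arrays is never hit.
    """
    indexer = np.asarray(indexer, dtype=np.int_)
    n = arr.shape[0]
    if ((indexer >= n) | (indexer < -n)).any():
        raise IndexError("index out of range for array of length %d" % n)
    return arr.take(indexer)
```

With this, the out-of-range call raises cleanly and the original array stays intact afterwards.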

Will dig in a bit more.

@dsm054
Contributor Author

dsm054 commented Feb 18, 2013

Note that the unstack results after the first take are already off (it shouldn't be 2,2,3). On a different system I can get

>>> df.unstack()
A  1358910081     0
   2              6
   3             12
B  1358910081     1
   2              7
   3             13
C  1358910081     2
   2              8
   3             14

instead.

@stephenwlin
Contributor

Changing line 5 to u = np.asarray(df.unstack().astype(np.int_).copy()).copy() still results in a segfault, but changing line 4 to either i = np.asarray(df.index.copy()) or i = np.asarray(df.index).copy() seems to make it go away. So it seems either Index is violating some contract in subclassing ndarray, ndarray.take is buggy, or both...

EDIT: actually, it seems like changing line 4 to either option above still segfaults (at least for me), but it requires multiple calls to ndarray.take to do so.

@stephenwlin
Contributor

@dsm054 yes, it looks like the index contents are being affected by ndarray.take, which seems like a numpy optimization gone awry (since ndarray.take doesn't semantically mutate its self argument)

@stephenwlin
Contributor

definitely a numpy bug (here's a reproduction with dev numpy)

In [1]: import numpy as np

In [2]: u = np.asarray([ 0, 6, 12, 1, 7, 13, 2, 8, 14])

In [3]: i = np.asarray([1.0, 2.0, 3.0]).astype(object)

In [4]: i
Out[4]: array([1.0, 2.0, 3.0], dtype=object)

In [5]: i.take(u)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-8293b97d15cc> in <module>()
----> 1 i.take(u)

IndexError: index 6 is out of bounds for axis 0 with size 3

In [6]: i
Segmentation fault (core dumped)

@dsm054
Contributor Author

dsm054 commented Feb 18, 2013

Should we close this, then, and bump it upstream?

@stephenwlin
Contributor

well, maybe pandas should work around it by checking index bounds itself first for object arrays (at some minor efficiency cost...)? in which case we might want to leave the issue open for that.

it might be important since Index arrays are usually object-based (unless specialized via a subclass)
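Concretely, the check could live in the subclass itself. A minimal sketch of an Index-like ndarray subclass that pre-checks bounds before calling ndarray.take (SafeIndex is illustrative only, not the pandas Index implementation):

```python
import numpy as np

class SafeIndex(np.ndarray):
    """Index-like ndarray subclass that checks index bounds itself
    before delegating to ndarray.take (illustrative sketch only)."""

    def take(self, indexer, axis=None, out=None, mode='raise'):
        indexer = np.asarray(indexer, dtype=np.int_)
        n = len(self)
        if ((indexer >= n) | (indexer < -n)).any():
            raise IndexError("indices out of bounds for length %d" % n)
        return np.ndarray.take(self, indexer, axis=axis, out=out, mode=mode)

# an object-dtype index like the one that triggers the numpy bug
idx = np.array([1.0, 2.0, 3.0], dtype=object).view(SafeIndex)
```

An out-of-range indexer now raises before numpy's buggy error path can run, so the underlying buffer is never left in an invalid state.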

@wesm
Member

wesm commented Feb 19, 2013

why don't you replace the usage of ndarray.take in Index.take with take_nd (which should not corrupt memory)?

@stephenwlin
Contributor

yeah, that works too...thanks! (it might be marginally faster too because ndarray.take is coded for the general nd-case but we have cython specializations for most 1-d cases)

@wesm
Member

wesm commented Mar 12, 2013

What's the status here?

@stephenwlin
Contributor

sorry for the lack of update, this turned out to be somewhat of a rabbit hole. Basically, we're exposed to this same numpy bug everywhere we call ndarray.take with user-supplied indices, which turns out to be a lot of places. It doesn't make much sense to convert just one call arbitrarily, so I started doing a blanket conversion...but that turns out to be the wrong thing, because ndarray.take has different semantics for negative indices than our cythonic takes (i.e. the normal from-the-end semantics, rather than -1 being a dummy value for fill behavior). Lots of tests were broken in subtle ways before I realized the issue, and regardless it's unclear whether anyone is already depending on the normal semantics (since there's no documentation either way that this is officially supported, afaict).

i think the right thing to do is probably the following:

  1. audit the code and try to disambiguate uses of user-supplied indices and algorithm-generated indices throughout
  2. convert the former to either a bug-free cythonic take that does normal negative indices, or a bounds-checking wrapper around ndarray.take
  3. (optional) if new cythonic take is used, then get rid of the whole back and forth "platform int <-> int64" conversion mess, which is basically only there to support ndarray.take.

but that's a pretty big change...i don't have free cycles for it right now but if someone else wants to take a look they're free to
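For reference, the semantic difference in point 2 is easy to demonstrate with plain numpy: ndarray.take treats -1 as "last element", while the cythonic takes described above treat -1 as a missing-value marker. A rough pure-numpy emulation of the fill behavior (take_fill is a hypothetical helper, not the pandas implementation):

```python
import numpy as np

arr = np.array([10.0, 20.0, 30.0])
indexer = np.array([0, 2, -1])

# ndarray.take: -1 means "last element" (normal Python-style indexing)
wrapped = arr.take(indexer)

# the cython takes instead treat -1 as a dummy "missing" marker; a rough
# pure-numpy emulation of that fill behavior:
def take_fill(values, indexer, fill_value=np.nan):
    mask = indexer == -1
    safe = np.where(mask, 0, indexer)  # keep the sentinel out of indexing
    out = values.take(safe).astype(np.float64)
    out[mask] = fill_value
    return out

filled = take_fill(arr, indexer)
```

The two results differ only in the last position: wrapped ends with the last element, filled ends with NaN, which is exactly the ambiguity that broke the blanket conversion.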

@stephenwlin
Contributor

(btw, in case someone picks this up...there's no bounds checking in take_nd in most cases, which might be ok if we can ensure that no user-supplied indices are ever passed to it, but maybe it should be addressed to be safe; it's probably too slow to turn on cython's built-in bounds check, though, since that checks every access. A better approach would be to add the minimal checks manually, for a row/column of calls at a time.)

@jreback
Contributor

jreback commented Mar 12, 2013

this is the change I made in _ixs in frame.py (this was the old icol), as negative user-supplied indices were handled incorrectly (the change was all wrapped up in the .iloc changes). I am sure, as @stephenwlin points out, there are more user-facing cases like this (probably not too many, and most probably hit this code block anyhow)

can anyone think of a place where there are user-facing possibilities of negative indices except:

  • _ixs/icol
  • frame/take

other uses of negative indices denote labels (or hit _ixs) AFAICT
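The negative-index conversion at those entry points amounts to something like the following sketch (sanitize_indexer is a hypothetical name, presumably resembling what _ixs now does):

```python
import numpy as np

def sanitize_indexer(indexer, length):
    """Normalize user-facing negative indices (-1 meaning the last row)
    to their non-negative equivalents, raising on anything out of range.

    Hypothetical helper: after this, the result is safe to hand to
    internal take routines that reserve -1 as a fill marker.
    """
    indexer = np.asarray(indexer, dtype=np.int_)
    out = np.where(indexer < 0, indexer + length, indexer)
    if ((out < 0) | (out >= length)).any():
        raise IndexError("indices out of bounds for axis of length %d" % length)
    return out
```

For example, on a frame of length 8 an indexer of [0, 1, -1] becomes [0, 1, 7] before being passed to the internal take.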

@ghost

ghost commented Mar 12, 2013

no bounds checking occasionally bites with a kindly segfault: #3011, #2775/#2778.
are all those cython functions really in the inner loop?

@stephenwlin
Contributor

well, if all the indices are generated internally and the algorithms that generate them are not buggy, then it's technically superfluous to check--there's just no guarantee that that's the case

@ghost

ghost commented Mar 12, 2013

obviously that wasn't the case in those examples.

@jreback
Contributor

jreback commented Mar 12, 2013

so why don't we do negative index conversion in frame/take, make that the only user-facing entry point (and callable from _ixs), and then @stephenwlin can fix the take bugs independently? could also have a flag (for safety's sake?)

@stephenwlin
Contributor

@y-p of course :) just saying there's a case for why it wasn't there to begin with (i didn't take it out, it was turned off before I started working on it)

@ghost

ghost commented Mar 12, 2013

generate_code.py uses @cython.boundscheck(False) across the board. maybe
this can be revamped with the conversion to fused types.

@stephenwlin
Contributor

@jreback maybe, i don't know how many user-facing functions there are to look at, actually; it's a bit hard to tell what's user-facing and what isn't in python sometimes...it might be less of a problem than i think it is

@stephenwlin
Contributor

@y-p i don't think we should pay for per-access bounds checking though, that'll be O(prod(shape)) overhead, unless cython is smart enough to factor out the checks

@ghost

ghost commented Mar 12, 2013

Wouldn't it be a function of access count rather than strictly the shape of the array? probably most
cython code does do significant work per access, so point taken in any case.

I'm just thinking of reducing bug surface area, perhaps integrating a boundscheck preamble into the
cython code would be better.

@jreback
Contributor

jreback commented Mar 12, 2013

So user-facing take is broken as well for negative indices (because take_nd doesn't like negative indices), and the two cases below differ in whether the internals method take or the indexers' take is called:

In [5]: df = pd.DataFrame(np.random.rand(8,3))

In [6]: df.take([0,1,-1])
Out[6]: 
          0         1         2
0  0.610429  0.185237  0.174880
1  0.059846  0.290267  0.012082
7       NaN       NaN       NaN

In [7]: df
Out[7]: 
          0         1         2
0  0.610429  0.185237  0.174880
1  0.059846  0.290267  0.012082
2  0.719515  0.322478  0.953638
3  0.450850  0.374526  0.781923
4  0.588323  0.951343  0.995791
5  0.385074  0.276861  0.720896
6  0.149348  0.553080  0.652363
7  0.554311  0.635818  0.148375

In [8]: df.take([0,1,-1],axis=1)
Out[8]: 
          0         1   2
0  0.610429  0.185237 NaN
1  0.059846  0.290267 NaN
2  0.719515  0.322478 NaN
3  0.450850  0.374526 NaN
4  0.588323  0.951343 NaN
5  0.385074  0.276861 NaN
6  0.149348  0.553080 NaN
7  0.554311  0.635818 NaN

but it works for mixed dtypes

In [9]: df['test'] = 'foo'

In [10]: df.take([0,1,-1],axis=1)
Out[10]: 
          0         1 test
0  0.610429  0.185237  foo
1  0.059846  0.290267  foo
2  0.719515  0.322478  foo
3  0.450850  0.374526  foo
4  0.588323  0.951343  foo
5  0.385074  0.276861  foo
6  0.149348  0.553080  foo
7  0.554311  0.635818  foo

but not for mixed dtypes with axis=0

In [3]: df = pd.DataFrame(dict(A = np.random.rand(8), B = 'foo'), index=range(8))

In [4]: df.take([-1],axis=0)
Exception: Indices must be nonzero and less than the axis length

@stephenwlin
Contributor

so should negative indices be allowed everywhere where the user can supply indices, or just through .ix and friends?

anyway, regardless of what we allow or don't allow, we need to make sure any user-supplied indices are sanitized before being passed into functions that assume properly bounded indices, so disambiguating the two is important.

@y-p maybe just add an option to generate.py to add bounds checking as a debug tool? also, i'm pretty sure it's only the take_* functions that are a potential problem; there's almost certainly no user-supplied indices going into pad_*, backfill_*, etc.

@ghost

ghost commented Mar 12, 2013

Sure, but that's useful as a way to track down segfaults after users report them.

sanitizing is good, but it's probably less effort to "head them off at the pass" in the
cython function preamble than to track down all the code paths that lead there.

#3011 was indeed groupby_int64 delegating to take_1d in numpy_helper.

@stephenwlin
Contributor

ok, i guess that's ok, but if it's just take_* etc., we can at least minimize the performance hit by writing the bounds checks manually so they don't repeat checks unnecessarily, rather than relying on the cython ones.

@ghost

ghost commented Mar 12, 2013

+1

@stephenwlin
Contributor

oh actually, i guess that's what you meant by doing it in the preamble...yeah that's fine; actually, you can be a bit more efficient than that (for the normal case) by doing it as you iterate, instead of iterating twice...i'll do that for now until we resolve the semantics/sanitization issue

@ghost

ghost commented Mar 12, 2013

In #3011 take_1d was in the inner loop, and it's really the caller that should have done the
check in this case. Though it was another cython function.

Probably need to do case-by-case unfortunately.

@stephenwlin
Contributor

@y-p yeah, in take_1d there's only one loop so it needs to check every index anyway, but take_2d_* etc. can do it column-by-column or row-by-row during the outer loop
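In other words, for the 2-d takes the check can be hoisted out of the inner copy loop. An illustrative pure-Python version of the pattern (the real implementations are cython specializations; this function name and shape are assumptions for the sketch):

```python
import numpy as np

def take_2d_axis0(values, indexer, fill_value=np.nan):
    """Take rows from a 2-d array, validating each requested row index
    once in the outer loop rather than per element in the inner copy.

    Treats -1 as a fill sentinel, matching the cython takes discussed
    above (illustrative sketch, not pandas code).
    """
    n, k = values.shape
    out = np.empty((len(indexer), k), dtype=np.float64)
    for i, idx in enumerate(indexer):
        if idx == -1:                 # fill sentinel
            out[i, :] = fill_value
        elif 0 <= idx < n:            # one check covers the whole row copy
            out[i, :] = values[idx, :]
        else:
            raise IndexError("row index %d out of bounds" % idx)
    return out
```

The cost is one comparison per requested row instead of one per copied element, which is why the 2-d case is cheaper to guard than take_1d.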

@stephenwlin
Contributor

anyway, it's still an open question where we want to allow negative indexing. it'll probably be simpler to only allow it through .ix and friends, to reduce the number of cases we need to support and test...and with bounds checks in the cython take implementations we can probably get away with not sanitizing elsewhere (although the ideal would be to sanitize in any user-facing function and avoid the bounds checks internally; not sure how attainable that is at this point without a thorough review)

@wesm
Member

wesm commented Mar 12, 2013

The -1 business was done as a matter of convenience. That whole mess should be replaced with a proper INT64_MIN or INT32_MIN depending on the platform / byte width. Sigh
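The problem with -1 is that it is both the fill marker and a legal from-the-end index, whereas a platform-minimum integer can never be a valid index and so works as an unambiguous sentinel. A small sketch of the idea (FILL is illustrative, not the eventual pandas constant):

```python
import numpy as np

# -1 is ambiguous: it is both the fill marker and a legal "last element"
# index. The minimum platform-width integer can never be a valid index,
# so it makes an unambiguous sentinel.
FILL = np.iinfo(np.intp).min

indexer = np.array([0, FILL, 2], dtype=np.intp)
is_fill = indexer == FILL  # cleanly identifies fill positions
```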

@jreback
Contributor

jreback commented Mar 12, 2013

it is tested/works with .ix and friends, and take (though I would be open to ONLY allowing it with .iloc, but that's a different issue)

@stephenwlin
Contributor

so are we willing to break code that uses negative indices with Index.take, DataFrame.take, etc.? guessing there's not a whole lot of it, and it's broken already for some cases (as shown by @jreback's example)

@jreback
Contributor

jreback commented Mar 12, 2013

@stephenwlin DataFrame.take is fixed (for negative indices), as are negative indices for .ix (which worked before); see #3027. I don't think Index.take is broken?

I was just musing whether we should restrict .ix to non-negative indices (but maybe there is no good reason to)

@stephenwlin
Contributor

yeah, DataFrame.take was broken because it mixes ndarray.take and take_nd calls; Index.take presumably isn't (other than possibly triggering the numpy segfault)...but it would probably make things easier to just not support negative indices except at a few defined entry points, rather than supporting them inconsistently?

@jreback
Contributor

jreback commented Mar 12, 2013

@stephenwlin i just merged #3027 to master. I cannot reproduce this bug (but I didn't do anything to 'fix' it); maybe the bounds checking fixed this somehow? weird

correction: it hits the exception before it can segfault...because the 3 from the unstack was out-of-bounds...but didn't you discover some other bug here?

@stephenwlin
Contributor

no, that's pretty much it, that should take care of it; the only problem is if we haven't covered all the cases where we might be passing out-of-range indices into ndarray.take

@jreback
Contributor

jreback commented Mar 12, 2013

ok great

it's funny that the take didn't blow up with an illegal access on the first call

should we close this issue, then?
(as you moved everything else to a new issue)

@stephenwlin
Contributor

sure, i split off another issue in case we want to do a bigger overhaul

@jreback jreback closed this as completed Mar 13, 2013