-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
intermittent segfault after take error #2892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Not 100% sure, but it looks like a numpy issue: In [1]: import pandas as pd, sys
In [2]: import numpy as np
In [3]: df=pd.DataFrame({'A': {1.0: 0.0, 2.0: 6.0, 3.0: 12.0},
...: 'C': {1.0: 2.0, 2.0: 8.0, 3.0: 14.0},
...: 'B': {1.0: 1.0, 2.0: 7.0, 3.0: 13.0}})
In [4]: i = df.index.view(np.ndarray)
In [5]: u = df.unstack().astype(np.int_).view(np.ndarray)
In [6]: i
Out[6]: array([1.0, 2.0, 3.0], dtype=object)
In [7]: i.take(u)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-7-8293b97d15cc> in <module>()
----> 1 i.take(u)
IndexError: index out of range for array
In [8]: i
Segmentation fault (core dumped) Line 7 only involves numpy types, although the underlying buffers have been touched by pandas (so it's possible it's happening because pandas violated a contract somewhere). Will dig in a bit more. |
Note that the >>> df.unstack()
A 1358910081 0
2 6
3 12
B 1358910081 1
2 7
3 13
C 1358910081 2
2 8
3 14 instead. |
Changing line 5 to EDIT: actually it seems like changing 4 to either option above still segfaults (at least for me), but it requires multiple calls to |
@dsm054 yes, it looks like the the index contents are being affected by |
definitely numpy bug (here's with dev numpy) In [1]: import numpy as np
In [2]: u = np.asarray([ 0, 6, 12, 1, 7, 13, 2, 8, 14])
In [3]: i = np.asarray([1.0, 2.0, 3.0]).astype(object)
In [4]: i
Out[4]: array([1.0, 2.0, 3.0], dtype=object)
In [5]: i.take(u)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-5-8293b97d15cc> in <module>()
----> 1 i.take(u)
IndexError: index 6 is out of bounds for axis 0 with size 3
In [6]: i
Segmentation fault (core dumped) |
Should we close this, then, and bump it upstream? |
well, maybe pandas should work around it by checking index bounds itself first for object arrays (at some minor efficiency cost...)? in which case we might want to leave the issue open for that. it might be important since |
why don't you replace the usage of ndarray.take in Index.take with |
yeah, that works too...thanks! (it might be marginally faster too because |
What's the status here? |
sorry for the lack of update, this turned out to be somewhat of a rabbit hole; basically, we're exposed to this same numpy bug everywhere where we call i think the right thing to do is probably the following:
but that's a pretty big change...i don't have free cycles for it right now but if someone else wants to take a look they're free to |
(btw, in case someone picks this up...there's no bounds checking in |
this is the change I made in can anyone think of a place where there are user-facing posssibilites of negative indices except:
other uses of negative indices denote labels (or hit |
well, if all the indices are generated internally and the algorithms to do so are not buggy, then it's technically superfluous to check--there's just no guarantee that that's the case |
obviously that wasn't the case in those examples. |
so why don't we do negative index conversion in |
@y-p of course :) just saying there's a case for why it wasn't there to begin with (i didn't take it out, it was turned off before I started working on it) |
generate_code.py uses @cython.boundscheck(False) across the board. maybe |
@jreback maybe, i don't know how many user-facing functions there are to look at actually, it's a bit hard to tell what's user-facing and what isn't in python sometimes...it might be less of a deal than i think it is |
@y-p i don't think we should pay for per-access bounds checking though, that'll be O(prod(shape)) overhead, unless cython is smart enough to factor out the checks |
Wouldn't it be a function of access count rather then strictly the shape of the array? probably most I'm just thinking of reducing bug surface area, perhaps integrating a boundscheck preamble into the |
So user facing take is broken as well for negative indicies (because
but works for mixed
but not in mixed with axis=0
|
so should negative indices be allowed everywhere where the user can supply indices, or just through anyway, regardless of what we allow or don't allow, we need to make sure any user-supplied indices are sanitized before being passed into functions that assume properly bounded indices, so disambiguating the two is important. @y-p maybe just add an option to generate.py to add bounds checking as a debug tool? also, i'm pretty sure it's only the |
Sure, but that's useful as a way to track down segfaults after users report them. sanitizing is good, but probably less effort to "head them off at the pass" in the #3011 was indeed groupby_int64 delegating to take_1d in numpy_helper. |
ok, i guess that's ok, but if it's just |
+1 |
oh actually, i guess that's what you meant by doing it in the preamble...yeah that's fine; actually, you can be a bit more efficient than that (for the normal case) by doing it as you iterate, instead of iterating twice...i'll do that for now until we resolve the semantics/sanitization issue |
In #3011 take_1d was in the inner loop, and it's really the caller that should have done the Probably need to do case-by-case unfortunately. |
@y-p yeah, in |
anyway, so it's still an open question here where we want to allow negative indexing. it'll probably be simpler to only allow it through |
The |
it is tested/works with |
so are we willing to break code that uses negative indices with |
@stephenwlin I was just musing if we should restrict |
yeah, |
@stephenwlin i just merged in #3027 to master. I cannot reproduce this bug (but I didn't do anything to 'fix' it), maybe the bounds checking fixed this somehow? weird correction: it hits the exception before it can segfault....because the 3 from the unstack was an out-of-bounds...but |
no that's pretty much it, that should take care of it; the only problem is if we haven't covered all the cases that we might be passing out-of-range indices into |
ok great it's funny that the take didn't blow up with an illegal access on the first action should close this issue then? |
sure, i split off another issue in case we want to do a bigger overhaul |
Was playing around and managed to get a segfault, in several versions. It's a little intermittent, and it may have to do with how things are left after an error is encountered. The same code without the the faulty take line doesn't crash for me.
The text was updated successfully, but these errors were encountered: