-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Accept callable for skiprows in read_csv #15059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Right now, I have added |
One possibility - is the GIL released when you're trying to call that function in the C code? |
@chris-b1 : Can't check at the moment, but could you explain why that would cause a segfault? |
To be honest, not sure if/how it would cause a segfault, but the general rule is you can't do anything involving a |
aeca2bd
to
c8bb881
Compare
Might be of interest or relevance: scipy/scipy #6509 |
@chris-b1 : You were right! I needed to ensure that we were in a state to call |
5f24be0
to
3581921
Compare
3581921
to
bca4a63
Compare
Current coverage is 84.75% (diff: 100%)@@ master #15059 diff @@
==========================================
Files 145 145
Lines 51220 51238 +18
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43414 43426 +12
- Misses 7806 7812 +6
Partials 0 0
|
@jreback @jorisvandenbossche @chris-b1 : Thoughts on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, couple comments on the c-engine calls
|
||
def test_skiprows_callable(self): | ||
data = 'a\n1\n2\n3\n4\n5' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add tests with bad functions (raises and returns non-bool) - in particular want to make sure error propagates from c-engine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Done.
|
||
if (self->skipfunc != NULL) { | ||
state = PyGILState_Ensure(); | ||
should_skip = PyObject_IsTrue(PyObject_CallFunction( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyObject_CallFunction
returns a new reference, so I think you need to capture the result and decref it (Py_DECREF
). Related to my comment on the tests, you may also need some error-trapping here in case the function fails.
https://docs.python.org/2/c-api/object.html#c.PyObject_CallFunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch (in so many senses of the word 😄). You caught a bug with this, as attempts to trap this error revealed a bug in raise_parser_error
in which we were assuming PyErr_Fetch
always returned an Exception
-type for *value
, but that was not the case (nor is it specified in the documentation).
bca4a63
to
b9b6818
Compare
at the start of the file. | ||
|
||
If callable, the callable function will be evaluated against the row | ||
indices, returning indices where the callable function evaluates to True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returning indices where the callable function evaluates to True" -> should this be "False" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we never "return" any indices. That should be changed. Good catch.
b9b6818
to
d15e3a3
Compare
@jreback : @jorisvandenbossche and @chris-b1 have approved these changes. I'm not sure if they're waiting for you to have a look, but if there are no other concerns, this should be good to merge I think. |
if value != NULL: | ||
old_exc = <object> value | ||
Py_XDECREF(value) | ||
raise old_exc | ||
|
||
# PyErr_Fetch only returned the error message in *value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this not necessary before? this seems like lots of hoop jumping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think because we got lucky. This should have been caught before and processed accordingly. It's annoying, but that's what happens unfortunately when the Python C API can't promise you anything.
give a perf test on this, e.g. when the skiprows is NOT a callable perf should be roughly unchanged. |
@jreback : Ran benchmarks, and no major change in performance when using |
thanks! |
Title is self-explanatory. xref pandas-dev#10882. Author: gfyoung <[email protected]> Closes pandas-dev#15059 from gfyoung/skiprows-callable and squashes the following commits: d15e3a3 [gfyoung] ENH: Accept callable for skiprows
Title is self-explanatory.
xref #10882.