Segmentation fault with read_csv(io.StringIO("a\na"), float_precision="round_trip")
#15140
Comments
It appears to have died while trying to read the thread state:
oldtype = tstate->curexc_type;
with nogil:
    error = _try_double_nogil(parser, col, line_start, line_end,
                              na_filter, na_hashset, use_na_flist,
                              na_fset, NA, data, &na_count)
This is problematic, as the
double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
                  int skip_trailing) {
#if PY_VERSION_HEX >= 0x02070000
    return PyOS_string_to_double(p, q, 0);
#else
    return strtod(p, q);
#endif
}
The funny thing is that even if I delete
This appears to work around the problem (kept the GIL and silenced the Python exception):
diff --git a/pandas/parser.pyx b/pandas/parser.pyx
index bd793c98e..dc0292e5b 100644
--- a/pandas/parser.pyx
+++ b/pandas/parser.pyx
@@ -1699,10 +1699,9 @@ cdef _try_double(parser_t *parser, int col, int line_start, int line_end,
     result = np.empty(lines, dtype=np.float64)
     data = <double *> result.data
     na_fset = kset_float64_from_list(na_flist)
-    with nogil:
-        error = _try_double_nogil(parser, col, line_start, line_end,
-                                  na_filter, na_hashset, use_na_flist,
-                                  na_fset, NA, data, &na_count)
+    error = _try_double_nogil(parser, col, line_start, line_end,
+                              na_filter, na_hashset, use_na_flist,
+                              na_fset, NA, data, &na_count)
     kh_destroy_float64(na_fset)
     if error != 0:
         return None, None
diff --git a/pandas/src/parser/tokenizer.c b/pandas/src/parser/tokenizer.c
index 87e17fe5f..77c36ef8a 100644
--- a/pandas/src/parser/tokenizer.c
+++ b/pandas/src/parser/tokenizer.c
@@ -1774,7 +1774,9 @@ double precise_xstrtod(const char *str, char **endptr, char decimal, char sci,
 double round_trip(const char *p, char **q, char decimal, char sci, char tsep,
                   int skip_trailing) {
 #if PY_VERSION_HEX >= 0x02070000
-    return PyOS_string_to_double(p, q, 0);
+    double r = PyOS_string_to_double(p, q, 0);
+    PyErr_Clear();
+    return r;
 #else
     return strtod(p, q);
 #endif
Yeah, this looks to be violating GIL holding. You're welcome to add a test / fix. Probably simplest to just hold the GIL if float precision is specified, since this is not the default.
I made a pull request here: #15148
`round_trip` calls back into Python, so the GIL must be held. It also fails to silence the Python exception, leading to spurious errors.
Closes pandas-dev#15140.
Author: Phil Ruffwind <[email protected]>
Closes pandas-dev#15148 from Rufflewind/master and squashes the following commits:
c513d2e [Phil Ruffwind] BUG: Segfault due to float_precision='round_trip'
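The behaviour described in that commit message can be observed from Python itself. The sketch below is only an illustration, not the pandas code: it calls `PyOS_string_to_double` through `ctypes.pythonapi`, which keeps the GIL held during the call and re-raises any Python exception the C function leaves pending. The wrapper names `string_to_double` and `round_trip_or_nan` are hypothetical, and the `except` branch is a loose Python-level analogue of the `PyErr_Clear()` call in the patch.

```python
import ctypes

# ctypes.pythonapi is a PyDLL: calls made through it keep the GIL held, and
# ctypes raises any Python exception the C function leaves pending.
_string_to_double = ctypes.pythonapi.PyOS_string_to_double
_string_to_double.restype = ctypes.c_double
# (const char *s, char **endptr, PyObject *overflow_exception);
# the two pointer arguments are declared as void* so None can be passed as NULL.
_string_to_double.argtypes = [ctypes.c_char_p, ctypes.c_void_p, ctypes.c_void_p]


def string_to_double(text: bytes) -> float:
    """Hypothetical wrapper: parse the whole byte string as a double."""
    # endptr=NULL means the entire string must be a valid number.
    return _string_to_double(text, None, None)


def round_trip_or_nan(text: bytes) -> float:
    """Hypothetical wrapper: swallow the conversion error, as the patch does in C."""
    try:
        return string_to_double(text)
    except ValueError:
        # Python-level analogue of PyErr_Clear(): discard the error and
        # signal the failure through the return value instead.
        return float("nan")
```

Here `string_to_double(b"a")` raises `ValueError`, which shows that the C function touches Python's exception state on non-numerical input; that is the state the crashing code was reading without holding the GIL.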
Code Sample, a copy-pastable example if possible
The input needs to be at least two lines and must contain non-numerical data.
Experienced this problem on Arch Linux with Python 3.
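A minimal sketch of the reproduction, using the call from the issue title. It runs the call in a child interpreter so that a crash does not take down the calling process. The helper names `run_repro` and `describe` are illustrative and not part of pandas, and the signal interpretation assumes a POSIX system. On a pandas build that includes the fix, the child is expected to exit normally.

```python
import subprocess
import sys

# The call from the issue title, as a standalone script for a child interpreter.
REPRO = (
    "import io\n"
    "import pandas as pd\n"
    "pd.read_csv(io.StringIO('a\\na'), float_precision='round_trip')\n"
)


def run_repro(script: str = REPRO) -> int:
    """Run the script in a separate Python process and return its exit code."""
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe(run_repro()))
```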
Problem description
Why is the current behaviour a problem?
(1) I can't parse a CSV file containing text with round_trip precision
(2) Possible security vulnerability
(3) It fills up my hard drive with core dumps
Expected Output
Nothing.
Actual Output
Output of pd.show_versions()