Skip to content

Fixed GIL regressions with Cython 3 #55650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 24, 2023
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions pandas/_libs/parsers.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -228,7 +228,7 @@ cdef extern from "pandas/parser/tokenizer.h":
# pick one, depending on whether the converter requires GIL
double (*double_converter)(const char *, char **,
char, char, char,
int, int *, int *) nogil
int, int *, int *) noexcept nogil

# error handling
char *warn_msg
Expand Down Expand Up @@ -1605,7 +1605,7 @@ cdef _categorical_convert(parser_t *parser, int64_t col,

# -> ndarray[f'|S{width}']
cdef _to_fw_string(parser_t *parser, int64_t col, int64_t line_start,
int64_t line_end, int64_t width):
int64_t line_end, int64_t width) noexcept:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to put noexcept even when there's no nogil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether or not the GIL is in play you will now be doing a PyErr_Occurred() after each invocation, which causes Python runtime interaction and will be "slower". With that said I didn't benchmark this particular function - just reset it to the original .29 behavior. I'll try to run some tests later though to see if it matters; it would be better to check for exceptions in case the numpy call fails, or this gets refactored in the future.

Generally I'd consider the use of void functions in our Cython code bad practice, unless failure within the function absolutely cannot be handled (ex: in destructors). It would be better to always return an int for error handling, which Cython can check without Python runtime interaction

cdef:
char *data
ndarray result
Expand All @@ -1621,7 +1621,7 @@ cdef _to_fw_string(parser_t *parser, int64_t col, int64_t line_start,

cdef void _to_fw_string_nogil(parser_t *parser, int64_t col,
int64_t line_start, int64_t line_end,
size_t width, char *data) nogil:
size_t width, char *data) noexcept nogil:
cdef:
int64_t i
coliter_t it
Expand Down Expand Up @@ -1677,7 +1677,7 @@ cdef _try_double(parser_t *parser, int64_t col,
cdef int _try_double_nogil(parser_t *parser,
float64_t (*double_converter)(
const char *, char **, char,
char, char, int, int *, int *) nogil,
char, char, int, int *, int *) noexcept nogil,
Comment on lines 1677 to +1680
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #55650 (comment), don't we not want noexcept here? It's still not clear to me why we don't though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good callout - this PR just got us back to 0.29 behavior but we could do better with the error handling. The problem is the body of the function returns 1 on error and doesn't set a Python exception. We at least would need to return -1 and I think set a Python exception in an ideal world

int64_t col, int64_t line_start, int64_t line_end,
bint na_filter, kh_str_starts_t *na_hashset,
bint use_na_flist,
Expand Down