Fixed GIL regressions with Cython 3 #55650

WillAyd · 2023-10-23T17:22:42Z

The main issue was that the functions returning double were not marked noexcept, so under Cython 3.0 each call now explicitly checks the global Python error indicator, which requires reacquiring the GIL.

The changes to _to_fw_string fix warnings that Cython subsequently generated:

 [2/4] Compiling Cython source /home/willayd/clones/pandas/pandas/_libs/parsers.pyx
  performance hint: /home/willayd/clones/pandas/pandas/_libs/parsers.pyx:1622:5: Exception check on '_to_fw_string_nogil' will always require the GIL to be acquired.
  Possible solutions:
        1. Declare the function as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
        2. Use an 'int' return type on the function to allow an error code to be returned.
  performance hint: /home/willayd/clones/pandas/pandas/_libs/parsers.pyx:1617:27: Exception check will always require the GIL to be acquired.
  Possible solutions:
        1. Declare the function as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
        2. Use an 'int' return type on the function to allow an error code to be returned.
  performance hint: /home/willayd/clones/pandas/pandas/_libs/parsers.pyx:1707:42: Exception check will always require the GIL to be acquired. Declare the function as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.
  performance hint: /home/willayd/clones/pandas/pandas/_libs/parsers.pyx:1731:38: Exception check will always require the GIL to be acquired. Declare the function as 'noexcept' if you control the definition and you're sure you don't want the function to raise exceptions.

Maybe we shouldn't just be swallowing errors like this, but that is a different problem for a different day. For now this should get us back to Cython 0.29 performance.

rhshadrach

I'm seeing 4 more instances of nogil without noexcept, 3 in parsers.pyx and one in offsets.pyx. Add these as well?

WillAyd · 2023-10-23T21:15:45Z

Which functions are you looking at? In theory I think we only want to do this for non-integral return types. The integral returning functions should be checking the error sentinel first before trying to acquire the GIL, which I think is more reasonable than noexcept
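To make the "check the error sentinel first" idea concrete, here is a toy model in plain Python. It is not Cython's generated code and not pandas code: `ErrorIndicator`, `to_int`, and both `call_*` helpers are hypothetical names invented for illustration. The assumption being modeled is that the check lives at the call site, and that each lookup of the indicator stands in for the work that would need the GIL.

```python
class ErrorIndicator:
    """Toy stand-in for the interpreter's global error indicator."""

    def __init__(self):
        self.pending = None  # the "currently raised" exception, if any
        self.lookups = 0     # how often a caller had to consult the indicator

    def occurred(self):
        self.lookups += 1
        return self.pending is not None

    def fetch(self):
        exc, self.pending = self.pending, None
        return exc


def to_int(text, indicator):
    """Hypothetical worker: on bad input, record an error and return -1."""
    if not text.isdecimal():
        indicator.pending = ValueError(f"not a number: {text!r}")
        return -1
    return int(text)


def call_sentinel_first(func, arg, indicator, sentinel=-1):
    """Consult the indicator only when the result equals the sentinel."""
    result = func(arg, indicator)
    if result == sentinel and indicator.occurred():
        raise indicator.fetch()
    return result


def call_always_check(func, arg, indicator):
    """Consult the indicator after every call, whatever the result."""
    result = func(arg, indicator)
    if indicator.occurred():
        raise indicator.fetch()
    return result


def count_lookups(caller, inputs):
    """Run `caller` over valid inputs and report how many lookups it needed."""
    indicator = ErrorIndicator()
    for text in inputs:
        caller(to_int, text, indicator)
    return indicator.lookups
```

In this model, three valid inputs cost `call_sentinel_first` zero lookups and `call_always_check` three. The second pattern corresponds to what the performance hints above warn about for return types that have no usable sentinel.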

lithomas1 · 2023-10-24T15:02:45Z

pandas/_libs/parsers.pyx

@@ -1605,7 +1605,7 @@ cdef _categorical_convert(parser_t *parser, int64_t col,

 # -> ndarray[f'|S{width}']
 cdef _to_fw_string(parser_t *parser, int64_t col, int64_t line_start,
-                   int64_t line_end, int64_t width):
+                   int64_t line_end, int64_t width) noexcept:


Do we have to put noexcept even when there's no nogil?

Whether or not the GIL is in play, you will now be doing a PyErr_Occurred() check after each invocation, which causes Python runtime interaction and will be "slower". With that said, I didn't benchmark this particular function; I just reset it to the original 0.29 behavior. I'll try to run some tests later to see if it matters; it would be better to check for exceptions in case the numpy call fails, or in case this gets refactored in the future.

Generally I'd consider the use of void functions in our Cython code bad practice, unless failure within the function absolutely cannot be handled (e.g. in destructors). It would be better to always return an int for error handling, which Cython can check without Python runtime interaction.

lithomas1

Thanks Will.

mroeschke · 2023-10-24T16:08:32Z

Thanks @WillAyd

rhshadrach · 2023-10-24T20:38:01Z

pandas/_libs/parsers.pyx

 cdef int _try_double_nogil(parser_t *parser,
                           float64_t (*double_converter)(
                               const char *, char **, char,
-                               char, char, int, int *, int *) nogil,
+                               char, char, int, int *, int *) noexcept nogil,


Based on #55650 (comment), don't we not want noexcept here? It's still not clear to me why we don't though.

Good callout - this PR just got us back to 0.29 behavior, but we could do better with the error handling. The problem is that the body of the function returns 1 on error and doesn't set a Python exception. At a minimum we would need to return -1, and in an ideal world I think we would also set a Python exception.
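A small Python sketch of why the return value matters, assuming a sentinel-based check with -1 as the sentinel. Everything here is hypothetical and for illustration only: `checked_call`, `fail_with_one`, `fail_with_sentinel`, and the `state` dict (a stand-in for the error indicator) are not names from parsers.pyx.

```python
SENTINEL = -1


def checked_call(func, state):
    """Propagate an error only if the result is SENTINEL and one is pending."""
    result = func(state)
    if result == SENTINEL and state.get("pending") is not None:
        raise state.pop("pending")
    return result


def fail_with_one(state):
    # Signals failure with 1 and sets no exception, as described above.
    return 1


def fail_with_sentinel(state):
    # Hypothetical alternative: record an exception and return the sentinel.
    state["pending"] = ValueError("conversion failed")
    return SENTINEL
```

With `fail_with_one`, `checked_call` simply hands back 1 and the caller has to remember to inspect it; with `fail_with_sentinel`, the failure surfaces as a `ValueError` without the caller doing anything extra.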

rhshadrach · 2023-10-24T20:40:21Z

> Which functions are you looking at? In theory I think we only want to do this for non-integral return types. The integral returning functions should be checking the error sentinel first before trying to acquire the GIL, which I think is more reasonable than noexcept

They were indeed all integral return types. Can you explain further the italic bit above? Is the "should be checking" bit done by the caller? Isn't whether an exception was raised checked before we get back to the caller?

Fixed GIL regressions with Cython 3

970aa80

mroeschke added the Interval and Internals labels and removed the Interval label Oct 23, 2023

rhshadrach reviewed Oct 23, 2023

lithomas1 reviewed Oct 24, 2023

lithomas1 approved these changes Oct 24, 2023

lithomas1 added this to the 2.2 milestone Oct 24, 2023

mroeschke approved these changes Oct 24, 2023

mroeschke merged commit 6f950c1 into pandas-dev:main Oct 24, 2023

WillAyd deleted the fix-gil-regression branch October 24, 2023 16:17

rhshadrach reviewed Oct 24, 2023
