-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DEP: Use Cython 3.0 #55179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DEP: Use Cython 3.0 #55179
Conversation
I couldn't post the ASVs at once as it was too large for a GitHub comment. ASV Perf increases
|
Maybe adopt 3.0 in one PR and create a new issue (tracker issue?) summarizing the regressions? Assuming they can be summarized... |
Looks like a lot of regressions are in strings.methods. Is there anything in the annotation there that would point to why? |
This was already done in #54335. At the sprint, the consensus was to revert until we could understand why the regressions were happening. If you have the time to have a look, that'd be great. scikit-learn also had some issues with perf regressions (I opened an issue there), their problems seem to be GIL related. |
I did a quick callgrind on the string series constructor showing the most regression. Here are the top 10 functions from Cython 2: 3,300,042,608 (17.84%) ???:get_item_pointer [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
3,200,052,169 (17.30%) ???:PyArray_Scalar [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
3,000,289,006 (16.22%) ???:__pyx_pw_6pandas_5_libs_3lib_33ensure_string_array [/home/willayd/clones/pandas/build/cp310/pandas/_libs/lib.cpython-310-x86_64-linux-gnu.so]
2,800,017,582 (15.14%) ???:array_item [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
500,000,000 ( 2.70%) ???:OBJECT_getitem [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
272,375,045 ( 1.47%) /usr/local/src/conda/python-3.10.12/Python/ceval.c:_PyEval_EvalFrameDefault'2 [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10]
221,369,714 ( 1.20%) /usr/local/src/conda/python-3.10.12/Modules/sre_lib.h:sre_ucs1_match [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10]
221,185,641 ( 1.20%) /usr/local/src/conda/python-3.10.12/Parser/tokenizer.c:tok_nextc [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10]
163,678,799 ( 0.88%) /usr/local/src/conda/python-3.10.12/Parser/tokenizer.c:tok_get [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10]
138,241,199 ( 0.75%) /usr/local/src/conda/python-3.10.12/Python/pyhash.c:siphash24 [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10] Versus with Cython 3: 11,900,079,530 (20.99%) ???:prepare_index [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
6,500,003,510 (11.47%) ???:array_subscript [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
4,323,588,712 ( 7.63%) /usr/local/src/conda/python-3.10.12/Objects/obmalloc.c:object_dealloc
3,598,715,264 ( 6.35%) ???:__Pyx_GetItemInt_Fast.constprop.0 [/home/willayd/clones/pandas/build/cp310/pandas/_libs/tslibs/dtypes.cpython-310-x86_64-linux-gnu.so]
3,300,042,608 ( 5.82%) ???:get_item_pointer [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
3,200,052,169 ( 5.65%) ???:PyArray_Scalar [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so]
2,798,272,572 ( 4.94%) /usr/local/src/conda/python-3.10.12/Objects/longobject.c:PyLong_FromSsize_t [/home/willayd/mambaforge/envs/pandas-dev/bin/python3.10]
2,400,016,776 ( 4.23%) ???:PyArray_PyIntAsIntp_ErrMsg [/home/willayd/mambaforge/envs/pandas-dev/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so
]
2,398,998,037 ( 4.23%) /usr/local/src/conda/python-3.10.12/Objects/obmalloc.c:_PyLong_New
2,200,360,008 ( 3.88%) ???:__pyx_pw_6pandas_5_libs_3lib_33ensure_string_array [/home/willayd/clones/pandas/build/cp310/pandas/_libs/lib.cpython-310-x86_64-linux-gnu.so] Interesting to see the |
At least according to callgrind this patch should help the string methods a good deal: diff --git a/pandas/_libs/lib.pyx b/pandas/_libs/lib.pyx
index 0c0610f720..b05d9b665e 100644
--- a/pandas/_libs/lib.pyx
+++ b/pandas/_libs/lib.pyx
@@ -776,6 +776,7 @@ cpdef ndarray[object] ensure_string_array(
cdef:
Py_ssize_t i = 0, n = len(arr)
bint already_copied = True
+ ndarray[object] newarr
if hasattr(arr, "to_numpy"):
@@ -800,8 +801,9 @@ cpdef ndarray[object] ensure_string_array(
# short-circuit, all elements are str
return result
+ newarr = arr
for i in range(n):
- val = arr[i]
+ val = newarr[i]
if isinstance(val, str):
continue I think the Cython3 version is doing a lot of unecessary Python interation because we never explicitly declare |
Are the "failed" asvs bc it is slow or raising. The latter seem like theyd be deal-breakers. |
It is raising. I suspect it is because of this type-hint: The ASV in question is passing |
I like @WillAyd's approach - opening separate PRs from this when the adjustments are perf neutral or positive with Cython 0.x. I'm happy to routinely update the benchmarks here as we make progress. I think once we get it whittled down a bit and feel comfortable with any remaining regressions, we can open a tracking issue for the remaining as @lukemanley suggested. |
…n_cython_everywhere
The regressions in the OP have been updated. |
…n_cython_everywhere
While we've made some progress by specifying dtypes for ndarrays, there are a number of places in the code (e.g. map_infer in lib.pyx) that are meant to handle generic ndarrays. It seems like there is a performance regression in this regard with Cython 3.0. |
If you pull the .pyi changs from #54335, the typing should go green again btw. |
@rhshadrach (The linked PR for the GIL stuff was merged for 3.0.3, but 3.0.3 also had issues compiling pandas.) I just realized that numpy 2.0 is coming out soonish (the RC will be out in December) and we need this for that, so it'd be good to take another look at what's regressed again here before that. |
…n_cython_everywhere
@lithomas1 - it seems like it might be possible, but so far I haven't gotten it to work. We can tell ASV to use
However, when I use
in the matrix, I get
I've also tried leaving out Still, I'm going to try to rerun on 3.0.3, see if I run into the build problems. |
@lithomas1 - benchmarks updated; I'm still seeing gil regressions. |
AFAICT the GIL regression might trace back to how we are using a function pointer to a double converter in our module. Here is what I believe to be the troublesome code: pandas/pandas/_libs/parsers.pyx Line 229 in f32c52d
If you look at the generated code for say /* "pandas/_libs/parsers.pyx":1709
* data[0] = double_converter(word, &p_end, parser.decimal,
* parser.sci, parser.thousands,
* 1, &error, NULL) # <<<<<<<<<<<<<<
* if error != 0 or p_end == word or p_end[0]:
* error = 0
*/
(__pyx_v_data[0]) = __pyx_v_double_converter(__pyx_v_word, (&__pyx_v_p_end), __pyx_v_parser->decimal, __pyx_v_parser->sci, __pyx_v_parser->thousands, 1, (&__pyx_v_error), NULL); Whereas the latter does /* "pandas/_libs/parsers.pyx":1709
* data[0] = double_converter(word, &p_end, parser.decimal,
* parser.sci, parser.thousands,
* 1, &error, NULL) # <<<<<<<<<<<<<<
* if error != 0 or p_end == word or p_end[0]:
* error = 0
*/
__pyx_t_6 = __pyx_v_double_converter(__pyx_v_word, (&__pyx_v_p_end), __pyx_v_parser->decimal, __pyx_v_parser->sci, __pyx_v_parser->thousands, 1, (&__pyx_v_error), NULL); if (unlikely(__Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 1707, __pyx_L1_error) I'm guessing that the latter is affected by the changes to nogil and noexcept that occurred with Cython 3, although changing the troublesome declaration to |
Hard to make an MRE for Cython though. Even this generates the proper code with 3.0.4: cdef extern from "string.h":
int strcmp(const char* s1, const char* s2) nogil
cdef int compares_foo() nogil:
return strcmp("foo", "foo")
cdef int compares_foo_noexcept() noexcept nogil:
return strcmp("foo", "foo")
ctypedef struct foo_t:
int (*fnptr)() noexcept nogil
def main():
print(compares_foo())
print(compares_foo_noexcept())
cdef foo_t foo
foo.fnptr = compares_foo_noexcept
print(foo.fnptr())
if __name__ == "__main__":
main() yields: /* "cython_repo.pyx":17
* print(compares_foo_noexcept())
* cdef foo_t foo
* foo.fnptr = compares_foo_noexcept # <<<<<<<<<<<<<<
* print(foo.fnptr())
*
*/
__pyx_v_foo.fnptr = __pyx_f_11cython_repo_compares_foo_noexcept;
/* "cython_repo.pyx":18
* cdef foo_t foo
* foo.fnptr = compares_foo_noexcept
* print(foo.fnptr()) # <<<<<<<<<<<<<<
*
*
*/
__pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_foo.fnptr()); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 18, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_builtin_print, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 18, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_3);
__Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; So the issue seems really specific to our parser code |
…n_cython_everywhere � Conflicts: � pyproject.toml
Ah wow that's a great find. Let me see if I can boil that down to an MRE and will report upstream. Thanks a ton @seberg ! |
Yeah, I linked it. It is intentional, so not sure if NumPy could short-cut a bit earlier, although you still have a few more layers either way. |
Old issue, but did you also change the signature that is part of the function signature itself? The linked one wouldn't be the one used in the example. |
That was exactly the problem - well spotted |
Performance regressions and enhancements have been updated. Looks like the series_methods were False positives, they mostly seem to have disappeared. Nice job on #55915 @WillAyd - indexing is mostly gone as well. I think we're getting very close to being able to write the rest up as issues, if we aren't already there. |
IMO the benefit of the performance increases outweigh the regressions at this point, so I would be +1 to move forward with this and move regressions to a separate issue |
…n_cython_everywhere
…n_cython_everywhere
|
I think it's time we put this in now. We can't compile against the nightly numpys anymore on Windows because of a bug in old Cython. @rhshadrach Can you merge in main to fix the docs? It also looks like a few more stubs fell out of date since the last time I fixed them. |
@lithomas1 - done. I thought I had pushed my change to Cython 3.0.5, but I've apparently been running ASVs with 3.0.4 most recently. I think this may be why I can't reproduce regressions locally (been compiling with 3.0.5). I'd like to do one more full ASV run tonight before merging. |
@lithomas1 - benchmarks posted, no significant changes. Good to merge on my end. I did open cython/cython#5821 because when I go to profile the regression it disappears. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. This LGTM.
I fixed a typo (the version in pyproject.toml was "3.0.4" which I changed to "3.0.5").
I'll merge when that change become green.
thanks @rhshadrach |
nice! |
And thanks @WillAyd for all the help on fixing regressions! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.We're seeing some perf regressions (but also a lot of perf improvements) by using Cython 3.0. Not sure what the best way to approach this is - do we try to handle fixing these in a single branch along with changing to 3.0? Or do we just adopt 3.0 in one PR and try to address the regressions after?
ASV Regressions for Cython 3.0
I'm hoping theGIL regressions are now gone.gil.*
regressions will be fixed by cython/cython#5685