CI: py 3.10 build failing #41935
Confusing; nothing quite adds up. The first branch should not fail, then the second branch fails but works with a DType (before the dtype object is created?). And then it mentions a boolean DType... The only way I can make sense of the error (and reproduce it) is by "monkeypatching" np.core.numeric.dtype = np.dtype("?"). Is there anything else to this? I just tried on a fresh compile of the 3.10 python branch ( |
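To make the "monkeypatching" idea concrete without touching numpy itself, here is a minimal sketch that uses stand-ins. `DType`, `numeric`, and `make_info` are hypothetical names invented for illustration; they only mimic the pattern of a module-level `dtype` being looked up at call time and then called. Replacing the class with an instance produces the same kind of "object is not callable" failure.

```python
from types import SimpleNamespace
from unittest import mock


class DType:
    """Hypothetical stand-in for a dtype class (not numpy's)."""

    def __init__(self, code):
        self.code = code


# Hypothetical stand-in for a module namespace holding a global `dtype`.
numeric = SimpleNamespace(dtype=DType)


def make_info(code):
    # The global is looked up at call time, so a replaced global is what gets called.
    return numeric.dtype(code)


before = make_info("i8")  # works: `numeric.dtype` is still the class

# Temporarily replace the class with an *instance*, as in the monkeypatch above.
with mock.patch.object(numeric, "dtype", DType("?")):
    try:
        make_info("i8")
        error = None
    except TypeError as exc:
        error = exc  # instances of DType are not callable

after = make_info("i8")  # works again once the patch is undone
```

The point of the sketch is only that the failure needs nothing more than the global being rebound; it says nothing about what rebinds it in the real case.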
Not sure how to get a py3.10 running with pandas (I got numpy). If you got that running, I would suggest running pytest with Something is weird, but if |
@jbrockmendel I think I've figured it out. So it turns out that sys.setprofile, which is called in our tests for read_csv, is somehow changing the value of np.core.numeric.dtype. In #43910, where I skip this test, the Python 3.10 tests all pass. One explanation might be that we are not resetting sys.setprofile back correctly, but the sys.setprofile(None) call should be the correct way to reset it back. I will continue looking into this. cc @mzeitlin11 |
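For reference, a small sketch of installing a profile hook and putting the previous one back afterwards. This is not the pandas test code; `temporary_profile`, `count_calls`, and `work` are made-up names. It restores whatever `sys.getprofile()` returned beforehand instead of assuming `None`, which is equivalent to `sys.setprofile(None)` when no profiler was active.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_profile(func):
    """Install `func` as the profile hook and restore the previous hook on exit."""
    previous = sys.getprofile()
    sys.setprofile(func)
    try:
        yield
    finally:
        sys.setprofile(previous)


calls = []


def count_calls(frame, event, arg):
    # Record the name of every Python function that is entered.
    if event == "call":
        calls.append(frame.f_code.co_name)


def work():
    return sum(range(10))


hook_before = sys.getprofile()
with temporary_profile(count_calls):
    work()
hook_after = sys.getprofile()
```

If the hook is restored correctly, `hook_after` is the same object as `hook_before`, so a leftover profiler would show up in exactly this comparison.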
Thanks for looking into this @lithomas1. Only other possibility I can think of beyond what you mention is there's an actual bug here in |
@seberg any plausible way sys.setprofile would affect |
I certainly don't see it. Unless there is something else that modifies numpy global state for any reason? Would be interesting to know, but I don't have an idea for a lead. |
@lithomas1 did you make any progress figuring this out? Looks like some of the |
Test is still skipped. Didn't have time to look into it more, but we should probably fix for Python 3.11 at least. pandas/pandas/tests/io/parser/common/test_common_basic.py Lines 679 to 695 in 193ca73
|
For me, it fails in this line of code:
|
I've run into this problem while trying to debug (in PyCharm) some code that uses pandas 1.3.5, and I was able to create a minimal reproducible example:
import sys
import numpy as np
import pandas as pd
from numpy.core import numeric
def trace(frame, event, arg):
    return trace
sys.settrace(trace) # This call isn't necessary when debugging.
arrays = [np.array([1, 2]), np.array([3, 4])]
index = pd.MultiIndex.from_arrays(arrays, names=["iA", "iB"])
dtype_class = numeric.dtype
print(f"Before DataFrame:\n {numeric.dtype=}\n {type(numeric.dtype)=}")
a = pd.DataFrame(
    data={"C1": np.array([10.0, 20.0]), "C2": np.array([30.0, 40.0])},
    index=index,
)
# This import fails:
# import scipy.linalg.lapack
# But this check is simpler:
print(f"After DataFrame:\n {numeric.dtype=}\n {type(numeric.dtype)=}")
assert numeric.dtype is dtype_class
Note that pandas is changing the value of
If we comment out
If we uncomment
Traceback (most recent call last):
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 649, in __init__
    self.dtype = numeric.dtype(int_type)
TypeError: 'NoneType' object is not callable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\__init__.py", line 195, in <module>
    from .misc import *
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\misc.py", line 4, in <module>
    from .lapack import get_lapack_funcs
  File "C:\dev\bug\.venv\lib\site-packages\scipy\linalg\lapack.py", line 990, in <module>
    _int32_max = _np.iinfo(_np.int32).max
  File "C:\dev\bug\.venv\lib\site-packages\numpy\core\getlimits.py", line 651, in __init__
    self.dtype = numeric.dtype(type(int_type))
TypeError: 'NoneType' object is not callable
Using a custom trace function, I've pinpointed that the global This call goes into Cython code and, looking at it, I've found this suspicious assignment that may be the cause (but I'm not sure, as there's also the weird issue that this only happens when there's a tracer or a profiler): https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L2655 (also another one here: https://github.com/pandas-dev/pandas/blob/v1.3.5/pandas/_libs/lib.pyx#L1429). (Also, not sure why would an assignment like that change a global in a |
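A rough sketch of the "custom trace function" approach mentioned above, in case it helps anyone narrow down a similar mutation. It is not the tracer actually used in this comment; `fake_module`, `clobber`, `run`, and `make_watcher` are hypothetical, and the mutation is simulated in pure Python. The tracer checks the watched global on every trace event and records the first event at which the value is no longer the original object.

```python
import sys
import types

# Hypothetical stand-in for a module whose global gets overwritten.
fake_module = types.SimpleNamespace(dtype=int)


def clobber():
    fake_module.dtype = None  # simulates the unexpected mutation


def run():
    x = 1
    clobber()
    return x


def make_watcher(namespace, name, log):
    """Build a trace function that logs the first event seen after `name` changes."""
    expected = getattr(namespace, name)

    def tracer(frame, event, arg):
        if getattr(namespace, name) is not expected and not log:
            log.append((frame.f_code.co_name, frame.f_lineno, event))
        return tracer  # keep tracing nested scopes line by line

    return tracer


log = []
previous = sys.gettrace()
sys.settrace(make_watcher(fake_module, "dtype", log))
try:
    run()
finally:
    sys.settrace(previous)  # always put the previous tracer back
```

After this runs, `log[0]` names the function and line at which the change was first visible. A trace function only fires on Python-level events, so for a change made inside compiled code it can only report the first Python event after the change.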
Great debugging there! Still utterly puzzling :). Just to note, I can reproduce the example in python3.10.1, but not python3.9.0 (Maybe we knew that long ago). Further, it does not matter whether I run python compiled for debugging. So we know that this is sensitive to python3.10 and has to do with tracing being active? We also know it is probably related to Cython. And I feel I have heard about tricky changes in Python 3.10 that affected cython? It feels like it is probably time to open either a python or cython issue about this? |
Aha, I think I have a lead... To cut it down more, this line is sufficient to trigger the issue:
pd._libs.lib.maybe_convert_objects(np.array([None], dtype=object))
And that should end up side-stepping almost all code in
Not feeling like getting pandas dev setup running right now, but there is one line here:
which is called from cython using:
/* "pandas/_libs/lib.pyx":2441
* uints = np.empty(n, dtype='u8')
* bools = np.empty(n, dtype=np.uint8)
* mask = np.full(n, False) # <<<<<<<<<<<<<<
*
* if convert_datetime:
*/
__Pyx_GetModuleGlobalName(__pyx_t_6, __pyx_n_s_np); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_6);
__pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_t_6, __pyx_n_s_full); if (unlikely(!__pyx_t_5)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_5);
__Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
__pyx_t_6 = PyInt_FromSsize_t(__pyx_v_n); if (unlikely(!__pyx_t_6)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_6);
__pyx_t_2 = NULL;
__pyx_t_8 = 0;
if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_5))) {
__pyx_t_2 = PyMethod_GET_SELF(__pyx_t_5);
if (likely(__pyx_t_2)) {
PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
__Pyx_INCREF(__pyx_t_2);
__Pyx_INCREF(function);
__Pyx_DECREF_SET(__pyx_t_5, function);
__pyx_t_8 = 1;
}
}
#if CYTHON_FAST_PYCALL
if (PyFunction_Check(__pyx_t_5)) {
PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
__pyx_t_15 = __Pyx_PyFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_GOTREF(__pyx_t_15);
__Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
} else
#endif
#if CYTHON_FAST_PYCCALL
if (__Pyx_PyFastCFunction_Check(__pyx_t_5)) {
PyObject *__pyx_temp[3] = {__pyx_t_2, __pyx_t_6, Py_False};
__pyx_t_15 = __Pyx_PyCFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
__Pyx_GOTREF(__pyx_t_15);
__Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
} else
#endif
{
__pyx_t_1 = PyTuple_New(2+__pyx_t_8); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
if (__pyx_t_2) {
__Pyx_GIVEREF(__pyx_t_2); PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_t_2); __pyx_t_2 = NULL;
}
__Pyx_GIVEREF(__pyx_t_6);
PyTuple_SET_ITEM(__pyx_t_1, 0+__pyx_t_8, __pyx_t_6);
__Pyx_INCREF(Py_False);
__Pyx_GIVEREF(Py_False);
PyTuple_SET_ITEM(__pyx_t_1, 1+__pyx_t_8, Py_False);
__pyx_t_6 = 0;
__pyx_t_15 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_1, NULL); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 2441, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_15);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
}
__Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
__pyx_v_mask = __pyx_t_15;
__pyx_t_15 = 0;
Now, that may be nothing, but
EDIT: Continuing down the rabbit hole a bit. In fact the value is mutated by the time the trace function says that
EDIT2: I opened a Python issue here: https://bugs.python.org/issue46451 |
Could the |
I honestly suspect now that it is a Python bug. The global gets changed during the call to |
I keep running into this, or a very similar issue, outside of CI. Building packages with a newer Cython seems to fix this issue, right? I'd like to check whether it also fixes mine. If it doesn't, I guess I have a different issue. Are there builds around that use the newer Cython? I'm observing this with python3.10 on Windows 10, |
The important part is whether tracing is enabled (i.e. typically a debugger or profiler is being used). In that case you will run into this issue. See also cython/cython#4609. Basically, your options are to upgrade Cython (to the not-yet-released version, as of now), to use the Cython 3 alpha, or to use the correct compile-time option to disable the faulty paths. |
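A quick way to check whether a trace or profile hook is currently installed, as a minimal sketch. `tracing_status` and `noop_tracer` are made-up helper names; `sys.gettrace()` and `sys.getprofile()` are the standard library calls and report the hooks for the current thread.

```python
import sys


def tracing_status():
    """Report whether a trace or profile hook is installed in this thread."""
    return {
        "trace": sys.gettrace() is not None,
        "profile": sys.getprofile() is not None,
    }


def noop_tracer(frame, event, arg):
    return None  # decline to trace the new scope


# Demonstration: install a do-nothing tracer, check, then restore.
previous = sys.gettrace()
sys.settrace(noop_tracer)
try:
    status_while_tracing = tracing_status()
finally:
    sys.settrace(previous)
```

Calling `tracing_status()` from inside a debugger session or under a profiler should show which of the two hooks that tool uses.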
Now that Cython 0.29.29 and 0.29.30 are out, a new pandas release ought to solve this problem, no? Would be wonderful to have this resolved. |
@seberg have you tried this without cython? I get the same issue but don't think I have cython installed (that is, I haven't explicitly installed it and can't find it anywhere on my machine).
|
It doesn't matter whether you have cython installed. It matters which cython was used for building pandas, scikit-learn, ... All of these packages need to update slowly so that you can avoid installing your own cython but still get the fix. |
@seberg @jbrockmendel Any idea when the next release is coming? I know I can resolve this by building pandas locally, but it would be great if the PyPI release worked out of the box. Thanks for your work on it! |
This will be in the 1.4.3 release. Discussion of the release date is in #46610 |
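For anyone who wants to guard against this in their own tooling, a rough sketch of a version check. Assumptions, none of them confirmed beyond what this thread says: the fix ships in pandas 1.4.3; Python 3.10 and later is treated as affected (the thread only confirms that 3.10 reproduces and 3.9 does not); and the problem only shows up while a tracer or profiler is active. `parse_version` and `may_be_affected` are hypothetical helpers, and pre-release suffixes are simply ignored.

```python
import re

FIXED_PANDAS = (1, 4, 3)  # release named in this thread as containing the fix


def parse_version(text):
    """Parse the leading 'X.Y' or 'X.Y.Z' of a version string into a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def may_be_affected(python_version, pandas_version, tracing_active):
    """Heuristic only: True when all three assumed conditions hold."""
    return bool(
        tuple(python_version[:2]) >= (3, 10)  # assumption: 3.10 and later
        and parse_version(pandas_version) < FIXED_PANDAS
        and tracing_active
    )
```

In real use the arguments would come from `sys.version_info`, `importlib.metadata.version("pandas")`, and `sys.gettrace() is not None`. A locally built pandas with a newer Cython would be misreported by this check, since it only looks at version numbers.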
Only happen in python 3.10, so not going to force updating pandas version in requirements. Related pandas-dev/pandas#41935
@seberg this build is using numpy 1.22dev, looks like a bunch of the failures are raising in
np.iinfo(np.int64).max