
Numpydev CI Failures #35481


Closed
WillAyd opened this issue Jul 30, 2020 · 27 comments

Labels
CI Continuous Integration Compat pandas objects compatibility with Numpy or Python functions

Comments

@WillAyd
Member

WillAyd commented Jul 30, 2020

It looks like some .intersection tests are failing. AFAICT this traces back to here:

return np.asarray(obj, dtype='int64'), 0

Which, on numpy dev during test execution, doesn't raise an OverflowError and instead performs a wraparound. Interestingly enough, running:

np.asarray([2 ** 63, 2**63 + 1], dtype="int64") 

With numpy dev does raise the OverflowError when executed from plain Python, so perhaps this is a Cython issue?
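
For context, the relevant logic in lib.clean_index_list (pandas/_libs/lib.pyx) around that line is roughly:

elif inferred in ['integer']:
    # TODO: we infer an integer but it *could* be a uint64
    try:
        return np.asarray(obj, dtype='int64'), 0
    except OverflowError:
        return np.asarray(obj, dtype='object'), 0

so the object-dtype fallback only triggers if the int64 conversion raises OverflowError.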

@WillAyd WillAyd added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 30, 2020
@simonjayhawkins simonjayhawkins added CI Continuous Integration Compat pandas objects compatibility with Numpy or Python functions and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jul 30, 2020
@simonjayhawkins
Member

The current test failures on master are

=========================== short test summary info ============================
FAILED pandas/tests/indexes/test_base.py::TestIndex::test_intersection_base[uint]
FAILED pandas/tests/indexes/test_numeric.py::TestUInt64Index::test_intersection_base[index_inc]
FAILED pandas/tests/indexes/test_numeric.py::TestUInt64Index::test_intersection_base[index_dec]
FAILED pandas/tests/indexing/test_loc.py::TestLoc2::test_loc_setitem_empty_append_raises
=== 4 failed, 70016 passed, 4046 skipped, 1018 xfailed in 464.33s (0:07:44) ====

@WillAyd
Member Author

WillAyd commented Jul 31, 2020

I think the problem for most of these is the Cython version; it looks like numpy bumped its minimum Cython version to 0.29.21, but we are still pinned to 0.29.16 in this environment.

numpy/numpy@fc9d862

@WillAyd
Member Author

WillAyd commented Jul 31, 2020

Hmm, so maybe the version isn't it. When bisecting numpy I stopped between 86fcce6 and fc9d862, so one other possible culprit is numpy/numpy#16200

@WillAyd
Member Author

WillAyd commented Jul 31, 2020

OK, after bisecting I am fairly certain the failures here trace back to numpy/numpy#16200.

@seberg not sure if you have any idea on the OP

@bashtage
Contributor

bashtage commented Aug 3, 2020

Candidate fix in #35485

@seberg
Contributor

seberg commented Aug 3, 2020

@WillAyd what is obj here? There is a change here, but it should only kick in if obj happens to be a larger integer NumPy scalar. I.e. [np.uint64(-1)] in this case.

It's not really ideal behaviour, and my hope is that by the actual NumPy release we may have a warning in place for this type of cast (to basically remove it again). The reason for the change is that np.array(np.uint64(-1)) always did wrap-around.
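
A minimal sketch of that distinction (assuming the NumPy versions under discussion; the list case is what changed on numpy dev):

import numpy as np

x = np.uint64(2 ** 63)  # does not fit into int64

np.array(x, dtype="int64")    # always cast with wrap-around: -9223372036854775808
np.array([x], dtype="int64")  # raised OverflowError before the change; wraps around on numpy dev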

@bashtage
Contributor

bashtage commented Aug 3, 2020

The issue I saw, assuming I'm talking about the same thing, was happening when something like

[np.empty(0, dtype="object"), np.empty(0, dtype="object")]

was being assigned to a NumPy object array using a slice. Assigning item-by-item avoids the coercion.
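
A rough sketch of the kind of assignment being described (hypothetical shapes; the exact code path in pandas may differ):

import numpy as np

parts = [np.empty(0, dtype="object"), np.empty(0, dtype="object")]
result = np.empty(len(parts), dtype="object")

try:
    result[:] = parts        # slice assignment lets NumPy coerce the whole list first
except ValueError as exc:
    print("coercion path:", exc)

for i, part in enumerate(parts):
    result[i] = part         # item-by-item stores each array object directly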

@seberg
Contributor

seberg commented Aug 3, 2020

@bashtage, hmmm, my example was explicitly to explain the OverflowError. Can you give a full example of the change there? It does look like something that may have changed, but I am not quite sure what change you are seeing exactly.

@bashtage
Contributor

bashtage commented Aug 3, 2020

Perhaps this was a different issue.

@WillAyd
Member Author

WillAyd commented Aug 3, 2020

@WillAyd what is obj here? There is a change here, but it should only kick in if obj happens to be a larger integer NumPy scalar. I.e. [np.uint64(-1)] in this case.

In the failing test case, obj is a list containing Python integers that exceed 2 ** 63 - 1.

@seberg
Contributor

seberg commented Aug 3, 2020

@WillAyd, yeah, but as you noted, the example you gave raises the OverflowError (as it should), so I am curious what incarnation this actually boils down to (maybe even at the C level, due to cython?), so that it uses casting. One possibility would be that it actually ends up calling:

tmp = np.asarray([2 ** 63, 2**63+1])
arr = np.asarray(tmp, dtype="int64")

Which does not explain any change in behaviour, however. The only explanation I have would be uint64 scalars cropping up somewhere, which would be something like:

tmp = np.asarray([2 ** 63, 2 ** 63 + 1])
arr = np.empty(len(tmp), dtype="int64")  # some pre-existing int64 target
for i in range(len(tmp)):
    arr[i] = tmp[i]

which seems weird...

Do you have any idea what code is actually called deep down here? (even the C-code?)

@WillAyd
Member Author

WillAyd commented Aug 3, 2020

Here is the entire function as generated by Cython (I added a print statement for inspection)

static PyObject *__pyx_pf_6pandas_5_libs_3lib_34clean_index_list(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v_obj) {
  Py_ssize_t __pyx_v_i;
  Py_ssize_t __pyx_v_n;
  PyObject *__pyx_v_val = 0;
  int __pyx_v_all_arrays;
  PyObject *__pyx_v_inferred = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  Py_ssize_t __pyx_t_1;
  Py_ssize_t __pyx_t_2;
  Py_ssize_t __pyx_t_3;
  PyObject *__pyx_t_4 = NULL;
  int __pyx_t_5;
  int __pyx_t_6;
  int __pyx_t_7;
  PyObject *__pyx_t_8 = NULL;
  PyObject *__pyx_t_9 = NULL;
  PyObject *__pyx_t_10 = NULL;
  PyObject *__pyx_t_11 = NULL;
  PyObject *__pyx_t_12 = NULL;
  PyObject *__pyx_t_13 = NULL;
  int __pyx_t_14;
  PyObject *__pyx_t_15 = NULL;
  PyObject *__pyx_t_16 = NULL;
  PyObject *__pyx_t_17 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("clean_index_list", 0);

  /* "pandas/_libs/lib.pyx":664
 *     """
 *     cdef:
 *         Py_ssize_t i, n = len(obj)             # <<<<<<<<<<<<<<
 *         object val
 *         bint all_arrays = True
 */
  if (unlikely(__pyx_v_obj == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
    __PYX_ERR(0, 664, __pyx_L1_error)
  }
  __pyx_t_1 = PyList_GET_SIZE(__pyx_v_obj); if (unlikely(__pyx_t_1 == ((Py_ssize_t)-1))) __PYX_ERR(0, 664, __pyx_L1_error)
  __pyx_v_n = __pyx_t_1;

  /* "pandas/_libs/lib.pyx":666
 *         Py_ssize_t i, n = len(obj)
 *         object val
 *         bint all_arrays = True             # <<<<<<<<<<<<<<
 * 
 *     for i in range(n):
 */
  __pyx_v_all_arrays = 1;

  /* "pandas/_libs/lib.pyx":668
 *         bint all_arrays = True
 * 
 *     for i in range(n):             # <<<<<<<<<<<<<<
 *         val = obj[i]
 *         if not (isinstance(val, list) or
 */
  __pyx_t_1 = __pyx_v_n;
  __pyx_t_2 = __pyx_t_1;
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_v_i = __pyx_t_3;

    /* "pandas/_libs/lib.pyx":669
 * 
 *     for i in range(n):
 *         val = obj[i]             # <<<<<<<<<<<<<<
 *         if not (isinstance(val, list) or
 *                 util.is_array(val) or hasattr(val, '_data')):
 */
    if (unlikely(__pyx_v_obj == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      __PYX_ERR(0, 669, __pyx_L1_error)
    }
    __pyx_t_4 = PyList_GET_ITEM(__pyx_v_obj, __pyx_v_i);
    __Pyx_INCREF(__pyx_t_4);
    __Pyx_XDECREF_SET(__pyx_v_val, __pyx_t_4);
    __pyx_t_4 = 0;

    /* "pandas/_libs/lib.pyx":670
 *     for i in range(n):
 *         val = obj[i]
 *         if not (isinstance(val, list) or             # <<<<<<<<<<<<<<
 *                 util.is_array(val) or hasattr(val, '_data')):
 *             all_arrays = False
 */
    __pyx_t_6 = PyList_Check(__pyx_v_val); 
    __pyx_t_7 = (__pyx_t_6 != 0);
    if (!__pyx_t_7) {
    } else {
      __pyx_t_5 = __pyx_t_7;
      goto __pyx_L6_bool_binop_done;
    }

    /* "pandas/_libs/lib.pyx":671
 *         val = obj[i]
 *         if not (isinstance(val, list) or
 *                 util.is_array(val) or hasattr(val, '_data')):             # <<<<<<<<<<<<<<
 *             all_arrays = False
 *             break
 */
    __pyx_t_7 = (__pyx_f_6pandas_5_libs_6tslibs_4util_is_array(__pyx_v_val) != 0);
    if (!__pyx_t_7) {
    } else {
      __pyx_t_5 = __pyx_t_7;
      goto __pyx_L6_bool_binop_done;
    }
    __pyx_t_7 = __Pyx_HasAttr(__pyx_v_val, __pyx_n_u_data); if (unlikely(__pyx_t_7 == ((int)-1))) __PYX_ERR(0, 671, __pyx_L1_error)
    __pyx_t_6 = (__pyx_t_7 != 0);
    __pyx_t_5 = __pyx_t_6;
    __pyx_L6_bool_binop_done:;

    /* "pandas/_libs/lib.pyx":670
 *     for i in range(n):
 *         val = obj[i]
 *         if not (isinstance(val, list) or             # <<<<<<<<<<<<<<
 *                 util.is_array(val) or hasattr(val, '_data')):
 *             all_arrays = False
 */
    __pyx_t_6 = ((!__pyx_t_5) != 0);
    if (__pyx_t_6) {

      /* "pandas/_libs/lib.pyx":672
 *         if not (isinstance(val, list) or
 *                 util.is_array(val) or hasattr(val, '_data')):
 *             all_arrays = False             # <<<<<<<<<<<<<<
 *             break
 * 
 */
      __pyx_v_all_arrays = 0;

      /* "pandas/_libs/lib.pyx":673
 *                 util.is_array(val) or hasattr(val, '_data')):
 *             all_arrays = False
 *             break             # <<<<<<<<<<<<<<
 * 
 *     if all_arrays:
 */
      goto __pyx_L4_break;

      /* "pandas/_libs/lib.pyx":670
 *     for i in range(n):
 *         val = obj[i]
 *         if not (isinstance(val, list) or             # <<<<<<<<<<<<<<
 *                 util.is_array(val) or hasattr(val, '_data')):
 *             all_arrays = False
 */
    }
  }
  __pyx_L4_break:;

  /* "pandas/_libs/lib.pyx":675
 *             break
 * 
 *     if all_arrays:             # <<<<<<<<<<<<<<
 *         return obj, all_arrays
 * 
 */
  __pyx_t_6 = (__pyx_v_all_arrays != 0);
  if (__pyx_t_6) {

    /* "pandas/_libs/lib.pyx":676
 * 
 *     if all_arrays:
 *         return obj, all_arrays             # <<<<<<<<<<<<<<
 * 
 *     # don't force numpy coerce with nan's
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_4 = __Pyx_PyBool_FromLong(__pyx_v_all_arrays); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 676, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_8 = PyTuple_New(2); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 676, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_8);
    __Pyx_INCREF(__pyx_v_obj);
    __Pyx_GIVEREF(__pyx_v_obj);
    PyTuple_SET_ITEM(__pyx_t_8, 0, __pyx_v_obj);
    __Pyx_GIVEREF(__pyx_t_4);
    PyTuple_SET_ITEM(__pyx_t_8, 1, __pyx_t_4);
    __pyx_t_4 = 0;
    __pyx_r = __pyx_t_8;
    __pyx_t_8 = 0;
    goto __pyx_L0;

    /* "pandas/_libs/lib.pyx":675
 *             break
 * 
 *     if all_arrays:             # <<<<<<<<<<<<<<
 *         return obj, all_arrays
 * 
 */
  }

  /* "pandas/_libs/lib.pyx":679
 * 
 *     # don't force numpy coerce with nan's
 *     inferred = infer_dtype(obj, skipna=False)             # <<<<<<<<<<<<<<
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:
 *         return np.asarray(obj, dtype=object), 0
 */
  __Pyx_GetModuleGlobalName(__pyx_t_8, __pyx_n_s_infer_dtype); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 679, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_8);
  __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 679, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_INCREF(__pyx_v_obj);
  __Pyx_GIVEREF(__pyx_v_obj);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_v_obj);
  __pyx_t_9 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 679, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_9);
  if (PyDict_SetItem(__pyx_t_9, __pyx_n_s_skipna, Py_False) < 0) __PYX_ERR(0, 679, __pyx_L1_error)
  __pyx_t_10 = __Pyx_PyObject_Call(__pyx_t_8, __pyx_t_4, __pyx_t_9); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 679, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_10);
  __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
  __pyx_v_inferred = __pyx_t_10;
  __pyx_t_10 = 0;

  /* "pandas/_libs/lib.pyx":680
 *     # don't force numpy coerce with nan's
 *     inferred = infer_dtype(obj, skipna=False)
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:             # <<<<<<<<<<<<<<
 *         return np.asarray(obj, dtype=object), 0
 *     elif inferred in ['integer']:
 */
  __Pyx_INCREF(__pyx_v_inferred);
  __pyx_t_10 = __pyx_v_inferred;
  __pyx_t_5 = (__Pyx_PyUnicode_Equals(__pyx_t_10, __pyx_n_u_string, Py_EQ)); if (unlikely(__pyx_t_5 < 0)) __PYX_ERR(0, 680, __pyx_L1_error)
  if (!__pyx_t_5) {
  } else {
    __pyx_t_6 = __pyx_t_5;
    goto __pyx_L11_bool_binop_done;
  }
  __pyx_t_5 = (__Pyx_PyUnicode_Equals(__pyx_t_10, __pyx_n_u_bytes, Py_EQ)); if (unlikely(__pyx_t_5 < 0)) __PYX_ERR(0, 680, __pyx_L1_error)
  if (!__pyx_t_5) {
  } else {
    __pyx_t_6 = __pyx_t_5;
    goto __pyx_L11_bool_binop_done;
  }
  __pyx_t_5 = (__Pyx_PyUnicode_Equals(__pyx_t_10, __pyx_n_u_mixed, Py_EQ)); if (unlikely(__pyx_t_5 < 0)) __PYX_ERR(0, 680, __pyx_L1_error)
  if (!__pyx_t_5) {
  } else {
    __pyx_t_6 = __pyx_t_5;
    goto __pyx_L11_bool_binop_done;
  }
  __pyx_t_5 = (__Pyx_PyUnicode_Equals(__pyx_t_10, __pyx_kp_u_mixed_integer, Py_EQ)); if (unlikely(__pyx_t_5 < 0)) __PYX_ERR(0, 680, __pyx_L1_error)
  __pyx_t_6 = __pyx_t_5;
  __pyx_L11_bool_binop_done:;
  __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
  __pyx_t_5 = (__pyx_t_6 != 0);
  if (__pyx_t_5) {

    /* "pandas/_libs/lib.pyx":681
 *     inferred = infer_dtype(obj, skipna=False)
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:
 *         return np.asarray(obj, dtype=object), 0             # <<<<<<<<<<<<<<
 *     elif inferred in ['integer']:
 *         # TODO: we infer an integer but it *could* be a uint64
 */
    __Pyx_XDECREF(__pyx_r);
    __Pyx_GetModuleGlobalName(__pyx_t_10, __pyx_n_s_np); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_10);
    __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_t_10, __pyx_n_s_asarray); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_9);
    __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
    __pyx_t_10 = PyTuple_New(1); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_10);
    __Pyx_INCREF(__pyx_v_obj);
    __Pyx_GIVEREF(__pyx_v_obj);
    PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_v_obj);
    __pyx_t_4 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    if (PyDict_SetItem(__pyx_t_4, __pyx_n_s_dtype, __pyx_builtin_object) < 0) __PYX_ERR(0, 681, __pyx_L1_error)
    __pyx_t_8 = __Pyx_PyObject_Call(__pyx_t_9, __pyx_t_10, __pyx_t_4); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_8);
    __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
    __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 681, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_GIVEREF(__pyx_t_8);
    PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_8);
    __Pyx_INCREF(__pyx_int_0);
    __Pyx_GIVEREF(__pyx_int_0);
    PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_int_0);
    __pyx_t_8 = 0;
    __pyx_r = __pyx_t_4;
    __pyx_t_4 = 0;
    goto __pyx_L0;

    /* "pandas/_libs/lib.pyx":680
 *     # don't force numpy coerce with nan's
 *     inferred = infer_dtype(obj, skipna=False)
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:             # <<<<<<<<<<<<<<
 *         return np.asarray(obj, dtype=object), 0
 *     elif inferred in ['integer']:
 */
  }

  /* "pandas/_libs/lib.pyx":682
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:
 *         return np.asarray(obj, dtype=object), 0
 *     elif inferred in ['integer']:             # <<<<<<<<<<<<<<
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:
 */
  __Pyx_INCREF(__pyx_v_inferred);
  __pyx_t_4 = __pyx_v_inferred;
  __pyx_t_5 = (__Pyx_PyUnicode_Equals(__pyx_t_4, __pyx_n_u_integer, Py_EQ)); if (unlikely(__pyx_t_5 < 0)) __PYX_ERR(0, 682, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_6 = (__pyx_t_5 != 0);
  if (__pyx_t_6) {

    /* "pandas/_libs/lib.pyx":684
 *     elif inferred in ['integer']:
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:             # <<<<<<<<<<<<<<
 *             print("obj is ", obj)
 *             return np.asarray(obj, dtype='int64'), 0
 */
    {
      __Pyx_PyThreadState_declare
      __Pyx_PyThreadState_assign
      __Pyx_ExceptionSave(&__pyx_t_11, &__pyx_t_12, &__pyx_t_13);
      __Pyx_XGOTREF(__pyx_t_11);
      __Pyx_XGOTREF(__pyx_t_12);
      __Pyx_XGOTREF(__pyx_t_13);
      /*try:*/ {

        /* "pandas/_libs/lib.pyx":685
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:
 *             print("obj is ", obj)             # <<<<<<<<<<<<<<
 *             return np.asarray(obj, dtype='int64'), 0
 *         except OverflowError:
 */
        __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 685, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_4);
        __Pyx_INCREF(__pyx_kp_u_obj_is);
        __Pyx_GIVEREF(__pyx_kp_u_obj_is);
        PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_kp_u_obj_is);
        __Pyx_INCREF(__pyx_v_obj);
        __Pyx_GIVEREF(__pyx_v_obj);
        PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_v_obj);
        __pyx_t_8 = __Pyx_PyObject_Call(__pyx_builtin_print, __pyx_t_4, NULL); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 685, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_8);
        __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;

        /* "pandas/_libs/lib.pyx":686
 *         try:
 *             print("obj is ", obj)
 *             return np.asarray(obj, dtype='int64'), 0             # <<<<<<<<<<<<<<
 *         except OverflowError:
 *             return np.asarray(obj, dtype='object'), 0
 */
        __Pyx_XDECREF(__pyx_r);
        __Pyx_GetModuleGlobalName(__pyx_t_8, __pyx_n_s_np); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_8);
        __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_8, __pyx_n_s_asarray); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_4);
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
        __pyx_t_8 = PyTuple_New(1); if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_8);
        __Pyx_INCREF(__pyx_v_obj);
        __Pyx_GIVEREF(__pyx_v_obj);
        PyTuple_SET_ITEM(__pyx_t_8, 0, __pyx_v_obj);
        __pyx_t_10 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_10);
        if (PyDict_SetItem(__pyx_t_10, __pyx_n_s_dtype, __pyx_n_u_int64) < 0) __PYX_ERR(0, 686, __pyx_L15_error)
        __pyx_t_9 = __Pyx_PyObject_Call(__pyx_t_4, __pyx_t_8, __pyx_t_10); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_9);
        __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
        __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
        __pyx_t_10 = PyTuple_New(2); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 686, __pyx_L15_error)
        __Pyx_GOTREF(__pyx_t_10);
        __Pyx_GIVEREF(__pyx_t_9);
        PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_t_9);
        __Pyx_INCREF(__pyx_int_0);
        __Pyx_GIVEREF(__pyx_int_0);
        PyTuple_SET_ITEM(__pyx_t_10, 1, __pyx_int_0);
        __pyx_t_9 = 0;
        __pyx_r = __pyx_t_10;
        __pyx_t_10 = 0;
        goto __pyx_L19_try_return;

        /* "pandas/_libs/lib.pyx":684
 *     elif inferred in ['integer']:
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:             # <<<<<<<<<<<<<<
 *             print("obj is ", obj)
 *             return np.asarray(obj, dtype='int64'), 0
 */
      }
      __pyx_L15_error:;
      __Pyx_XDECREF(__pyx_t_10); __pyx_t_10 = 0;
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_XDECREF(__pyx_t_8); __pyx_t_8 = 0;
      __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;

      /* "pandas/_libs/lib.pyx":687
 *             print("obj is ", obj)
 *             return np.asarray(obj, dtype='int64'), 0
 *         except OverflowError:             # <<<<<<<<<<<<<<
 *             return np.asarray(obj, dtype='object'), 0
 * 
 */
      __pyx_t_14 = __Pyx_PyErr_ExceptionMatches(__pyx_builtin_OverflowError);
      if (__pyx_t_14) {
        __Pyx_AddTraceback("pandas._libs.lib.clean_index_list", __pyx_clineno, __pyx_lineno, __pyx_filename);
        if (__Pyx_GetException(&__pyx_t_10, &__pyx_t_9, &__pyx_t_8) < 0) __PYX_ERR(0, 687, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_10);
        __Pyx_GOTREF(__pyx_t_9);
        __Pyx_GOTREF(__pyx_t_8);

        /* "pandas/_libs/lib.pyx":688
 *             return np.asarray(obj, dtype='int64'), 0
 *         except OverflowError:
 *             return np.asarray(obj, dtype='object'), 0             # <<<<<<<<<<<<<<
 * 
 *     return np.asarray(obj), 0
 */
        __Pyx_XDECREF(__pyx_r);
        __Pyx_GetModuleGlobalName(__pyx_t_4, __pyx_n_s_np); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_4);
        __pyx_t_15 = __Pyx_PyObject_GetAttrStr(__pyx_t_4, __pyx_n_s_asarray); if (unlikely(!__pyx_t_15)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_15);
        __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
        __pyx_t_4 = PyTuple_New(1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_4);
        __Pyx_INCREF(__pyx_v_obj);
        __Pyx_GIVEREF(__pyx_v_obj);
        PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_v_obj);
        __pyx_t_16 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_16)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_16);
        if (PyDict_SetItem(__pyx_t_16, __pyx_n_s_dtype, __pyx_n_u_object) < 0) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __pyx_t_17 = __Pyx_PyObject_Call(__pyx_t_15, __pyx_t_4, __pyx_t_16); if (unlikely(!__pyx_t_17)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_17);
        __Pyx_DECREF(__pyx_t_15); __pyx_t_15 = 0;
        __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
        __Pyx_DECREF(__pyx_t_16); __pyx_t_16 = 0;
        __pyx_t_16 = PyTuple_New(2); if (unlikely(!__pyx_t_16)) __PYX_ERR(0, 688, __pyx_L17_except_error)
        __Pyx_GOTREF(__pyx_t_16);
        __Pyx_GIVEREF(__pyx_t_17);
        PyTuple_SET_ITEM(__pyx_t_16, 0, __pyx_t_17);
        __Pyx_INCREF(__pyx_int_0);
        __Pyx_GIVEREF(__pyx_int_0);
        PyTuple_SET_ITEM(__pyx_t_16, 1, __pyx_int_0);
        __pyx_t_17 = 0;
        __pyx_r = __pyx_t_16;
        __pyx_t_16 = 0;
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
        __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
        goto __pyx_L18_except_return;
      }
      goto __pyx_L17_except_error;
      __pyx_L17_except_error:;

      /* "pandas/_libs/lib.pyx":684
 *     elif inferred in ['integer']:
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:             # <<<<<<<<<<<<<<
 *             print("obj is ", obj)
 *             return np.asarray(obj, dtype='int64'), 0
 */
      __Pyx_XGIVEREF(__pyx_t_11);
      __Pyx_XGIVEREF(__pyx_t_12);
      __Pyx_XGIVEREF(__pyx_t_13);
      __Pyx_ExceptionReset(__pyx_t_11, __pyx_t_12, __pyx_t_13);
      goto __pyx_L1_error;
      __pyx_L19_try_return:;
      __Pyx_XGIVEREF(__pyx_t_11);
      __Pyx_XGIVEREF(__pyx_t_12);
      __Pyx_XGIVEREF(__pyx_t_13);
      __Pyx_ExceptionReset(__pyx_t_11, __pyx_t_12, __pyx_t_13);
      goto __pyx_L0;
      __pyx_L18_except_return:;
      __Pyx_XGIVEREF(__pyx_t_11);
      __Pyx_XGIVEREF(__pyx_t_12);
      __Pyx_XGIVEREF(__pyx_t_13);
      __Pyx_ExceptionReset(__pyx_t_11, __pyx_t_12, __pyx_t_13);
      goto __pyx_L0;
    }

    /* "pandas/_libs/lib.pyx":682
 *     if inferred in ['string', 'bytes', 'mixed', 'mixed-integer']:
 *         return np.asarray(obj, dtype=object), 0
 *     elif inferred in ['integer']:             # <<<<<<<<<<<<<<
 *         # TODO: we infer an integer but it *could* be a uint64
 *         try:
 */
  }

  /* "pandas/_libs/lib.pyx":690
 *             return np.asarray(obj, dtype='object'), 0
 * 
 *     return np.asarray(obj), 0             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_GetModuleGlobalName(__pyx_t_9, __pyx_n_s_np); if (unlikely(!__pyx_t_9)) __PYX_ERR(0, 690, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_9);
  __pyx_t_10 = __Pyx_PyObject_GetAttrStr(__pyx_t_9, __pyx_n_s_asarray); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 690, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_10);
  __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
  __pyx_t_9 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_10))) {
    __pyx_t_9 = PyMethod_GET_SELF(__pyx_t_10);
    if (likely(__pyx_t_9)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_10);
      __Pyx_INCREF(__pyx_t_9);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_10, function);
    }
  }
  __pyx_t_8 = (__pyx_t_9) ? __Pyx_PyObject_Call2Args(__pyx_t_10, __pyx_t_9, __pyx_v_obj) : __Pyx_PyObject_CallOneArg(__pyx_t_10, __pyx_v_obj);
  __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;
  if (unlikely(!__pyx_t_8)) __PYX_ERR(0, 690, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_8);
  __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
  __pyx_t_10 = PyTuple_New(2); if (unlikely(!__pyx_t_10)) __PYX_ERR(0, 690, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_10);
  __Pyx_GIVEREF(__pyx_t_8);
  PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_t_8);
  __Pyx_INCREF(__pyx_int_0);
  __Pyx_GIVEREF(__pyx_int_0);
  PyTuple_SET_ITEM(__pyx_t_10, 1, __pyx_int_0);
  __pyx_t_8 = 0;
  __pyx_r = __pyx_t_10;
  __pyx_t_10 = 0;
  goto __pyx_L0;

  /* "pandas/_libs/lib.pyx":659
 * @cython.wraparound(False)
 * @cython.boundscheck(False)
 * def clean_index_list(obj: list):             # <<<<<<<<<<<<<<
 *     """
 *     Utility used in ``pandas.core.indexes.api.ensure_index``.
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_XDECREF(__pyx_t_9);
  __Pyx_XDECREF(__pyx_t_10);
  __Pyx_XDECREF(__pyx_t_15);
  __Pyx_XDECREF(__pyx_t_16);
  __Pyx_XDECREF(__pyx_t_17);
  __Pyx_AddTraceback("pandas._libs.lib.clean_index_list", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_val);
  __Pyx_XDECREF(__pyx_v_inferred);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

@seberg
Contributor

seberg commented Aug 3, 2020

@WillAyd, thanks. So it does the same thing as a pure NumPy call. What I think is confusing here is that repr([np.uint64(-1)]) prints the same as Python integers, so you do not notice it, but the input must indeed be typed as NumPy uint64 scalars. But I do not know immediately what that means in the context of this behaviour change.
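
For illustration (repr output as printed by NumPy releases of that era):

>>> [np.uint64(2 ** 63), np.uint64(2 ** 63 + 1)]
[9223372036854775808, 9223372036854775809]
>>> [2 ** 63, 2 ** 63 + 1]
[9223372036854775808, 9223372036854775809]

Both lists print identically, even though only the first contains NumPy uint64 scalars.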

@WillAyd
Member Author

WillAyd commented Aug 3, 2020

Ah, OK! Yes, I can confirm that obj contains np.uint64 scalars, so ignore my previous comment about them being Python integers.

@WillAyd
Member Author

WillAyd commented Aug 3, 2020

With that in mind you can actually reproduce this at the Python level, i.e.

>>> np.asarray([np.uint64(2 ** 63), np.uint64(2**63 + 1)], dtype="int64") 
array([-9223372036854775808, -9223372036854775807])

This doesn't raise an OverflowError any more, though it did previously.

@seberg
Contributor

seberg commented Aug 3, 2020

I am curious whether you could do something in construct_1d_object_array_from_listlike to fix the issue? Please ping me if I forget about this or you do not find a simple solution. As I mentioned, this tries to get things in line with np.array(np.uint64(-1)), and I agree that raising an error in general is nicer, especially for array coercion, but maybe also for typical casting. So I do plan to always raise an error eventually, but in that case adding the error may need to go through a warning first, so it would probably not help pandas much.

@WillAyd
Member Author

WillAyd commented Aug 3, 2020

I am curious whether you could do something in construct_1d_object_array_from_listlike to fix the issue?

I think we would like to avoid the coercion to object if at all possible, since that's really just a fallback for detecting numeric data that can't fit into an int64 appropriately

I take it from your comment that there isn't a canonical way with asarray to guard against a wraparound in its current state?

@seberg
Contributor

seberg commented Aug 3, 2020

Yeah, there is not currently for NumPy types. It is possible to re-add, just mildly annoying. Basically, the current way to do it (via normal casting) has to be the "default" method (but previously was not really). But I could add back the old path for these specific cases.

@WillAyd
Member Author

WillAyd commented Aug 4, 2020

Yeah, I'm wondering: if the long-term goal is for np.array(np.uint64(-1)) to also raise an error, then it would be helpful, and maybe make the most sense, to add back the old path for these for now.

@jbrockmendel
Member

Stumbled on a similar set of test failures affecting test_union_base, also tracing back to lib.clean_index_list. One fix (which also fixes the xfailed intersection tests) is in ensure_index:

        converted, all_arrays = lib.clean_index_list(index_like)

        if len(converted) > 0 and all_arrays:
            from pandas.core.indexes.multi import MultiIndex

            return MultiIndex.from_arrays(converted)
        else:
+            if isinstance(converted, np.ndarray) and converted.dtype == np.int64:
+                # Check for overflows if we should actually be uint64
+                alt = np.asarray(index_like)
+                if alt.dtype == np.uint64:
+                    converted = alt
+
            index_like = converted

Maybe we can blunt the performance impact by limiting the cases in which we need to do this extra check?
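
For instance, a minimal sketch of one way to narrow it (hypothetical, not part of the patch above): a uint64 value above the int64 maximum always wraps around to a negative int64, so the re-check could be skipped whenever the converted array has no negative entries.

    if isinstance(converted, np.ndarray) and converted.dtype == np.int64:
        # wrapped uint64 values always surface as negative int64s,
        # so only re-convert when negatives are present (hypothetical tweak)
        if (converted < 0).any():
            alt = np.asarray(index_like)
            if alt.dtype == np.uint64:
                converted = alt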

@seberg
Contributor

seberg commented Sep 17, 2020

I can look again into undoing it in NumPy for now (it should be easy, modifying the tests is annoying).

In the long term, NumPy could always raise the error, although that might take very long and I am tempted to think that a warning rather than an error may be the way to go.
It seems to me like the correct fix may be to dig all the way down into the IntegerValidator or similar code...

In any case, let me try to "fix" this in NumPy master for now. I do think we have to flip the switch at some point, but there is probably no need to do it within the next year.

@seberg
Contributor

seberg commented Sep 17, 2020

Arrg, the problem is that there is currently a discrepancy:

np.array(np.uint64(2**63 + 1), dtype="int64")  # works
np.array([np.uint64(2**63 + 1)], dtype="int64")  # raises error

And I can trivially choose one of those behaviours, but retaining both is probably going to be annoying.

@jbrockmendel
Member

@WillAyd is this actionable/closeable?

@WillAyd
Member Author

WillAyd commented Sep 29, 2020

Yeah, we only xfailed the test in #35502, so it depends on what comes to light from what @seberg has mentioned above. I do think it would be nice if this consistently raised an error from the numpy side; I'm just not sure overall what inconsistencies that yields.

@seberg
Contributor

seberg commented Sep 30, 2020

@WillAyd I have a branch (the numpy tests will fail on it) which tries the "both error" solution. From what I can tell, this will likely just create another bunch of similar issues; whether those are easier to deal with or not is a good question.

I have a branch here, if you have the time to test it: https://github.com/seberg/numpy/tree/force-setitem-for-scalar-to-int-assignment

But I guess at the moment it may well be that the only solution is to do special handling in NumPy and add that small "inconsistency" back explicitly.

In the long run, I would love to try doing this whole "dtype discovery" (which is basically what it is, as far as I can tell) using new NumPy dtypes. In principle we could already try that, but it is probably a bit much work and requires making API public that isn't public yet.

seberg added a commit to seberg/numpy that referenced this issue Oct 1, 2020
This removes one of the larger changes to array-coercion, which
meant that NumPy scalars were always coerced like a 0-D array
would be (i.e. using normal casting). When the assignment is
explicitly an integer, now `scalar.__int__()` will be used instead
(as was the case previously).
Since previously this was handled differently, a *single* scalar
is still converted using casting:

    np.array(np.float64(np.nan), dtype=np.int64)

succeeds, but any other thing fails, such as:

    np.array([np.float64(np.nan)], dtype=np.int64)
    arr1d_int64[()] = np.float64(np.nan)
    np.array(np.array(np.nan), dtype=np.int64)

This does not affect Python scalars, that always raise, because
they always are converted using `scalar.__int__()`.

Unsigned integers always supported casting from their signed
equivalent, so the difference is much less visible for them and
this chooses to always use the casting behaviour.

The main reason for this change is to help pandas:
pandas-dev/pandas#35481
@seberg
Contributor

seberg commented Oct 7, 2020

We have merged the hack into NumPy, so hopefully you can remove the xfail from the test when the next "nightly" build is pushed (I think on Saturday).

In the long run, I do want to change this, and I do think it would be good not to rely on the try/except in pandas. But I admit that seems tricky and likely not something useful to address, at least not before the new NumPy dtypes have evolved into public API.

@mroeschke
Member

Since the xfails have been removed in #37082, I think this issue can be closed. If there are follow-ups needed from the behavior discussion, we can open a new issue.
