Subtraction with UInt64 series resulting in negative values gives TypeError #22023

jorisvandenbossche · 2018-07-23T13:39:27Z

With the nullable extension integer array (dtype 'UInt64'), the operation below raises an error; with a NumPy uint64 dtype it does not:

In [68]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='UInt64')
...
TypeError: cannot safely cast non-equivalent float64 to uint64

In [69]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='uint64')
Out[69]: 
0    0.0
1   -1.0
2   -2.0
dtype: float64


hhuuggoo · 2018-09-20T20:02:16Z

So the error has changed to:

    def __init__(self, values, mask, copy=False):
        if not (isinstance(values, np.ndarray)
                and is_integer_dtype(values.dtype)):
>           raise TypeError("values should be integer numpy array. Use "
                            "the 'integer_array' function instead")
E           TypeError: values should be integer numpy array. Use the 'integer_array' function instead

But that is because of what happens in _maybe_mask_result.

pandas/pandas/core/arrays/integer.py

Line 532 in 0480f4c

def _maybe_mask_result(self, result, mask, other, op_name):

That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

If we don't want to rely on the dtype of result, then we would have to add operations between uint64 and int64 to the list of cases where we get floats back from numpy:

In [10]: (np.array([1], dtype='uint64') - np.array([1], dtype='int64')).dtype
Out[10]: dtype('float64')

In [11]: (np.array([1], dtype='uint32') - np.array([1], dtype='int64')).dtype
Out[11]: dtype('int64')
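
The promotion behaviour shown above can also be checked without building arrays. The following is a minimal sketch that assumes only NumPy's public `np.result_type`; the dictionary name `promoted` and the loop are illustrative and do not come from pandas.

```python
import numpy as np

# For each unsigned width, ask NumPy which dtype it would choose when the
# operand is combined with int64. No signed integer type can hold the full
# uint64 range, so only that pairing falls back to float64.
promoted = {
    name: np.result_type(np.dtype(name), np.int64)
    for name in ("uint8", "uint16", "uint32", "uint64")
}

for name, dtype in promoted.items():
    print(f"{name} with int64 -> {dtype}")
```

If the float-detection logic were kept instead of checking the result dtype, the uint64/int64 pairing is the case that would need to be added to it.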

jorisvandenbossche · 2018-09-21T09:44:40Z

> That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

I can't directly think of cases where numpy will return float, but where we want to convert again to an integer dtype. But since it is written that way, there might be cases (@jreback do you remember?)
You could make the change and see whether any tests fail.

hhuuggoo · 2018-09-21T21:30:22Z

If I just change it to return the float array (with the NaN mask applied) whenever the result dtype is a float, all tests pass.

https://github.com/hhuuggoo/pandas/tree/remove_float_logic

https://github.com/hhuuggoo/pandas/blob/remove_float_logic/pandas/core/arrays/integer.py#L532

Seems OK, but I don't completely understand what was there to begin with.
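
The idea in this comment (decide by the result dtype alone, and write NaN into the masked slots when it is a float) might look roughly like the sketch below. It is a standalone NumPy illustration, not the code from the linked branch or from pandas: the helper name `mask_float_result` and its return convention are assumptions made for the example.

```python
import numpy as np


def mask_float_result(result, mask):
    """Hypothetical helper: choose the output purely from result.dtype.

    Returns (values, mask). For a float result the missing slots are
    written as NaN and no separate mask is needed, so mask is None.
    For an integer result the values and mask are passed through unchanged.
    """
    if result.dtype.kind == "f":
        out = result.copy()  # avoid mutating the caller's array
        out[mask] = np.nan
        return out, None
    return result, mask


values = np.array([1, 2, 3], dtype="uint64")
mask = np.array([False, True, False])  # True marks a missing value
other = np.array([1, 1, 1], dtype="int64")

# int64 - uint64 is promoted to float64 by NumPy, so the float branch runs.
float_out, float_mask = mask_float_result(other - values, mask)

# uint64 + uint64 stays uint64, so values and mask are returned as they are.
int_out, int_mask = mask_float_result(values + values, mask)
```

With this shape, the uint64/int64 case needs no special handling, because NumPy has already produced a float64 result by the time the helper is called.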

dsaxton · 2020-09-27T22:19:12Z

Apparently the bug also exists for addition:

[ins] In [1]: import pandas as pd

[ins] In [2]: left = pd.Series([1, 1, 1])

[ins] In [3]: right = pd.Series([1, 2, 3], dtype="UInt64")

[ins] In [4]: left + right
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-ded06e34d5c6> in <module>
----> 1 left + right

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/ops/__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
    184     if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta):
    185         # Timedelta is included because numexpr will fail on it, see GH#31457
--> 186         res_values = op(lvalues, rvalues)
    187 
    188     else:

~/pandas/pandas/core/arrays/integer.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    400 
    401         # for binary ops, use our custom dunder methods
--> 402         result = ops.maybe_dispatch_ufunc_to_dunder_op(
    403             self, ufunc, method, *inputs, **kwargs
    404         )

~/pandas/pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()
     89         else:
     90             name = REVERSED_NAMES.get(op_name, f"__r{op_name}__")
---> 91             result = getattr(self, name, not_implemented)(inputs[0])
     92             return result
     93     else:

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/arrays/integer.py in integer_arithmetic_method(self, other)
    659                 )
    660 
--> 661             return self._maybe_mask_result(result, mask, other, op_name)
    662 
    663         name = f"__{op.__name__}__"

~/pandas/pandas/core/arrays/integer.py in _maybe_mask_result(self, result, mask, other, op_name)
    588             return result
    589 
--> 590         return type(self)(result, mask, copy=False)
    591 
    592     @classmethod

~/pandas/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    359     def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
    360         if not (isinstance(values, np.ndarray) and values.dtype.kind in ["i", "u"]):
--> 361             raise TypeError(
    362                 "values should be integer numpy array. Use "
    363                 "the 'pd.array' function instead"

TypeError: values should be integer numpy array. Use the 'pd.array' function instead

[ins] In [5]: pd.__version__
Out[5]: '1.2.0.dev0+520.gdca6c7f43'

phofl · 2023-01-15T19:12:48Z

This works now, but may need some tests. We get Float64.
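
A quick check of the behaviour reported here, written as a minimal sketch. It assumes a pandas version recent enough to include the fix; the variable names `diff` and `total` are illustrative, and this is not the test that was eventually merged.

```python
import pandas as pd

left = pd.Series([1, 1, 1])                    # default int64
right = pd.Series([1, 2, 3], dtype="UInt64")   # nullable unsigned integer

# Both operations used to raise TypeError; on a fixed version they should
# return a nullable float Series instead.
diff = left - right
total = left + right

print(diff.dtype, diff.tolist())
print(total.dtype, total.tolist())
```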

jorisvandenbossche added the Bug and ExtensionArray (Extending pandas with custom dtypes or arrays) labels Jul 23, 2018

mroeschke added the Numeric Operations (Arithmetic, Comparison, and Logical operations) label Jun 21, 2021

jbrockmendel added the NA - MaskedArrays (Related to pd.NA and nullable extension arrays) label Dec 21, 2021

phofl added the good first issue and Needs Tests (Unit test(s) needed to prevent regressions) labels and removed the Bug label Jan 15, 2023

phofl mentioned this issue Jan 15, 2023 in "TST: Fixed issues that need tests" noatamir/pyladies-berlin-sprints#3 (Open, 17 tasks)

luke396 mentioned this issue Jan 16, 2023 in "TST: Test series add sub with UInt64" #50768 (Merged, 5 tasks)

phofl closed this as completed in #50768 Jan 16, 2023
