Subtraction with UInt64 series resulting in negative values gives TypeError #22023

Closed
Tracked by #3
jorisvandenbossche opened this issue Jul 23, 2018 · 5 comments · Fixed by #50768
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. good first issue NA - MaskedArrays Related to pd.NA and nullable extension arrays Needs Tests Unit test(s) needed to prevent regressions Numeric Operations Arithmetic, Comparison, and Logical operations

Comments

@jorisvandenbossche
Member

With the extension integer array, the operation below raises an error; with a numpy uint dtype it does not:

In [68]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='UInt64')
...
TypeError: cannot safely cast non-equivalent float64 to uint64

In [69]: pd.Series([1, 1, 1]) - pd.Series([1, 2, 3], dtype='uint64')
Out[69]: 
0    0.0
1   -1.0
2   -2.0
dtype: float64
@jorisvandenbossche jorisvandenbossche added Bug ExtensionArray Extending pandas with custom dtypes or arrays. labels Jul 23, 2018
@hhuuggoo

So the error has changed into:

    def __init__(self, values, mask, copy=False):
        if not (isinstance(values, np.ndarray)
                and is_integer_dtype(values.dtype)):
>           raise TypeError("values should be integer numpy array. Use "
                            "the 'integer_array' function instead")
E           TypeError: values should be integer numpy array. Use the 'integer_array' function instead

But that is because of what happens in _maybe_mask_result.

def _maybe_mask_result(self, result, mask, other, op_name):

That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

If we don't want to rely on the dtype of result, then we would have to add operations between uint64 and int64 to the list of cases where we get floats back from numpy:

In [10]: (np.array([1], dtype='uint64') - np.array([1], dtype='int64')).dtype
Out[10]: dtype('float64')

In [11]: (np.array([1], dtype='uint32') - np.array([1], dtype='int64')).dtype
Out[11]: dtype('int64')
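
For readers wondering why only the uint64/int64 pair falls back to float, the following is a small sketch using numpy's `np.promote_types` and `np.iinfo`. The variable names (`pairs`, `span`) are illustrative only and are not taken from pandas. It suggests the cause is that no 64-bit integer dtype can cover both ranges at once:

```python
import numpy as np

# Promotion depends only on the dtypes, not on the values involved.
pairs = [("uint64", "int64"), ("uint32", "int64"), ("uint32", "int32")]
promoted = {pair: np.promote_types(*pair) for pair in pairs}
for pair, dtype in promoted.items():
    print(pair, "->", dtype)

# uint64 reaches up to 2**64 - 1 and int64 reaches down to -2**63, so the
# combined range holds more than 2**64 distinct values. No 64-bit integer
# dtype can represent all of them, and numpy falls back to float64.
span = int(np.iinfo(np.uint64).max) - int(np.iinfo(np.int64).min)
print(span > 2**64)
```

The uint32 cases still fit inside int64, which is why they stay integer.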

@jorisvandenbossche
Member Author

That code has some logic for detecting when it should be outputting floats. Does anyone know why we do that instead of just checking the dtype of result?

I can't immediately think of cases where numpy returns float but we would want to convert back to an integer dtype. But since it is written that way, there might be such cases (@jreback do you remember?).
You could make the change and see whether any tests fail.

@hhuuggoo

If I just change it to return the float array (with the NaN mask applied) whenever the result dtype is float, all tests pass.

https://github.com/hhuuggoo/pandas/tree/remove_float_logic

https://github.com/hhuuggoo/pandas/blob/remove_float_logic/pandas/core/arrays/integer.py#L532

Seems OK, but I don't completely understand what was there to begin with.
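
A minimal sketch of the idea described in this comment, written against plain numpy arrays rather than the pandas internals. The helper name `mask_float_result` is hypothetical and is not the code in the linked branch; it only illustrates "return the float array with NaN at the masked positions whenever the result dtype is float":

```python
import numpy as np

def mask_float_result(result, mask):
    """Hypothetical helper: keep float results as float, NaN where masked."""
    if result.dtype.kind == "f":
        out = result.copy()   # leave the raw result untouched
        out[mask] = np.nan    # missing entries become NaN
        return out
    # Integer results would be re-wrapped in the masked integer array instead.
    return result

values = np.array([1, 2, 3], dtype="uint64")
other = np.array([1, 1, 1], dtype="int64")
mask = np.array([False, True, False])   # second element is missing

raw = other - values                    # numpy promotes this to float64
masked = mask_float_result(raw, mask)
print(raw.dtype, masked)
```

Checking `result.dtype.kind` means the uint64/int64 case is handled without having to list it among the operations known to return floats.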

@dsaxton
Member

dsaxton commented Sep 27, 2020

Apparently the bug also exists for addition:

[ins] In [1]: import pandas as pd

[ins] In [2]: left = pd.Series([1, 1, 1])

[ins] In [3]: right = pd.Series([1, 2, 3], dtype="UInt64")

[ins] In [4]: left + right
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-ded06e34d5c6> in <module>
----> 1 left + right

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/ops/__init__.py in wrapper(left, right)
    341         lvalues = extract_array(left, extract_numpy=True)
    342         rvalues = extract_array(right, extract_numpy=True)
--> 343         result = arithmetic_op(lvalues, rvalues, op)
    344 
    345         return left._construct_result(result, name=res_name)

~/pandas/pandas/core/ops/array_ops.py in arithmetic_op(left, right, op)
    184     if should_extension_dispatch(lvalues, rvalues) or isinstance(rvalues, Timedelta):
    185         # Timedelta is included because numexpr will fail on it, see GH#31457
--> 186         res_values = op(lvalues, rvalues)
    187 
    188     else:

~/pandas/pandas/core/arrays/integer.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
    400 
    401         # for binary ops, use our custom dunder methods
--> 402         result = ops.maybe_dispatch_ufunc_to_dunder_op(
    403             self, ufunc, method, *inputs, **kwargs
    404         )

~/pandas/pandas/_libs/ops_dispatch.pyx in pandas._libs.ops_dispatch.maybe_dispatch_ufunc_to_dunder_op()
     89         else:
     90             name = REVERSED_NAMES.get(op_name, f"__r{op_name}__")
---> 91             result = getattr(self, name, not_implemented)(inputs[0])
     92             return result
     93     else:

~/pandas/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/pandas/pandas/core/arrays/integer.py in integer_arithmetic_method(self, other)
    659                 )
    660 
--> 661             return self._maybe_mask_result(result, mask, other, op_name)
    662 
    663         name = f"__{op.__name__}__"

~/pandas/pandas/core/arrays/integer.py in _maybe_mask_result(self, result, mask, other, op_name)
    588             return result
    589 
--> 590         return type(self)(result, mask, copy=False)
    591 
    592     @classmethod

~/pandas/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    359     def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
    360         if not (isinstance(values, np.ndarray) and values.dtype.kind in ["i", "u"]):
--> 361             raise TypeError(
    362                 "values should be integer numpy array. Use "
    363                 "the 'pd.array' function instead"

TypeError: values should be integer numpy array. Use the 'pd.array' function instead

[ins] In [5]: pd.__version__
Out[5]: '1.2.0.dev0+520.gdca6c7f43'

@mroeschke mroeschke added the Numeric Operations Arithmetic, Comparison, and Logical operations label Jun 21, 2021
@jbrockmendel jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Dec 21, 2021
@phofl phofl added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Bug labels Jan 15, 2023
@phofl
Member

phofl commented Jan 15, 2023

This works now, but may need some tests. We get Float64.
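
A regression check along these lines might look like the sketch below. It assumes a pandas version that already includes the fix referenced in this issue (#50768); the variable names mirror the reproduction earlier in the thread, and this is not necessarily the test that was added to pandas.

```python
import pandas as pd

left = pd.Series([1, 1, 1])
right = pd.Series([1, 2, 3], dtype="UInt64")

# Both operations previously raised TypeError; with the fix they are
# expected to return the nullable Float64 dtype.
diff = left - right
total = left + right
print(diff.dtype, total.dtype)
print(diff.tolist(), total.tolist())
```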

6 participants