BUG: Special-case setting nan into integer series #54527

Merged
merged 13 commits into from
Aug 28, 2023
1 change: 1 addition & 0 deletions pandas/_libs/lib.pyi
@@ -195,6 +195,7 @@ def array_equivalent_object(
    right: npt.NDArray[np.object_],
) -> bool: ...
def has_infs(arr: np.ndarray) -> bool: ...  # const floating[:]
def has_only_ints_or_nan(arr: np.ndarray) -> bool: ...  # const floating[:]
def get_reverse_indexer(
    indexer: np.ndarray,  # const intp_t[:]
    length: int,
16 changes: 16 additions & 0 deletions pandas/_libs/lib.pyx
@@ -530,6 +530,22 @@ def has_infs(floating[:] arr) -> bool:
    return ret


@cython.boundscheck(False)
@cython.wraparound(False)
def has_only_ints_or_nan(floating[:] arr) -> bool:
    cdef:
        floating val
        intp_t i

    for i in range(len(arr)):
        val = arr[i]
        if (val != val) or (val == <int64_t>val):
            continue
        else:
            return False
    return True
Comment on lines +535 to +546
Member

One thing this doesn't do yet is check for the proper range for lower-bitwidth integers.

For example, setting [1000.0, np.nan] into an int8 Series should still raise the warning because 1000 is too big, but this helper function will now report that the values are only integers or NaN.
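
The range concern above can be illustrated with a minimal pure-Python sketch. The helper name `has_only_ints_or_nan_in_range` and its `bits` parameter are hypothetical and are not part of this PR or of pandas; the sketch only shows how a bit-width bound could be combined with the integer-or-NaN test, assuming a signed target integer dtype.

```python
import math


def has_only_ints_or_nan_in_range(values, bits=64):
    """Hypothetical helper (not pandas API): return True if every value is
    NaN or a whole number that fits in a signed integer of width `bits`."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    for val in values:
        if math.isnan(val):
            # NaN is always accepted, mirroring the `val != val` branch.
            continue
        if math.isinf(val):
            # Infinity is neither an integer nor representable.
            return False
        if val != int(val):
            # Fractional part present, so not integer-valued.
            return False
        if not (lo <= val <= hi):
            # Integer-valued but outside the target dtype's range.
            return False
    return True


# 1000.0 is integer-valued but does not fit into an 8-bit signed integer.
print(has_only_ints_or_nan_in_range([1000.0, float("nan")], bits=8))
print(has_only_ints_or_nan_in_range([1000.0, float("nan")], bits=16))
```

With a check like this, the reviewer's `[1000.0, np.nan]` case would be rejected for an 8-bit target while still being accepted for a 16-bit one.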



def maybe_indices_to_slice(ndarray[intp_t, ndim=1] indices, int max_len):
    cdef:
        Py_ssize_t i, n = len(indices)
5 changes: 5 additions & 0 deletions pandas/core/internals/blocks.py
@@ -60,6 +60,7 @@
from pandas.core.dtypes.common import (
    ensure_platform_int,
    is_1d_only_ea_dtype,
    is_integer_dtype,
    is_list_like,
    is_string_dtype,
)
@@ -453,6 +454,10 @@ def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
        we can also safely try to coerce to the same dtype
        and will receive the same block
        """
        if isna(other) and is_integer_dtype(self.values.dtype):
            # In a future version of pandas, the default will be that
            # setting `nan` into an integer series won't raise.
            warn_on_upcast = False
        new_dtype = find_result_type(self.values.dtype, other)
        if warn_on_upcast:
            warnings.warn(