BUG: isin casting to float64 for unsigned int and list #46693

Merged
merged 5 commits into from
Jun 24, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.0.rst
@@ -825,6 +825,7 @@ Indexing
 - Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`)
 - Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`)
 - Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`)
+- Bug in :func:`isin` upcasting to ``float64`` with unsigned integer dtype and list-like argument without a dtype (:issue:`46485`)
 - Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`)
 - Bug in :meth:`Index.reindex` raising ``AssertionError`` when ``level`` was specified but no :class:`MultiIndex` was given; level is ignored now (:issue:`35132`)
 - Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`)
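
The upcast named in the `isin` entry follows from NumPy's promotion rules: no 64-bit integer dtype covers both the `int64` and `uint64` ranges, so the pair promotes to `float64`. The snippet below is a minimal sketch using only NumPy, not the pandas code path itself; the array names are illustrative.

```python
import numpy as np

# No integer dtype holds both ranges, so NumPy falls back to float64.
common = np.promote_types(np.uint64, np.int64)
print(common)  # float64

# Illustrative arrays: a uint64 column and an int64 array, such as one
# inferred from a plain Python list of ints.
comps = np.array([1378774140726870442], dtype=np.uint64)
values = np.array([1378774140726870528], dtype=np.int64)

# The integers differ, but they coincide once both are cast to float64.
same_as_float = comps.astype(common) == values.astype(common)
print(bool(same_as_float[0]))  # True
```

Because the cast is lossy for integers of this size, a membership test performed after the promotion can report a match that does not exist.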
8 changes: 7 additions & 1 deletion pandas/core/algorithms.py
@@ -57,6 +57,7 @@
     is_numeric_dtype,
     is_object_dtype,
     is_scalar,
+    is_signed_integer_dtype,
     is_timedelta64_dtype,
     needs_i8_conversion,
 )
@@ -446,7 +447,12 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> npt.NDArray[np.bool_]:
         )

     if not isinstance(values, (ABCIndex, ABCSeries, ABCExtensionArray, np.ndarray)):
-        values = _ensure_arraylike(list(values))
+        if not is_signed_integer_dtype(comps):
+            # GH#46485 Use object to avoid upcast to float64 later
Member

IIUC the problem is that this comes back with int64, then np.find_common_type for int64+uint64 is float64, which is lossy for big integers.

We face a similar problem elsewhere, and ideally I'd like to re-use some of the logic we use for those. The place that comes to mind is Index._find_common_type_compat.

Member Author

So we should move the logic from _find_common_type_compat into a helper function we can call in both places?

Member

If it can be done gracefully, that'd be nice. Otherwise I guess a TODO to look into sharing sooner or later.

Member Author

Sharing this is not straightforward right now, so I would add a TODO and try to refactor later.

+            # TODO: Share with _find_common_type_compat
+            values = construct_1d_object_array_from_listlike(list(values))
+        else:
+            values = _ensure_arraylike(list(values))
     elif isinstance(values, ABCMultiIndex):
         # Avoid raising in extract_array
         values = np.array(values)
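
The idea behind this change is to keep list-like `values` as exact Python integers in an object array instead of letting them become `int64` and then `float64`. The sketch below illustrates that idea with plain NumPy. `to_object_array` and `isin_exact` are hypothetical helpers written for this example, not pandas functions, and pandas itself performs the lookup through its own internal machinery rather than a Python `set`.

```python
import numpy as np


def to_object_array(values):
    """Hypothetical helper: store each element unchanged in a 1-D object array."""
    out = np.empty(len(values), dtype=object)
    out[:] = values
    return out


def isin_exact(comps, values):
    """Hypothetical helper: membership test on exact Python objects."""
    lookup = set(to_object_array(list(values)).tolist())
    # astype(object) turns uint64 elements into exact Python ints.
    return np.array([c in lookup for c in comps.astype(object)], dtype=bool)


comps = np.array([1378774140726870442], dtype=np.uint64)
values = [1378774140726870528]

# Lossy path: both sides are compared as float64.
lossy = np.isin(comps.astype(np.float64), np.asarray(values, dtype=np.float64))
# Exact path: both sides are compared as Python ints.
exact = isin_exact(comps, values)

print(bool(lossy[0]), bool(exact[0]))  # True False
```

The trade-off is that object arrays are slower than native integer arrays, which may be why the diff takes the object path only when `comps` is not a signed integer dtype.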
7 changes: 7 additions & 0 deletions pandas/tests/test_algos.py
@@ -1089,6 +1089,13 @@ def test_isin_float_df_string_search(self):
         expected_false = DataFrame({"values": [False, False]})
         tm.assert_frame_equal(result, expected_false)

+    def test_isin_unsigned_dtype(self):
+        # GH#46485
+        ser = Series([1378774140726870442], dtype=np.uint64)
+        result = ser.isin([1378774140726870528])
+        expected = Series(False)
+        tm.assert_series_equal(result, expected)
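
The two integers in this test appear to be chosen so that they differ as integers but collide as `float64` values. A quick standard-library check of that reading, assuming Python 3.9+ for `math.ulp`:

```python
import math

a = 1378774140726870442  # value stored in the uint64 Series
b = 1378774140726870528  # value passed to isin

print(a == b)                # False
print(float(a) == float(b))  # True
print(math.ulp(float(a)))    # 256.0
print(b - a)                 # 86
```

At this magnitude adjacent `float64` values are 256 apart, so an integer 86 below a representable value rounds up to it. That is why the test expects `False` only when the comparison avoids the `float64` upcast.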


 class TestValueCounts:
     def test_value_counts(self):