BUG: Prevent Series.isin from unwantedly casting isin values from float to integer (GH21804) #37861

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -546,6 +546,7 @@ Numeric
 - Bug in :class:`DataFrame` allowing arithmetic operations with list of array-likes with undefined results. Behavior changed to raising ``ValueError`` (:issue:`36702`)
 - Bug in :meth:`DataFrame.std`` with ``timedelta64`` dtype and ``skipna=False`` (:issue:`37392`)
 - Bug in :meth:`DataFrame.min` and :meth:`DataFrame.max` with ``datetime64`` dtype and ``skipna=False`` (:issue:`36907`)
+- Bug in :meth:`Series.isin` that unwantedly casts isin values from ``float`` to ``integer`` (:issue:`21804`)

 Conversion
 ^^^^^^^^^^
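For context, the truncation that the whatsnew entry refers to can be reproduced with plain NumPy. This is a minimal sketch of the mechanism only, not the pandas code path: the variable names are illustrative, and `np.isin` stands in for the pandas hashtable lookup.

```python
import numpy as np

# Illustrative inputs: integer data to search, float lookup values.
comps = np.array([1, 0], dtype="int64")
values = np.asarray([1, 0.5])  # inferred as float64

# Casting the lookup values to the dtype of comps truncates 0.5 to 0,
# so 0 is reported as a member even though only 0.5 was asked for.
lossy_values = values.astype(comps.dtype)
lossy_result = np.isin(comps, lossy_values)

# Comparing in a dtype that can hold both inputs keeps 0.5 distinct from 0.
common = np.result_type(comps.dtype, values.dtype)
safe_result = np.isin(comps.astype(common), values.astype(common))

print(lossy_result.tolist())  # [True, True]
print(safe_result.tolist())   # [True, False]
```

The second result matches the expectation encoded in the new parametrized test case `([1, 0], [1, 0.5], [True, False])`.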
19 changes: 16 additions & 3 deletions pandas/core/algorithms.py
@@ -58,6 +58,7 @@

 from pandas.core.construction import array, extract_array
 from pandas.core.indexers import validate_indices
+from pandas.core.tools.numeric import to_numeric

 if TYPE_CHECKING:
     from pandas import Categorical, DataFrame, Series
@@ -431,6 +432,15 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
         return cast("Categorical", comps).isin(values)

     comps, dtype = _ensure_data(comps)
+
+    if is_numeric_dtype(dtype):
+        # Try downcasting values if comps is numeric to prevent precision
+        # loss resulting from casting values to comps its exact dtype
+        try:
Contributor

This is what _ensure_data does, and we really don't want to use try/except in this routine; everything should go through the is_* dtype checks.
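One way to read this suggestion is to pick the comparison dtype from explicit dtype checks instead of attempting a cast and catching the failure. The following is only a sketch under that assumption: `common_numeric_dtype` is a hypothetical helper, not part of pandas, and `np.issubdtype` stands in for the `is_integer_dtype` / `is_float_dtype` helpers so that the example depends on NumPy alone. It ignores unsigned, boolean and overflow cases.

```python
import numpy as np


def common_numeric_dtype(comps, values):
    """Hypothetical helper: choose a comparison dtype via dtype checks only."""
    comps = np.asarray(comps)
    values = np.asarray(values)

    def is_int(arr):
        return np.issubdtype(arr.dtype, np.integer)

    def is_float(arr):
        return np.issubdtype(arr.dtype, np.floating)

    # Both sides integer: an integer lookup cannot lose information.
    if is_int(comps) and is_int(values):
        return np.dtype("int64")
    # Any int/float mix: compare as float so fractional values survive.
    if (is_int(comps) or is_float(comps)) and (is_int(values) or is_float(values)):
        return np.dtype("float64")
    # Anything else (strings, mixed objects) falls back to object.
    return np.dtype(object)


print(common_numeric_dtype([1, 0], [1, 0.5]))     # float64
print(common_numeric_dtype([1, 0], [1, 2]))       # int64
print(common_numeric_dtype(["ss", 42], ["42"]))   # object
```

No exception handling is needed here because every branch is decided from the inferred dtypes before any cast takes place.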

+            values = to_numeric(values, downcast="integer")
+            dtype = None
+        except (TypeError, ValueError):
+            pass
     values, _ = _ensure_data(values, dtype=dtype)

     # faster for larger cases to use np.in1d
@@ -445,16 +455,19 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
             f = lambda c, v: np.logical_or(np.in1d(c, v), np.isnan(c))
         else:
             f = np.in1d
-    elif is_integer_dtype(comps):
+    elif (
+        is_integer_dtype(comps)
+        or is_integer_dtype(values)
+        and not (is_float_dtype(comps) or is_float_dtype(values))
+    ):
         try:
             values = values.astype("int64", copy=False)
             comps = comps.astype("int64", copy=False)
             f = htable.ismember_int64
         except (TypeError, ValueError, OverflowError):
             values = values.astype(object)
             comps = comps.astype(object)
-
-    elif is_float_dtype(comps):
+    elif is_float_dtype(comps) or is_float_dtype(values):
         try:
             values = values.astype("float64", copy=False)
             comps = comps.astype("float64", copy=False)
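A note on the new `elif` condition in this file: Python's `and` binds more tightly than `or`, so the expression parses as `A or (B and not (C or D))`. The PR does not say which grouping was intended. The sketch below uses plain booleans in place of the `is_integer_dtype` / `is_float_dtype` results (the function and parameter names are illustrative) and lists the inputs where the two possible groupings disagree.

```python
from itertools import product


def as_written(int_comps, int_values, float_comps, float_values):
    # Same shape as the condition in the diff; `and` is evaluated before `or`.
    return int_comps or int_values and not (float_comps or float_values)


def or_grouped_first(int_comps, int_values, float_comps, float_values):
    # Alternative reading with the `or` explicitly parenthesized.
    return (int_comps or int_values) and not (float_comps or float_values)


# Every flag combination on which the two readings give different answers.
differing = [
    flags
    for flags in product([False, True], repeat=4)
    if as_written(*flags) != or_grouped_first(*flags)
]

for flags in differing:
    print(flags, as_written(*flags), or_grouped_first(*flags))
```

The readings differ exactly when the first flag is true and at least one float flag is true. That includes integer `comps` combined with float `values`, the combination the new test case exercises.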
18 changes: 12 additions & 6 deletions pandas/tests/test_algos.py
@@ -942,13 +942,19 @@ def test_different_nans(self):
         )
         tm.assert_numpy_array_equal(np.array([True]), result)

-    def test_no_cast(self):
-        # GH 22160
-        # ensure 42 is not casted to a string
-        comps = ["ss", 42]
-        values = ["42"]
-        expected = np.array([False, False])
+    @pytest.mark.parametrize(
+        "comps, values, expected_values",
+        [
+            (["ss", 42], ["42"], [False, False]),
+            ([1, 0], [1, 0.5], [True, False]),
+        ],
+    )
+    def test_no_cast(self, comps, values, expected_values):
+        # GH 22160 ensure 42 is not casted to a string
+        # GH21804 Prevent Series.isin from unwantedly casting isin values
+        # from float to integer
         result = algos.isin(comps, values)
+        expected = np.array(expected_values)
         tm.assert_numpy_array_equal(expected, result)

     @pytest.mark.parametrize("empty", [[], Series(dtype=object), np.array([])])