TST: Isin converted floats unnecessarily to int causing rounding issues #37770

Merged
merged 12 commits into from
Dec 29, 2020
Changes from 3 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -557,6 +557,7 @@ Reshaping
 - Bug in :meth:`DataFrame.combine_first()` caused wrong alignment with dtype ``string`` and one level of ``MultiIndex`` containing only ``NA`` (:issue:`37591`)
 - Fixed regression in :func:`merge` on merging DatetimeIndex with empty DataFrame (:issue:`36895`)
 - Bug in :meth:`DataFrame.apply` not setting index of return value when ``func`` return type is ``dict`` (:issue:`37544`)
+- Bug in :meth:`Series.isin` cast ``float`` unnecessarily to ``int`` when :class:`Series` to look in was from dtype ``int`` (:issue:`19356`)
Member

"cast float unnecessarily" -> "unnecessarily casting float dtypes"

Member

probably needs to move to 1.3.0

Member Author

I think it makes more sense to remove the whatsnew entry entirely, since your fix went into 1.2. Having it in 1.3 would be confusing.


 Sparse
 ^^^^^^
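
For readers skimming the thread, here is a minimal plain-Python model of the rounding issue the whatsnew entry describes. It is a sketch, not the pandas implementation: the two helper functions are hypothetical names, and the only assumption is that the lookup values were truncated to integers before comparison. The input data matches the test added in this PR.

```python
def isin_with_int_cast(comps, values):
    """Hypothetical model of the old behaviour: lookup values are cast to int first."""
    lookup = {int(v) for v in values}  # int(-0.5) truncates to 0
    return [c in lookup for c in comps]


def isin_with_float_compare(comps, values):
    """Hypothetical model of the intended behaviour: compare everything as float."""
    lookup = {float(v) for v in values}
    return [float(c) in lookup for c in comps]


comps = [-9, 0]       # the integer data being searched
values = [-9, -0.5]   # the lookup values, one of them a non-integral float

buggy = isin_with_int_cast(comps, values)
fixed = isin_with_float_compare(comps, values)

print(buggy)  # -0.5 was truncated to 0, so 0 is reported as a match
print(fixed)  # 0 is no longer matched by -0.5
```

Comparing in float avoids the false positive because no lookup value is altered before the membership check.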
12 changes: 10 additions & 2 deletions pandas/core/algorithms.py
@@ -19,6 +19,7 @@
     construct_1d_object_array_from_listlike,
     infer_dtype_from_array,
     maybe_promote,
+    maybe_upcast,
 )
 from pandas.core.dtypes.common import (
     ensure_float64,
@@ -431,6 +432,13 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
         return cast("Categorical", comps).isin(values)

     comps, dtype = _ensure_data(comps)
+    if is_numeric_dtype(comps):
+        try:
+            # Try finding a dtype which would not change our values
+            values, _ = maybe_upcast(values, dtype=dtype)
+            dtype = values.dtype
+        except (ValueError, TypeError):
Member

what cases raise?

Member Author

TypeError when values is an array with pd.NaT (test_isin_nan_common_float64 in pandas/tests/indexes/test_base.py)
ValueError if values contains strings, which can't be converted to int (test_isin_level_kwarg in same file for example)

Contributor

not a big fan of this, can we not be explicit here on the dtype checks (as we are elsewhere)?

Member Author

Could use `maybe_convert_numeric`, but would have to keep a `try`/`except` block too. Is there a function I am not aware of that checks whether we can convert from object to numeric without a `try`/`except`?

+            pass
     values, _ = _ensure_data(values, dtype=dtype)

     # faster for larger cases to use np.in1d
@@ -445,7 +453,7 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
             f = lambda c, v: np.logical_or(np.in1d(c, v), np.isnan(c))
         else:
             f = np.in1d
-    elif is_integer_dtype(comps):
+    elif is_integer_dtype(comps) and is_integer_dtype(values):
         try:
             values = values.astype("int64", copy=False)
             comps = comps.astype("int64", copy=False)
@@ -454,7 +462,7 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
             values = values.astype(object)
             comps = comps.astype(object)

-    elif is_float_dtype(comps):
+    elif is_numeric_dtype(comps) or is_numeric_dtype(values):
         try:
             values = values.astype("float64", copy=False)
             comps = comps.astype("float64", copy=False)
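
The two changed `elif` conditions above can be read as a small decision table. The sketch below restates them in plain Python under loud assumptions: `pick_comparison_path` is a hypothetical helper, dtypes are reduced to the three labels `"int"`, `"float"` and `"object"`, the large-array `np.in1d` branch is left out, and the fallback to object that the real code performs inside its `except` clauses is not modelled.

```python
def pick_comparison_path(comps_kind: str, values_kind: str) -> str:
    """Hypothetical restatement of the changed branch conditions.

    Each argument is one of "int", "float" or "object" (a simplification
    of the dtype checks used in the diff).
    """
    numeric = {"int", "float"}
    # New first condition: the int64 path needs BOTH sides to be integer.
    if comps_kind == "int" and values_kind == "int":
        return "int64"
    # New second condition: if either side is numeric, try float64.
    if comps_kind in numeric or values_kind in numeric:
        return "float64"
    return "object"


for pair in [("int", "int"), ("int", "float"), ("float", "int"), ("object", "object")]:
    print(pair, "->", pick_comparison_path(*pair))
```

Under the old conditions, integer `comps` always took the int64 path regardless of `values`; in this model an integer/float pairing is compared as float64 instead.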
8 changes: 8 additions & 0 deletions pandas/tests/series/methods/test_isin.py
@@ -90,6 +90,14 @@ def test_isin_read_only(self):
         expected = Series([True, True, True])
         tm.assert_series_equal(result, expected)

+    @pytest.mark.parametrize("values", [[-9.0, 0.0], [-9, 0]])
+    def test_isin_float_in_int_series(self, values):
+        # GH: 19356
+        ser = Series(values)
+        result = ser.isin([-9, -0.5])
+        expected = Series([True, False])
+        tm.assert_series_equal(result, expected)
+

 @pytest.mark.slow
 def test_isin_large_series_mixed_dtypes_and_nan():