
Commit 1699890

lukemanley authored and im-vinicius committed
BUG/PERF: DataFrame.isin lossy data conversion (pandas-dev#53514)

* BUG/PERF: DataFrame.isin lossy data conversion
* gh refs
1 parent 709193b commit 1699890


3 files changed (+24 -8 lines)


doc/source/whatsnew/v2.1.0.rst (+2)

@@ -301,6 +301,7 @@ Performance improvements
 - Performance improvement in :class:`Series` reductions (:issue:`52341`)
 - Performance improvement in :func:`concat` when ``axis=1`` and objects have different indexes (:issue:`52541`)
 - Performance improvement in :meth:`.DataFrameGroupBy.groups` (:issue:`53088`)
+- Performance improvement in :meth:`DataFrame.isin` for extension dtypes (:issue:`53514`)
 - Performance improvement in :meth:`DataFrame.loc` when selecting rows and columns (:issue:`53014`)
 - Performance improvement in :meth:`Series.add` for pyarrow string and binary dtypes (:issue:`53150`)
 - Performance improvement in :meth:`Series.corr` and :meth:`Series.cov` for extension dtypes (:issue:`52502`)
@@ -310,6 +311,7 @@ Performance improvements
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`52525`)
 - Performance improvement when doing various reshaping operations on :class:`arrays.IntegerArrays` & :class:`arrays.FloatingArray` by avoiding doing unnecessary validation (:issue:`53013`)
 - Performance improvement when indexing with pyarrow timestamp and duration dtypes (:issue:`53368`)
+-
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_210.bug_fixes:

pandas/core/frame.py (+14 -8)

@@ -11731,15 +11731,21 @@ def isin(self, values: Series | DataFrame | Sequence | Mapping) -> DataFrame:
                     "to be passed to DataFrame.isin(), "
                     f"you passed a '{type(values).__name__}'"
                 )
-            # error: Argument 2 to "isin" has incompatible type "Union[Sequence[Any],
-            # Mapping[Any, Any]]"; expected "Union[Union[ExtensionArray,
-            # ndarray[Any, Any]], Index, Series]"
-            res_values = algorithms.isin(
-                self.values.ravel(),
-                values,  # type: ignore[arg-type]
-            )
+
+            def isin_(x):
+                # error: Argument 2 to "isin" has incompatible type "Union[Series,
+                # DataFrame, Sequence[Any], Mapping[Any, Any]]"; expected
+                # "Union[Union[Union[ExtensionArray, ndarray[Any, Any]], Index,
+                # Series], List[Any], range]"
+                result = algorithms.isin(
+                    x.ravel(),
+                    values,  # type: ignore[arg-type]
+                )
+                return result.reshape(x.shape)
+
+            res_values = self._mgr.apply(isin_)
             result = self._constructor(
-                res_values.reshape(self.shape),
+                res_values,
                 self.index,
                 self.columns,
                 copy=False,
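The core of the change is that membership is now tested block by block (`self._mgr.apply(isin_)`) instead of on a single `self.values` array spanning every column, so each block keeps its own dtype. The sketch below approximates that idea using only public API. It is an illustration under stated assumptions, not the pandas implementation: `isin_block` and `isin_by_dtype` are hypothetical helpers, grouping columns by dtype is a rough stand-in for pandas' internal blocks, and `np.isin` stands in for the internal `algorithms.isin`.

```python
import numpy as np
import pandas as pd


def isin_block(block: np.ndarray, values) -> np.ndarray:
    # Hypothetical helper mirroring the shape handling of isin_ in the diff:
    # flatten the 2-D block, test membership, then restore the block's shape.
    # np.isin is used here in place of pandas' internal algorithms.isin.
    return np.isin(block.ravel(), values).reshape(block.shape)


def isin_by_dtype(df: pd.DataFrame, values) -> pd.DataFrame:
    # Hypothetical helper: groups columns by dtype (a rough analogue of blocks)
    # so no column is upcast to a dtype shared with other columns.
    # Assumes unique column labels.
    out = {}
    for dtype in df.dtypes.unique():
        cols = [c for c in df.columns if df[c].dtype == dtype]
        block = df[cols].to_numpy()  # homogeneous dtype, so no lossy upcast
        mask = isin_block(block, values)
        for j, col in enumerate(cols):
            out[col] = mask[:, j]
    return pd.DataFrame(out, index=df.index, columns=df.columns)


val = 1666880195890293744
df = pd.DataFrame({"a": [val], "b": [1.0]})
print(isin_by_dtype(df, [val]))
```

On the frame from the new test, the int64 column is compared in int64 and the float64 column in float64, which is the behavior the test expects (`a` is True, `b` is False).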

pandas/tests/frame/methods/test_isin.py (+8)

@@ -217,3 +217,11 @@ def test_isin_read_only(self):
         result = df.isin(arr)
         expected = DataFrame([True, True, True])
         tm.assert_frame_equal(result, expected)
+
+    def test_isin_not_lossy(self):
+        # GH 53514
+        val = 1666880195890293744
+        df = DataFrame({"a": [val], "b": [1.0]})
+        result = df.isin([val])
+        expected = DataFrame({"a": [True], "b": [False]})
+        tm.assert_frame_equal(result, expected)
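The test value is larger than 2**53. At this magnitude (between 2**60 and 2**61), adjacent float64 values are 256 apart, so the integer cannot survive a cast to float64. The sketch below, which assumes any recent pandas and NumPy, shows the upcast the old code path relied on (`df.values` over an int64 column and a float64 column) and contrasts it with testing the int64 column in its own dtype.

```python
import numpy as np
import pandas as pd

# Value from test_isin_not_lossy; float64 cannot represent it exactly.
val = 1666880195890293744
df = pd.DataFrame({"a": [val], "b": [1.0]})

# One 2-D array over mixed int64/float64 columns is upcast to float64,
# which is what the previous implementation passed to the membership test.
combined = df.values
print(combined.dtype)             # float64
print(int(combined[0, 0]) - val)  # 16: the stored value was rounded

# Testing the int64 column on its own keeps the value exact.
print(df["a"].isin([val]).tolist())  # [True]
```

Because `val` modulo 256 is 240, rounding to the nearest representable double moves it up by 16, which is why the comparison could no longer match the original integer.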
