Commit d555c1d

PERF: ArrowExtensionArray comparison methods (#50524)
* PERF: ArrowExtensionArray comparison methods
* gh refs
1 parent 2c272e2 commit d555c1d

2 files changed, +9 -2 lines changed
doc/source/whatsnew/v2.0.0.rst

+1
@@ -769,6 +769,7 @@ Performance improvements
 - Performance improvement for :class:`~arrays.StringArray` constructor passing a numpy array with type ``np.str_`` (:issue:`49109`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.factorize` (:issue:`49177`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.__setitem__` when key is a null slice (:issue:`50248`)
+- Performance improvement in :class:`~arrays.ArrowExtensionArray` comparison methods when array contains NA (:issue:`50524`)
 - Performance improvement in :meth:`~arrays.ArrowExtensionArray.to_numpy` (:issue:`49973`)
 - Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
 - Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)

pandas/core/arrays/arrow/array.py

+8 -2
@@ -406,8 +406,14 @@ def _cmp_method(self, other, op):
                 f"{op.__name__} not implemented for {type(other)}"
             )
 
-        result = result.to_numpy()
-        return BooleanArray._from_sequence(result)
+        if result.null_count > 0:
+            # GH50524: avoid conversion to object for better perf
+            values = pc.fill_null(result, False).to_numpy()
+            mask = result.is_null().to_numpy()
+        else:
+            values = result.to_numpy()
+            mask = np.zeros(len(values), dtype=np.bool_)
+        return BooleanArray(values, mask)
 
     def _evaluate_op_method(self, other, op, arrow_funcs):
         pc_func = arrow_funcs[op.__name__]
