Commit 47ae3f0

BUG: drop_duplicates raising for boolean arrow dtype with missing values (#54670)
* BUG: drop_duplicates raising for boolean arrow dtype with missing values
* Fix typing
* Update test_drop_duplicates.py
1 parent c233649 commit 47ae3f0

3 files changed: +17 -3 lines changed


doc/source/whatsnew/v2.1.0.rst

+2
@@ -627,6 +627,7 @@ Performance improvements
 - Performance improvement in :meth:`DataFrame.transpose` when transposing a DataFrame with a single masked dtype, e.g. :class:`Int64` (:issue:`52836`)
 - Performance improvement in :meth:`Series.add` for PyArrow string and binary dtypes (:issue:`53150`)
 - Performance improvement in :meth:`Series.corr` and :meth:`Series.cov` for extension dtypes (:issue:`52502`)
+- Performance improvement in :meth:`Series.drop_duplicates` for ``ArrowDtype`` (:issue:`54667`).
 - Performance improvement in :meth:`Series.ffill`, :meth:`Series.bfill`, :meth:`DataFrame.ffill`, :meth:`DataFrame.bfill` with PyArrow dtypes (:issue:`53950`)
 - Performance improvement in :meth:`Series.str.get_dummies` for PyArrow-backed strings (:issue:`53655`)
 - Performance improvement in :meth:`Series.str.get` for PyArrow-backed strings (:issue:`53152`)
@@ -831,6 +832,7 @@ ExtensionArray
 - Bug in :class:`~arrays.ArrowExtensionArray` converting pandas non-nanosecond temporal objects from non-zero values to zero values (:issue:`53171`)
 - Bug in :meth:`Series.quantile` for PyArrow temporal types raising ``ArrowInvalid`` (:issue:`52678`)
 - Bug in :meth:`Series.rank` returning wrong order for small values with ``Float64`` dtype (:issue:`52471`)
+- Bug in :meth:`Series.unique` for boolean ``ArrowDtype`` with ``NA`` values (:issue:`54667`)
 - Bug in :meth:`~arrays.ArrowExtensionArray.__iter__` and :meth:`~arrays.ArrowExtensionArray.__getitem__` returning python datetime and timedelta objects for non-nano dtypes (:issue:`53326`)
 - Bug where the :class:`DataFrame` repr would not work when a column had an :class:`ArrowDtype` with a ``pyarrow.ExtensionDtype`` (:issue:`54063`)
 - Bug where the ``__from_arrow__`` method of masked ExtensionDtypes (e.g. :class:`Float64Dtype`, :class:`BooleanDtype`) would not accept PyArrow arrays of type ``pyarrow.null()`` (:issue:`52223`)

pandas/core/algorithms.py

+8 -3
@@ -55,6 +55,7 @@
 )
 from pandas.core.dtypes.concat import concat_compat
 from pandas.core.dtypes.dtypes import (
+    ArrowDtype,
     BaseMaskedDtype,
     CategoricalDtype,
     ExtensionDtype,
@@ -996,9 +997,13 @@ def duplicated(
     -------
     duplicated : ndarray[bool]
     """
-    if hasattr(values, "dtype") and isinstance(values.dtype, BaseMaskedDtype):
-        values = cast("BaseMaskedArray", values)
-        return htable.duplicated(values._data, keep=keep, mask=values._mask)
+    if hasattr(values, "dtype"):
+        if isinstance(values.dtype, ArrowDtype):
+            values = values._to_masked()  # type: ignore[union-attr]
+
+        if isinstance(values.dtype, BaseMaskedDtype):
+            values = cast("BaseMaskedArray", values)
+            return htable.duplicated(values._data, keep=keep, mask=values._mask)
 
     values = _ensure_data(values)
     return htable.duplicated(values, keep=keep)
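
The hunk above routes Arrow-backed values through a masked representation (a data buffer plus a boolean mask marking missing slots) before calling the duplicate check. As a rough illustration of that idea only, here is a pure-Python sketch. The helper names `to_masked` and `duplicated_masked` are hypothetical and are not pandas APIs; pandas does this work in its compiled hashtable code, whose internals are not shown in this commit.

```python
from typing import List, Optional, Sequence, Set, Tuple, Union


def to_masked(values: Sequence[Optional[bool]]) -> Tuple[List[bool], List[bool]]:
    """Hypothetical helper: split values into (data, mask).

    mask[i] is True where values[i] is missing. The data slot for a
    missing value holds a placeholder that must be ignored.
    """
    mask = [v is None for v in values]
    data = [False if v is None else v for v in values]
    return data, mask


def duplicated_masked(
    data: Sequence[bool],
    mask: Sequence[bool],
    keep: Union[str, bool] = "first",
) -> List[bool]:
    """Hypothetical helper: flag duplicates, treating all missing values as equal."""
    if keep not in ("first", "last", False):
        raise ValueError('keep must be "first", "last" or False')

    # Build one key per element. Every missing element shares the key ("na",),
    # which can never collide with a real value's key ("value", x).
    keys = [("na",) if m else ("value", d) for d, m in zip(data, mask)]
    n = len(keys)

    if keep is False:
        # Mark every element whose key occurs more than once.
        counts: dict = {}
        for k in keys:
            counts[k] = counts.get(k, 0) + 1
        return [counts[k] > 1 for k in keys]

    # Scan forwards for keep="first", backwards for keep="last";
    # the first key seen in scan order is the one that is kept.
    order = range(n) if keep == "first" else range(n - 1, -1, -1)
    seen: Set[tuple] = set()
    result = [False] * n
    for i in order:
        if keys[i] in seen:
            result[i] = True
        else:
            seen.add(keys[i])
    return result


values = [True, False, None, False]
data, mask = to_masked(values)
flags = duplicated_masked(data, mask)
kept = [v for v, dup in zip(values, flags) if not dup]
print(flags)  # [False, False, False, True]
print(kept)   # [True, False, None]
```

Keeping the mask separate from the data means the placeholder stored in a missing slot is never compared against real values, so a single missing entry survives alongside `True` and `False`.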

pandas/tests/series/methods/test_drop_duplicates.py

+7
@@ -249,3 +249,10 @@ def test_drop_duplicates_ignore_index(self):
         result = ser.drop_duplicates(ignore_index=True)
         expected = Series([1, 2, 3])
         tm.assert_series_equal(result, expected)
+
+    def test_duplicated_arrow_dtype(self):
+        pytest.importorskip("pyarrow")
+        ser = Series([True, False, None, False], dtype="bool[pyarrow]")
+        result = ser.drop_duplicates()
+        expected = Series([True, False, None], dtype="bool[pyarrow]")
+        tm.assert_series_equal(result, expected)
