Commit 06a1581

Backport PR #54913 on branch 2.1.x (REGR: drop_duplicates raising for arrow strings) (#54955)
Backport PR #54913: REGR: drop_duplicates raising for arrow strings

Co-authored-by: Patrick Hoefler <[email protected]>
1 parent eac5483 commit 06a1581

3 files changed, +11 -1 lines changed

doc/source/whatsnew/v2.1.1.rst

+1
@@ -16,6 +16,7 @@ Fixed regressions
 - Fixed regression in :func:`merge` when merging over a PyArrow string index (:issue:`54894`)
 - Fixed regression in :func:`read_csv` when ``usecols`` is given and ``dtypes`` is a dict for ``engine="python"`` (:issue:`54868`)
 - Fixed regression in :meth:`DataFrame.__setitem__` raising ``AssertionError`` when setting a :class:`Series` with a partial :class:`MultiIndex` (:issue:`54875`)
+- Fixed regression in :meth:`Series.drop_duplicates` for PyArrow strings (:issue:`54904`)
 - Fixed regression in :meth:`Series.value_counts` raising for numeric data if ``bins`` was specified (:issue:`54857`)
 - Fixed regression when comparing a :class:`Series` with ``datetime64`` dtype with ``None`` (:issue:`54870`)

pandas/core/algorithms.py

+1 -1
@@ -1000,7 +1000,7 @@ def duplicated(
     duplicated : ndarray[bool]
     """
     if hasattr(values, "dtype"):
-        if isinstance(values.dtype, ArrowDtype):
+        if isinstance(values.dtype, ArrowDtype) and values.dtype.kind in "ifub":
             values = values._to_masked()  # type: ignore[union-attr]

         if isinstance(values.dtype, BaseMaskedDtype):
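
The one-line change in this hunk narrows the masked-array conversion to Arrow dtypes whose `kind` is one of `i`, `f`, `u`, or `b`. A minimal sketch of that gate in plain Python is shown here; `uses_masked_path` and `MASKED_KINDS` are hypothetical names used only for illustration and are not part of pandas, and the kind codes are assumed to follow the usual NumPy single-character convention.

```python
# Hypothetical stand-in for the condition added in duplicated();
# this is not pandas code, only the string-membership check in isolation.
MASKED_KINDS = "ifub"  # signed int, float, unsigned int, boolean


def uses_masked_path(kind: str) -> bool:
    """Return True when a dtype kind passes the `kind in "ifub"` gate."""
    return kind in MASKED_KINDS


# NumPy-style kind codes: "U" is unicode string, "O" is object, "M" is datetime.
for kind in ("i", "f", "u", "b", "U", "O", "M"):
    print(kind, uses_masked_path(kind))
```

Under these assumptions, string-like kinds fall outside the set, so an Arrow string array is no longer sent through `_to_masked()`.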

pandas/tests/series/methods/test_drop_duplicates.py

+9
@@ -1,6 +1,7 @@
 import numpy as np
 import pytest

+import pandas as pd
 from pandas import (
     Categorical,
     Series,
@@ -256,3 +257,11 @@ def test_duplicated_arrow_dtype(self):
         result = ser.drop_duplicates()
         expected = Series([True, False, None], dtype="bool[pyarrow]")
         tm.assert_series_equal(result, expected)
+
+    def test_drop_duplicates_arrow_strings(self):
+        # GH#54904
+        pa = pytest.importorskip("pyarrow")
+        ser = Series(["a", "a"], dtype=pd.ArrowDtype(pa.string()))
+        result = ser.drop_duplicates()
+        expected = Series(["a"], dtype=pd.ArrowDtype(pa.string()))
+        tm.assert_series_equal(result, expected)
