Commit 53243e8

REGR: drop_duplicates raising for arrow strings (#54913)
1 parent ac8d34d commit 53243e8

3 files changed: +11 -1 lines changed

doc/source/whatsnew/v2.1.1.rst

+1
@@ -17,6 +17,7 @@ Fixed regressions
 - Fixed regression in :func:`read_csv` when ``usecols`` is given and ``dtypes`` is a dict for ``engine="python"`` (:issue:`54868`)
 - Fixed regression in :meth:`.GroupBy.get_group` raising for ``axis=1`` (:issue:`54858`)
 - Fixed regression in :meth:`DataFrame.__setitem__` raising ``AssertionError`` when setting a :class:`Series` with a partial :class:`MultiIndex` (:issue:`54875`)
+- Fixed regression in :meth:`Series.drop_duplicates` for PyArrow strings (:issue:`54904`)
 - Fixed regression in :meth:`Series.value_counts` raising for numeric data if ``bins`` was specified (:issue:`54857`)
 - Fixed regression when comparing a :class:`Series` with ``datetime64`` dtype with ``None`` (:issue:`54870`)

pandas/core/algorithms.py

+1-1
@@ -1000,7 +1000,7 @@ def duplicated(
     duplicated : ndarray[bool]
     """
     if hasattr(values, "dtype"):
-        if isinstance(values.dtype, ArrowDtype):
+        if isinstance(values.dtype, ArrowDtype) and values.dtype.kind in "ifub":
             values = values._to_masked()  # type: ignore[union-attr]

         if isinstance(values.dtype, BaseMaskedDtype):
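
The effect of the added condition can be sketched in plain Python. This is only an illustration of the membership test, not pandas code: `MASKED_KINDS` and `takes_masked_path` are hypothetical names, and the sample kind codes are NumPy's one-letter dtype kinds, assumed here to be what `dtype.kind` reports.

```python
# Hypothetical helper, not part of pandas. It mirrors only the membership
# test added in the diff: `values.dtype.kind in "ifub"`.
MASKED_KINDS = "ifub"  # NumPy kind codes: signed int, float, unsigned int, bool


def takes_masked_path(kind: str) -> bool:
    """Return True when a one-character kind code is numeric or boolean."""
    return kind in MASKED_KINDS


# Numeric and boolean kinds pass the guard; object ("O") and
# datetime ("M") kinds do not, so they skip the masked conversion.
for kind in ["i", "u", "f", "b", "O", "M"]:
    print(kind, takes_masked_path(kind))
```

With the guard in place, only integer, float, unsigned and boolean Arrow dtypes are converted with `_to_masked()`; every other dtype falls through to the code after the `if`.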

pandas/tests/series/methods/test_drop_duplicates.py

+9
@@ -1,6 +1,7 @@
 import numpy as np
 import pytest

+import pandas as pd
 from pandas import (
     Categorical,
     Series,
@@ -256,3 +257,11 @@ def test_duplicated_arrow_dtype(self):
         result = ser.drop_duplicates()
         expected = Series([True, False, None], dtype="bool[pyarrow]")
         tm.assert_series_equal(result, expected)
+
+    def test_drop_duplicates_arrow_strings(self):
+        # GH#54904
+        pa = pytest.importorskip("pyarrow")
+        ser = Series(["a", "a"], dtype=pd.ArrowDtype(pa.string()))
+        result = ser.drop_duplicates()
+        expected = Series(["a"], dtype=pd.ArrowDtype(pa.string()))
+        tm.assert_series_equal(result, expected)
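
Outside the test suite, the behavior covered by the new test can be tried with a short script. This is a sketch under assumptions: it expects a pandas release that includes this fix (2.1.1 or later, per the whatsnew entry) together with an installed `pyarrow`, and `unique_arrow_strings` is a hypothetical helper name. When either library is missing it returns `None` instead of failing, in the same spirit as `pytest.importorskip` in the test.

```python
try:
    import pandas as pd
    import pyarrow as pa
except ImportError:  # pyarrow is an optional dependency of pandas
    pd = pa = None


def unique_arrow_strings(values):
    """Deduplicate strings via a PyArrow-backed Series.

    Hypothetical helper for illustration; returns None when pandas or
    pyarrow cannot be imported.
    """
    if pd is None:
        return None
    ser = pd.Series(values, dtype=pd.ArrowDtype(pa.string()))
    # drop_duplicates keeps the first occurrence of each value by default
    return ser.drop_duplicates().tolist()


print(unique_arrow_strings(["a", "a", "b"]))
```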
