Commit cc2f03a

Backport PR #55534 on branch 2.1.x (BUG: Ensure "string[pyarrow]" type is preserved when calling extractall) (#55597)
Backport PR #55534: BUG: Ensure "string[pyarrow]" type is preserved when calling extractall Co-authored-by: Amanda Bizzinotto <[email protected]>
1 parent 58d3f1b commit cc2f03a

3 files changed, +14 -3 lines changed

doc/source/whatsnew/v2.1.2.rst

+1
@@ -31,6 +31,7 @@ Bug fixes
 - Fixed bug in :meth:`Series.all` and :meth:`Series.any` not treating missing values correctly for ``dtype="string[pyarrow_numpy]"`` (:issue:`55367`)
 - Fixed bug in :meth:`Series.floordiv` for :class:`ArrowDtype` (:issue:`55561`)
 - Fixed bug in :meth:`Series.rank` for ``string[pyarrow_numpy]`` dtype (:issue:`55362`)
+- Fixed bug in :meth:`Series.str.extractall` for :class:`ArrowDtype` dtype being converted to object (:issue:`53846`)
 - Silence ``Period[B]`` warnings introduced by :issue:`53446` during normal plotting activity (:issue:`55138`)

 .. ---------------------------------------------------------------------------

pandas/core/strings/accessor.py

+2 -3
@@ -3449,10 +3449,9 @@ def _result_dtype(arr):
     # when the list of values is empty.
     from pandas.core.arrays.string_ import StringDtype

-    if isinstance(arr.dtype, StringDtype):
+    if isinstance(arr.dtype, (ArrowDtype, StringDtype)):
         return arr.dtype
-    else:
-        return object
+    return object


 def _get_single_group_name(regex: re.Pattern) -> Hashable:
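
The effect of the `_result_dtype` change above can be illustrated without pandas or pyarrow installed. The following is a minimal sketch, not the pandas implementation: `StringDtype`, `ArrowDtype`, and `FakeArray` are hypothetical stand-in classes defined here, and `result_dtype_before` / `result_dtype_after` are illustrative names mirroring the two sides of the hunk.

```python
# Stand-in dtype classes: hypothetical placeholders for pandas' StringDtype and
# ArrowDtype, used only so this sketch runs without pandas or pyarrow.
class StringDtype:
    pass


class ArrowDtype:
    pass


class FakeArray:
    """Hypothetical minimal array wrapper exposing only a ``dtype`` attribute."""

    def __init__(self, dtype):
        self.dtype = dtype


def result_dtype_before(arr):
    # Mirrors the removed lines: only StringDtype is preserved.
    if isinstance(arr.dtype, StringDtype):
        return arr.dtype
    else:
        return object


def result_dtype_after(arr):
    # Mirrors the added lines: ArrowDtype is preserved as well.
    if isinstance(arr.dtype, (ArrowDtype, StringDtype)):
        return arr.dtype
    return object


arrow_arr = FakeArray(ArrowDtype())
string_arr = FakeArray(StringDtype())

before = result_dtype_before(arrow_arr)  # falls through to ``object``
after = result_dtype_after(arrow_arr)    # returns the array's own dtype
```

In this sketch, an array with the stand-in `ArrowDtype` falls back to `object` under the old branch and keeps its dtype under the new one, while `StringDtype` arrays behave the same in both versions.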

pandas/tests/strings/test_extract.py

+11
@@ -4,6 +4,8 @@
 import numpy as np
 import pytest

+from pandas.core.dtypes.dtypes import ArrowDtype
+
 from pandas import (
     DataFrame,
     Index,
@@ -706,3 +708,12 @@ def test_extractall_same_as_extract_subject_index(any_string_dtype):
     has_match_index = s.str.extractall(pattern_one_noname)
     no_match_index = has_match_index.xs(0, level="match")
     tm.assert_frame_equal(extract_one_noname, no_match_index)
+
+
+def test_extractall_preserves_dtype():
+    # Ensure that when extractall is called on a series with specific dtypes set, that
+    # the dtype is preserved in the resulting DataFrame's column.
+    pa = pytest.importorskip("pyarrow")
+
+    result = Series(["abc", "ab"], dtype=ArrowDtype(pa.string())).str.extractall("(ab)")
+    assert result.dtypes[0] == "string[pyarrow]"
