Commit ea64c4a

lukemanley authored and im-vinicius committed
ENH: Series.str.join for ArrowDtype(pa.string()) (pandas-dev#53646)
* Series.str.join to support ArrowDtype(pa.string())
* gh ref
1 parent e92fe43 commit ea64c4a

3 files changed, +14 -1 lines changed

doc/source/whatsnew/v2.1.0.rst

+1
@@ -102,6 +102,7 @@ Other enhancements
 - :meth:`DataFrame.stack` gained the ``sort`` keyword to dictate whether the resulting :class:`MultiIndex` levels are sorted (:issue:`15105`)
 - :meth:`DataFrame.unstack` gained the ``sort`` keyword to dictate whether the resulting :class:`MultiIndex` levels are sorted (:issue:`15105`)
 - :meth:`DataFrameGroupby.agg` and :meth:`DataFrameGroupby.transform` now support grouping by multiple keys when the index is not a :class:`MultiIndex` for ``engine="numba"`` (:issue:`53486`)
+- :meth:`Series.str.join` now supports ``ArrowDtype(pa.string())`` (:issue:`53646`)
 - :meth:`SeriesGroupby.agg` and :meth:`DataFrameGroupby.agg` now support passing in multiple functions for ``engine="numba"`` (:issue:`53486`)
 - :meth:`SeriesGroupby.transform` and :meth:`DataFrameGroupby.transform` now support passing in a string as the function for ``engine="numba"`` (:issue:`53579`)
 - Added ``engine_kwargs`` parameter to :meth:`DataFrame.to_excel` (:issue:`53220`)

pandas/core/arrays/arrow/array.py

+6 -1
@@ -2073,7 +2073,12 @@ def _str_get(self, i: int):
         return type(self)(result)

     def _str_join(self, sep: str):
-        return type(self)(pc.binary_join(self._pa_array, sep))
+        if pa.types.is_string(self._pa_array.type):
+            result = self._apply_elementwise(list)
+            result = pa.chunked_array(result, type=pa.list_(pa.string()))
+        else:
+            result = self._pa_array
+        return type(self)(pc.binary_join(result, sep))

     def _str_partition(self, sep: str, expand: bool):
         predicate = lambda val: val.partition(sep)
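
For readers unfamiliar with the change, the behavior of the two branches in `_str_join` can be approximated in plain Python. This is only an illustrative model, not the pandas implementation: the function name `str_join_model` is hypothetical, it uses ordinary lists instead of pyarrow arrays, and it assumes every list element is a non-null string.

```python
from typing import List, Optional, Sequence, Union

# A row is either a plain string, a sequence of strings, or a missing value.
Row = Union[str, Sequence[str], None]


def str_join_model(values: Sequence[Row], sep: str) -> List[Optional[str]]:
    """Hypothetical pure-Python model of the patched `_str_join` logic."""
    out: List[Optional[str]] = []
    for value in values:
        if value is None:
            # Missing rows stay missing.
            out.append(None)
        elif isinstance(value, str):
            # String branch added by the commit: explode the string into
            # its characters (the diff's `_apply_elementwise(list)` step),
            # then join them with the separator.
            out.append(sep.join(list(value)))
        else:
            # Pre-existing branch: the row is already a list of strings.
            out.append(sep.join(value))
    return out


print(str_join_model(["abc", "123", None], "="))
```

With the inputs used by the new test, the model returns `["a=b=c", "1=2=3", None]`, which matches the expected values in `test_str_join_string_type`.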

pandas/tests/extension/test_arrow.py

+7
@@ -2026,6 +2026,13 @@ def test_str_join():
     tm.assert_series_equal(result, expected)


+def test_str_join_string_type():
+    ser = pd.Series(ArrowExtensionArray(pa.array(["abc", "123", None])))
+    result = ser.str.join("=")
+    expected = pd.Series(["a=b=c", "1=2=3", None], dtype=ArrowDtype(pa.string()))
+    tm.assert_series_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "start, stop, step, exp",
     [
