Commit b63d35d

DOC: Clarify difference between StringDtype(pyarrow) and ArrowDtype(string) (#52184)
1 parent 246a9dd commit b63d35d

2 files changed: +23 -4 lines changed

doc/source/reference/arrays.rst

+4 -3
@@ -93,9 +93,10 @@ PyArrow type pandas extension type NumPy
 
 .. note::
 
-    For string types (``pyarrow.string()``, ``string[pyarrow]``), PyArrow support is still facilitated
-    by :class:`arrays.ArrowStringArray` and ``StringDtype("pyarrow")``. See the :ref:`string section <api.arrays.string>`
-    below.
+    Pyarrow-backed string support is provided by both ``pd.StringDtype("pyarrow")`` and ``pd.ArrowDtype(pa.string())``.
+    ``pd.StringDtype("pyarrow")`` is described below in the :ref:`string section <api.arrays.string>`
+    and will be returned if the string alias ``"string[pyarrow]"`` is specified. ``pd.ArrowDtype(pa.string())``
+    generally has better interoperability with :class:`ArrowDtype` of different types.
 
 While individual values in an :class:`arrays.ArrowExtensionArray` are stored as a PyArrow objects, scalars are **returned**
 as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as Python int, or :class:`NA` for missing

doc/source/user_guide/pyarrow.rst

+19 -1
@@ -35,6 +35,23 @@ which is similar to a NumPy array. To construct these from the main pandas data
     df = pd.DataFrame([[1, 2], [3, 4]], dtype="uint64[pyarrow]")
     df
 
+.. note::
+
+    The string alias ``"string[pyarrow]"`` maps to ``pd.StringDtype("pyarrow")`` which is not equivalent to
+    specifying ``dtype=pd.ArrowDtype(pa.string())``. Generally, operations on the data will behave similarly
+    except ``pd.StringDtype("pyarrow")`` can return NumPy-backed nullable types while ``pd.ArrowDtype(pa.string())``
+    will return :class:`ArrowDtype`.
+
+    .. ipython:: python
+
+        import pyarrow as pa
+        data = list("abc")
+        ser_sd = pd.Series(data, dtype="string[pyarrow]")
+        ser_ad = pd.Series(data, dtype=pd.ArrowDtype(pa.string()))
+        ser_ad.dtype == ser_sd.dtype
+        ser_sd.str.contains("a")
+        ser_ad.str.contains("a")
+
 For PyArrow types that accept parameters, you can pass in a PyArrow type with those parameters
 into :class:`ArrowDtype` to use in the ``dtype`` parameter.
 
@@ -106,6 +123,7 @@ The following are just some examples of operations that are accelerated by nativ
 
 .. ipython:: python
 
+    import pyarrow as pa
     ser = pd.Series([-1.545, 0.211, None], dtype="float32[pyarrow]")
     ser.mean()
     ser + ser
@@ -115,7 +133,7 @@ The following are just some examples of operations that are accelerated by nativ
     ser.isna()
     ser.fillna(0)
 
-    ser_str = pd.Series(["a", "b", None], dtype="string[pyarrow]")
+    ser_str = pd.Series(["a", "b", None], dtype=pd.ArrowDtype(pa.string()))
     ser_str.str.startswith("a")
 
     from datetime import datetime
