diff --git a/doc/source/whatsnew/v1.3.0.rst b/doc/source/whatsnew/v1.3.0.rst
index f10b8f602ea56..0680e9d360c8f 100644
--- a/doc/source/whatsnew/v1.3.0.rst
+++ b/doc/source/whatsnew/v1.3.0.rst
@@ -134,6 +134,56 @@ a copy will no longer be made (:issue:`32960`)
 The default behavior when not passing ``copy`` will remain unchanged, i.e.
 a copy will be made.
 
+.. _whatsnew_130.arrow_string:
+
+PyArrow backed string data type
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We've enhanced the :class:`StringDtype`, an extension type dedicated to string data.
+(:issue:`39908`)
+
+It is now possible to specify a ``storage`` keyword option to :class:`StringDtype`. Use
+pandas options or specify the dtype using ``dtype='string[pyarrow]'`` to back the data with a PyArrow array instead of a NumPy array of Python objects.
+
+.. warning::
+
+    ``string[pyarrow]`` is currently considered experimental. The implementation
+    and parts of the API may change without warning.
+
+The ``'string[pyarrow]'`` extension type solves several issues with NumPy backed arrays:
+
+1.
+2.
+3.
+
+
+.. ipython:: python
+
+    pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
+
+You can use the alias ``"string[pyarrow]"`` as well.
+
+.. ipython:: python
+
+    s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")
+    s
+
+The usual string accessor methods work. Where appropriate, the return type
+of the Series or columns of a DataFrame will also have string dtype.
+
+.. ipython:: python
+
+    s.str.upper()
+    s.str.split('b', expand=True).dtypes
+
+String accessor methods returning integers will return a value with :class:`Int64Dtype`.
+
+.. ipython:: python
+
+    s.str.count("a")
+
+See :ref:`text.types` for more.
+
 .. _whatsnew_130.enhancements.other:
 
 Other enhancements