Commit 9a42cbe

h-vetinari authored and jreback committed
API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) (#23167)
1 parent 73d8f96 commit 9a42cbe

4 files changed: +233 -79 lines changed

doc/source/user_guide/text.rst

+10
@@ -70,6 +70,16 @@ and replacing any remaining whitespaces with underscores:
 ``.str`` methods which operate on elements of type ``list`` are not available on such a
 ``Series``.
 
+.. _text.warn_types:
+
+.. warning::
+
+    Before v.0.25.0, the ``.str``-accessor did only the most rudimentary type checks. Starting with
+    v.0.25.0, the type of the Series is inferred and the allowed types (i.e. strings) are enforced more rigorously.
+
+    Generally speaking, the ``.str`` accessor is intended to work only on strings. With very few
+    exceptions, other uses are not supported, and may be disabled at a later point.
+
 
 Splitting and Replacing Strings
 -------------------------------
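
Reviewer note on the warning added above: a minimal sketch of what "the type of the Series is inferred" can look like from the user's side. It assumes that ``pandas.api.types.infer_dtype`` reports the same kind of label the accessor relies on; the variable names are illustrative and not part of this commit.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Object-dtype Series whose elements are strings (plus a missing value).
strings = pd.Series(["a", "ba", None], dtype=object)
# Object-dtype Series whose elements are bytes.
raw_bytes = pd.Series([b"a", b"ba"], dtype=object)
# Integer Series: not string-like at all.
numbers = pd.Series([1, 2, 3])

# Both Series above share dtype=object, yet inference tells them apart.
print(infer_dtype(strings, skipna=True))
print(infer_dtype(raw_bytes, skipna=True))

# Accessing .str on non-string data is expected to fail up front.
try:
    numbers.str
    accessor_available = True
except AttributeError:
    accessor_available = False

print(accessor_available)
```

The point of the sketch is only that ``object`` dtype alone does not identify string data, which is why element-level inference is needed before enforcing string-only methods.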

doc/source/whatsnew/v0.25.0.rst

+38 -2
@@ -231,6 +231,43 @@ returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwi
 Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
 cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.
 
+The ``.str``-accessor performs stricter type checks
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Due to the lack of more fine-grained dtypes, :attr:`Series.str` so far only checked whether the data was
+of ``object`` dtype. :attr:`Series.str` will now infer the dtype data *within* the Series; in particular,
+``'bytes'``-only data will raise an exception (except for :meth:`Series.str.decode`, :meth:`Series.str.get`,
+:meth:`Series.str.len`, :meth:`Series.str.slice`), see :issue:`23163`, :issue:`23011`, :issue:`23551`.
+
+*Previous Behaviour*:
+
+.. code-block:: python
+
+    In [1]: s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)
+
+    In [2]: s
+    Out[2]:
+    0      b'a'
+    1     b'ba'
+    2    b'cba'
+    dtype: object
+
+    In [3]: s.str.startswith(b'a')
+    Out[3]:
+    0     True
+    1    False
+    2    False
+    dtype: bool
+
+*New Behaviour*:
+
+.. ipython:: python
+    :okexcept:
+
+    s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)
+    s
+    s.str.startswith(b'a')
+
 .. _whatsnew_0250.api_breaking.incompatible_index_unions
 
 Incompatible Index Type Unions
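
Reviewer note on the whatsnew entry above: a small, hedged sketch of the new behaviour on bytes-only data, reusing the Series from the entry. It assumes a pandas version at or after 0.25; the exception is expected to be a ``TypeError``, and ``AttributeError`` is caught as well in case a version differs. The names ``rejected``, ``lengths`` and ``decoded`` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same bytes-only object Series as in the whatsnew entry.
s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)

# String-only methods are expected to reject bytes data.
try:
    s.str.startswith(b'a')
    rejected = False
except (TypeError, AttributeError):
    rejected = True

# Methods the entry lists as exceptions should keep working on bytes.
lengths = s.str.len()
decoded = s.str.decode('utf-8')

print(rejected)
print(lengths.tolist())
print(decoded.tolist())
```

Decoding first with ``Series.str.decode`` and then applying string methods to the result is one way to migrate code that relied on the previous behaviour.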
@@ -331,7 +368,6 @@ This change is backward compatible for direct usage of Pandas, but if you subcla
 Pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
 you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).
 
-
 .. _whatsnew_0250.api_breaking.deps:
 
 Increased minimum versions for dependencies
@@ -537,7 +573,7 @@ Conversion
 Strings
 ^^^^^^^
 
--
+- Bug in the ``__name__`` attribute of several methods of :class:`Series.str`, which were set incorrectly (:issue:`23551`)
 -
 -
 