Commit 3b832d0

DOC: Expanded section on string methods in wake of extract/match change.
1 parent 75dd0f2 commit 3b832d0

File tree

2 files changed

+36
-8
lines changed

doc/source/basics.rst

+28-8
@@ -960,6 +960,9 @@ importantly, these methods exclude missing/NA values automatically. These are
960960
accessed via the Series's ``str`` attribute and generally have names matching
961961
the equivalent (scalar) built-in string methods:
962962

963+
Splitting and Replacing Strings
964+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
965+
963966
.. ipython:: python
964967
965968
s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
@@ -990,11 +993,12 @@ Methods like ``replace`` and ``findall`` take regular expressions, too:
990993
s3
991994
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
992995
993-
The method ``match`` returns the groups in a regular expression in one tuple.
994-
Starting in pandas version 0.13.0, the method ``extract`` is available to
995-
accomplish this more conveniently.
996+
Extracting Substrings
997+
~~~~~~~~~~~~~~~~~~~~~
996998

997-
Extracting a regular expression with one group returns a Series of strings.
999+
The method ``extract`` (introduced in version 0.13) accepts regular expressions
1000+
with match groups. Extracting a regular expression with one group returns
1001+
a Series of strings.
9981002

9991003
.. ipython:: python
10001004
@@ -1016,18 +1020,34 @@ Named groups like
10161020

10171021
.. ipython:: python
10181022
1019-
Series(['a1', 'b2', 'c3']).str.match('(?P<letter>[ab])(?P<digit>\d)')
1023+
Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
10201024
10211025
and optional groups like
10221026

10231027
.. ipython:: python
10241028
1025-
Series(['a1', 'b2', '3']).str.match('(?P<letter>[ab])?(?P<digit>\d)')
1029+
Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')
10261030
10271031
can also be used.
10281032

1029-
Methods like ``contains``, ``startswith``, and ``endswith`` takes an extra
1030-
``na`` arguement so missing values can be considered True or False:
1033+
Testing for Strings that Match or Contain a Pattern
1034+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1035+
1036+
In previous versions, *extracting* match groups was accomplished by ``match``,
1037+
which returned a not-so-convenient Series of tuples. Starting in version 0.14,
1038+
the default behavior of ``match`` will change. It will return a boolean
1039+
indexer, analogous to the method ``contains``.
1040+
1041+
The distinction between
1042+
``match`` and ``contains`` is strictness: ``match`` relies on
1043+
strict ``re.match`` while ``contains`` relies on ``re.search``.
1044+
1045+
In version 0.13, ``match`` performs its old, deprecated behavior by default,
1046+
but the new behavior is available through the keyword argument
1047+
``as_indexer=True``.
1048+
1049+
Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
1050+
an extra ``na`` argument so missing values can be considered True or False:
10311051

10321052
.. ipython:: python
10331053
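
The named and optional groups used in the ``extract`` examples above can be tried out with the standard-library ``re`` module alone. The sketch below is not the pandas implementation: it only shows what each group captures for the sample strings in the diff, and the use of ``re.search`` here is an assumption made for illustration.

```python
import re

# Pattern taken from the diff: an optional letter group, then a required digit group.
pattern = re.compile(r'(?P<letter>[ab])?(?P<digit>\d)')

rows = []
for value in ['a1', 'b2', '3']:
    m = pattern.search(value)  # assumption: search semantics, for illustration only
    # groupdict() maps each group name to its captured text, or None if the
    # optional group did not participate in the match.
    rows.append(m.groupdict() if m else None)

for value, row in zip(['a1', 'b2', '3'], rows):
    print(value, row)
```

For the string ``'3'`` the optional ``letter`` group captures nothing, so its entry is ``None``; this is the plain-Python counterpart of a missing value in the extracted result.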

doc/source/v0.13.0.txt

+8
@@ -102,6 +102,14 @@ Deprecated in 0.13.0
102102
- deprecated ``iterkv``, which will be removed in a future release (this was
103103
an alias of iteritems used to bypass ``2to3``'s changes).
104104
(:issue:`4384`, :issue:`4375`, :issue:`4372`)
105+
- deprecated the string method ``match``, whose role is now performed more
106+
idiomatically by ``extract``. In a future release, the default behavior
107+
of ``match`` will change to become analogous to ``contains``, which returns
108+
a boolean indexer. (Their
109+
distinction is strictness: ``match`` relies on ``re.match`` while
110+
``contains`` relies on ``re.search``.) In this release, the deprecated
111+
behavior is the default, but the new behavior is available through the
112+
keyword argument ``as_indexer=True``.
105113

106114
Indexing API Changes
107115
~~~~~~~~~~~~~~~~~~~~
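
The strictness distinction described in the release note (``re.match`` versus ``re.search``) can be sketched with the standard library. This is a minimal illustration, not the pandas code path; the sample values and the pattern ``'ba'`` are hypothetical and do not come from the commit.

```python
import re

values = ['bat', 'cuba', 'dog']  # hypothetical sample data
pattern = 'ba'                   # hypothetical pattern

# re.match only succeeds when the pattern matches at the start of the string.
anchored = [bool(re.match(pattern, v)) for v in values]

# re.search succeeds when the pattern matches anywhere in the string.
anywhere = [bool(re.search(pattern, v)) for v in values]

print(anchored)
print(anywhere)
```

``'cuba'`` contains ``'ba'`` but does not start with it, so it is the one value on which the two functions disagree.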
