@@ -960,6 +960,9 @@ importantly, these methods exclude missing/NA values automatically. These are
accessed via the Series's ``str`` attribute and generally have names matching
the equivalent (scalar) built-in string methods:

+ Splitting and Replacing Strings
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
.. ipython:: python

   s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
@@ -990,11 +993,12 @@ Methods like ``replace`` and ``findall`` take regular expressions, too:
   s3
   s3.str.replace('^.a|dog', 'XX-XX ', case=False)

- The method ``match`` returns the groups in a regular expression in one tuple.
- Starting in pandas version 0.13.0, the method ``extract`` is available to
- accomplish this more conveniently.
+ Extracting Substrings
+ ~~~~~~~~~~~~~~~~~~~~~

- Extracting a regular expression with one group returns a Series of strings.
+ The method ``extract`` (introduced in version 0.13) accepts regular expressions
+ with match groups. Extracting a regular expression with one group returns
+ a Series of strings.

.. ipython:: python
@@ -1016,18 +1020,34 @@ Named groups like

.. ipython:: python

-    Series(['a1', 'b2', 'c3']).str.match('(?P<letter>[ab])(?P<digit>\d)')
+    Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')

and optional groups like

.. ipython:: python

-    Series(['a1', 'b2', '3']).str.match('(?P<letter>[ab])?(?P<digit>\d)')
+    Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')

can also be used.
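
As a rough illustration of what named and optional groups capture, the same pattern can be tried with the standard library's ``re`` module alone. This is only a sketch of the regex semantics, not of how ``extract`` is implemented; the helper ``groups_or_none`` is a hypothetical name introduced here, and using ``search`` is an assumption made for the illustration.

```python
import re

# Same pattern as the optional-group example: the letter group may be absent.
pattern = re.compile(r'(?P<letter>[ab])?(?P<digit>\d)')


def groups_or_none(value):
    """Return the named groups as a dict, or None when nothing matches.

    Hypothetical helper for illustration only; not part of pandas.
    """
    found = pattern.search(value)
    return found.groupdict() if found else None


# One dict of named groups per input string.
results = [groups_or_none(v) for v in ['a1', 'b2', '3']]
```

For the input ``'3'`` the optional ``letter`` group does not participate in the match, so its entry is ``None`` while ``digit`` is still captured.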

- Methods like ``contains``, ``startswith``, and ``endswith`` takes an extra
- ``na`` arguement so missing values can be considered True or False:
+ Testing for Strings that Match or Contain a Pattern
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ In previous versions, *extracting* match groups was accomplished by ``match``,
+ which returned a not-so-convenient Series of tuples. Starting in version 0.14,
+ the default behavior of ``match`` will change. It will return a boolean
+ indexer, analogous to the method ``contains``.
+
+ The distinction between
+ ``match`` and ``contains`` is strictness: ``match`` relies on
+ strict ``re.match`` while ``contains`` relies on ``re.search``.
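
The difference in strictness can be seen with the two underlying ``re`` functions directly. The sketch below uses only the standard library; the sample strings and the pattern are made up for illustration and are not taken from the pandas docs.

```python
import re

values = ['aba', 'Baca', 'dog']
pattern = 'a'

# re.match only succeeds when the pattern matches at the start of the string.
matched = [re.match(pattern, v) is not None for v in values]

# re.search succeeds when the pattern matches anywhere in the string.
contained = [re.search(pattern, v) is not None for v in values]
```

Here ``'Baca'`` contains an ``a`` but does not start with one, so it passes the ``re.search`` test and fails the ``re.match`` test.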
+
+ In version 0.13, ``match`` performs its old, deprecated behavior by default,
+ but the new behavior is available through the keyword argument
+ ``as_indexer=True``.
+
+ Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
+ an extra ``na`` argument so missing values can be considered True or False:

.. ipython:: python