@@ -1029,7 +1029,7 @@ with more than one group returns a DataFrame with one column per group.

   Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')

- Elements that do not match return a row of ``NaN``s.
+ Elements that do not match return a row filled with ``NaN``.
Thus, a Series of messy strings can be "converted" into a
like-indexed Series or DataFrame of cleaned-up or more useful strings,
without necessitating ``get()`` to access tuples or ``re.match`` objects.
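A minimal runnable sketch of the ``extract`` behavior described above. It assumes a recent pandas release rather than the 0.13-era API shown in this diff, uses the conventional ``pd`` import alias, and passes the pattern as a raw string to avoid backslash-escape warnings:

```python
import pandas as pd

s = pd.Series(['a1', 'b2', 'c3'])

# Two capture groups, so the result is a DataFrame with one column per group.
df = s.str.extract(r'([ab])(\d)')

# 'a1' and 'b2' match; 'c3' does not, so its row is filled with NaN.
print(df)
```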
@@ -1051,18 +1051,35 @@ can also be used.
Testing for Strings that Match or Contain a Pattern
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- In previous versions, *extracting* match groups was accomplished by ``match``,
- which returned a not-so-convenient Series of tuples. Starting in version 0.14,
- the default behavior of match will change. It will return a boolean
- indexer, analagous to the method ``contains``.

- The distinction between
- ``match`` and ``contains`` is strictness: ``match`` relies on
- strict ``re.match`` while ``contains`` relies on ``re.search``.
+ You can check whether elements contain a pattern:

- In version 0.13, ``match`` performs its old, deprecated behavior by default,
- but the new behavior is availabe through the keyword argument
- ``as_indexer=True``.
+ .. ipython:: python
+
+    pattern = r'[0-9][a-z]'
+    Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
+
+ or match a pattern:
+
+
+ .. ipython:: python
+
+    Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
+
+ The distinction between ``match`` and ``contains`` is strictness: ``match``
+ relies on strict ``re.match``, which anchors the pattern at the start of each
+ string, while ``contains`` relies on ``re.search``, which finds the pattern
+ anywhere in the string.
+
+ .. warning::
+
+    In previous versions, ``match`` was for *extracting* groups,
+    returning a not-so-convenient Series of tuples. The new method ``extract``
+    (described in the previous section) is now preferred.
+
+    This old, deprecated behavior of ``match`` is still the default. As
+    demonstrated above, use the new behavior by setting ``as_indexer=True``.
+    In this mode, ``match`` is analogous to ``contains``, returning a boolean
+    Series. The new behavior will become the default behavior in a future
+    release.

Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
an extra ``na`` argument so missing values can be considered True or False:
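The sketch below contrasts ``str.contains`` and ``str.match`` and shows the ``na`` argument. It assumes a recent pandas release, in which ``str.match`` already returns a boolean Series, so no ``as_indexer`` flag is passed. The pattern ``r'[0-9][a-z]'`` (a digit followed by a lowercase letter) is an illustrative choice that makes the two methods disagree on ``'03c'``:

```python
import pandas as pd

s = pd.Series(['1', '2', '3a', '3b', '03c'])
pattern = r'[0-9][a-z]'  # a digit immediately followed by a lowercase letter

# contains uses re.search: the pattern may occur anywhere in the string.
found = s.str.contains(pattern)
# match uses re.match: the pattern must occur at the start of the string.
matched = s.str.match(pattern)

print(pd.DataFrame({'value': s, 'contains': found, 'match': matched}))

# na=False makes contains treat the missing element as non-matching.
s_missing = pd.Series(['3a', None, 'x'])
with_na = s_missing.str.contains(pattern, na=False)
print(with_na)
```

Only ``'03c'`` differs: ``re.search`` finds ``'3c'`` inside it, while ``re.match`` fails because the string starts with ``'03'``.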