* with the main goal of pulling out information from those matches, or replacing
* them with something else.
*
- * There are four classes and three objects, with most of them being members of
- * Regex companion object. [[scala.util.matching.Regex]] is the class users instantiate
- * to do regular expression matching.
+ * [[scala.util.matching.Regex]] is the class users instantiate to do regular expression matching.
*
- * The remaining classes and objects in the package are used in the following way:
- *
- * * The companion object to [[scala.util.matching.Regex]] just contains the other members.
+ * The companion object to [[scala.util.matching.Regex]] contains supporting members:
* * [[scala.util.matching.Regex.Match]] makes more information about a match available.
- * * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over multiple matches.
+ * * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over matched strings.
* * [[scala.util.matching.Regex.MatchData]] is just a base trait for the above classes.
* * [[scala.util.matching.Regex.Groups]] extracts groups from a [[scala.util.matching.Regex.Match]]
* without recomputing the match.
- * * [[scala.util.matching.Regex.Match]] converts a [[scala.util.matching.Regex.Match]]
- * into a [[java.lang.String]].
- *
*/
package scala.util.matching
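The supporting members listed in the package documentation can be exercised together in a short program. This is a minimal sketch assuming a Scala 2.12 or later standard library; the object name `RegexMembersDemo` and the sample text are illustrative and not part of the library.

```scala
import scala.util.matching.Regex

// Illustrative only: RegexMembersDemo and the sample text are hypothetical.
object RegexMembersDemo {
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  val text = "Released 2004-01-20, patched 2010-10-06"

  // Regex.Match: details (position, groups) of a single match.
  val firstMatch: Option[Regex.Match] = date.findFirstMatchIn(text)

  // Regex.Groups: reads the capturing groups of an existing Match,
  // without running the regex again.
  val years: List[String] = date.findAllMatchIn(text).map {
    case Regex.Groups(year, _, _) => year
  }.toList

  // Regex.MatchIterator: what findAllIn returns; it yields the matched strings.
  val matched: List[String] = date.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    println(firstMatch.map(_.start)) // Some(9)
    println(years)                   // List(2004, 2010)
    println(matched)                 // List(2004-01-20, 2010-10-06)
  }
}
```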
@@ -35,6 +28,7 @@ import java.util.regex.{ Pattern, Matcher }
/** A regular expression is used to determine whether a string matches a pattern
* and, if it does, to extract or transform the parts that match.
*
+ * === Usage ===
* This class delegates to the [[java.util.regex]] package of the Java Platform.
* See the documentation for [[java.util.regex.Pattern]] for details about
* the regular expression syntax for pattern strings.
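Because `Regex` delegates to `java.util.regex`, the compiled `java.util.regex.Pattern` is exposed as the `pattern` member and can be used directly through the Java API. A minimal sketch; `DelegationDemo` is an illustrative name.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Illustrative only: DelegationDemo is a hypothetical name.
object DelegationDemo {
  // `.r` compiles the string into a Regex backed by a java.util.regex.Pattern.
  val word: Regex = """\w+""".r

  // The underlying Java pattern, exposed as `pattern`.
  val javaPattern: Pattern = word.pattern

  // Finding the first match through the Java API...
  val viaJava: String = {
    val m = javaPattern.matcher("hello world")
    if (m.find()) m.group() else ""
  }

  // ...agrees with the Scala wrapper.
  val viaScala: Option[String] = word.findFirstIn("hello world")

  def main(args: Array[String]): Unit = {
    println(javaPattern.pattern()) // \w+
    println(viaJava)               // hello
    println(viaScala)              // Some(hello)
  }
}
```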
@@ -53,6 +47,7 @@ import java.util.regex.{ Pattern, Matcher }
* Since escapes are not processed in multi-line string literals, using triple quotes
* avoids having to escape the backslash character, so that `"\\d"` can be written `"""\d"""`.
*
+ * === Extraction ===
* To extract the capturing groups when a `Regex` is matched, use it as
* an extractor in a pattern match:
*
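A small runnable sketch of the two points above: a triple-quoted pattern and its escaped equivalent compile to the same regex, and a `Regex` used as an extractor binds its capturing groups. It relies on the default anchored behavior, plus `unanchored` for matching inside a longer string; `ExtractionDemo`, `describe`, and `findYear` are hypothetical names.

```scala
import scala.util.matching.Regex

// Illustrative only: ExtractionDemo, describe and findYear are hypothetical names.
object ExtractionDemo {
  // Triple quotes: no escape processing, so \d is written with a single backslash.
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  // The same pattern in an ordinary string literal needs doubled backslashes.
  val sameDate: Regex = "(\\d{4})-(\\d{2})-(\\d{2})".r

  // Used as an extractor, a Regex must match the entire input by default,
  // and each capturing group binds one pattern variable.
  def describe(s: String): String = s match {
    case date(year, month, day) => s"year=$year month=$month day=$day"
    case _                      => "no date"
  }

  // An unanchored view may match anywhere in the input.
  val embedded: Regex = date.unanchored
  def findYear(s: String): Option[String] = s match {
    case embedded(year, _, _) => Some(year)
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    println(date.regex == sameDate.regex) // true
    println(describe("2004-01-20"))       // year=2004 month=01 day=20
    println(describe("on 2004-01-20"))    // no date
    println(findYear("on 2004-01-20"))    // Some(2004)
  }
}
```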
@@ -92,48 +87,68 @@ import java.util.regex.{ Pattern, Matcher }
* }
* }}}
*
+ * === Find Matches ===
* To find or replace matches of the pattern, use the various find and replace methods.
- * There is a flavor of each method that produces matched strings and
- * another that produces `Match` objects.
+ * For each method, there is a version for working with matched strings and
+ * another for working with `Match` objects.
*
* For example, pattern matching with an unanchored `Regex`, as in the previous example,
- * is the same as using `findFirstMatchIn`, except that the findFirst methods return an `Option`,
- * or `None` for no match:
+ * can also be accomplished using `findFirstMatchIn`. The `findFirst` methods return an `Option`
+ * which is non-empty if a match is found, or `None` for no match:
*
* {{{
* val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"
- * val firstDate = date findFirstIn dates getOrElse "No date found."
- * val firstYear = for (m <- date findFirstMatchIn dates) yield m group 1
+ * val firstDate = date.findFirstIn(dates).getOrElse("No date found.")
+ * val firstYear = for (m <- date.findFirstMatchIn(dates)) yield m.group(1)
* }}}
*
* To find all matches:
*
* {{{
- * val allYears = for (m <- date findAllMatchIn dates) yield m group 1
+ * val allYears = for (m <- date.findAllMatchIn(dates)) yield m.group(1)
* }}}
*
- * But `findAllIn` returns a special iterator of strings that can be queried for the `MatchData`
- * of the last match:
+ * To iterate over the matched strings, use `findAllIn`, which returns a special iterator
+ * that can be queried for the `MatchData` of the last match:
*
* {{{
- * val mi = date findAllIn dates
- * val oldies = mi filter (_ => (mi group 1).toInt < 1960) map (s => s"$s: An oldie but goodie.")
+ * val mi = date.findAllIn(dates)
+ * while (mi.hasNext) {
+ * val d = mi.next()
+ * if (mi.group(1).toInt < 1960) println(s"$d: An oldie but goodie.")
+ * }
* }}}
*
* Note that `findAllIn` finds matches that don't overlap. (See [[findAllIn]] for more examples.)
*
* {{{
* val num = """(\d+)""".r
- * val all = (num findAllIn "123").toList // List("123"), not List("123", "23", "3")
+ * val all = num.findAllIn("123").toList // List("123"), not List("123", "23", "3")
+ * }}}
+ *
+ * Also, the "current match" of a `MatchIterator` may be advanced by either `hasNext` or `next`,
+ * although `hasNext` does not move past a match that `next` has not yet returned.
+ * By comparison, the `Iterator[Match]` returned by `findAllMatchIn` or `findAllIn.matchData`
+ * produces `Match` objects that remain valid after the iterator is advanced.
+ *
+ * {{{
+ * val ns = num.findAllIn("1 2 3")
+ * ns.start // 0
+ * ns.hasNext // true
+ * ns.start // 0
+ * ns.next() // 1
+ * ns.start // 0
+ * ns.hasNext // true
+ * ns.start // 2
+ * val ms = num.findAllMatchIn("1 2 3")
+ * val m = ms.next()
+ * m.start // 0
+ * ms.hasNext // true
+ * m.start // still 0
* }}}
*
+ * === Replace Text ===
* Text replacement can be performed unconditionally or as a function of the current match:
*
* {{{
- * val redacted = date replaceAllIn (dates, "XXXX-XX-XX")
- * val yearsOnly = date replaceAllIn (dates, m => m group 1)
- * val months = (0 to 11) map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
- * val reformatted = date replaceAllIn (dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
+ * val redacted = date.replaceAllIn(dates, "XXXX-XX-XX")
+ * val yearsOnly = date.replaceAllIn(dates, m => m.group(1))
+ * val months = (0 to 11).map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
+ * val reformatted = date.replaceAllIn(dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
* }}}
*
* Pattern matching the `Match` against the `Regex` that created it does not reapply the `Regex`.
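The find methods described in this hunk can be collected into one runnable sketch. It uses the same `date` pattern and `dates` text as the documentation; `RegexFindDemo` is an illustrative name. It reads group data from the `MatchIterator` immediately after `next()`, before the iterator is advanced again, and uses `matchData` where `Match` objects are needed instead.

```scala
import scala.util.matching.Regex

// Illustrative only: RegexFindDemo is a hypothetical name.
object RegexFindDemo {
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"

  // String version vs. Match version of the findFirst methods; both return an Option.
  val firstDate: String = date.findFirstIn(dates).getOrElse("No date found.")
  val firstYear: Option[String] = date.findFirstMatchIn(dates).map(_.group(1))

  // findAllMatchIn yields Match objects that stay valid after the iterator advances.
  val allYears: List[String] = date.findAllMatchIn(dates).map(_.group(1)).toList

  // findAllIn yields strings; group data is read from the iterator itself,
  // so it is queried right after next(), before the iterator moves on.
  val oldies: List[String] = {
    val mi = date.findAllIn(dates)
    val out = List.newBuilder[String]
    while (mi.hasNext) {
      val d = mi.next()
      if (mi.group(1).toInt < 1960) out += s"$d: An oldie but goodie."
    }
    out.result()
  }

  // matchData turns the string iterator into an Iterator[Match].
  val starts: List[Int] = date.findAllIn(dates).matchData.map(_.start).toList

  def main(args: Array[String]): Unit = {
    println(firstDate) // 2004-01-20
    println(firstYear) // Some(2004)
    println(allYears)  // List(2004, 1958, 2010, 2011)
    println(oldies)    // List(1958-09-05: An oldie but goodie.)
    println(starts)    // List(28, 40, 52, 64)
  }
}
```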
@@ -142,7 +157,7 @@ import java.util.regex.{ Pattern, Matcher }
*
* {{{
* val docSpree = """2011(?:-\d{2}){2}""".r
- * val docView = date replaceAllIn (dates, _ match {
+ * val docView = date.replaceAllIn(dates, _ match {
* case docSpree() => "Historic doc spree!"
* case _ => "Something else happened"
* })
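A runnable version of the replacement examples, as a sketch with two deliberate substitutions: the month names come from a hypothetical fixed table rather than `Calendar` with `%tb`, so the output does not depend on the default locale, and the replacer functions declare their `Regex.Match` parameter type explicitly. `ReplaceDemo` is an illustrative name.

```scala
import scala.util.matching.Regex

// Illustrative only: ReplaceDemo and the month table are hypothetical.
object ReplaceDemo {
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  val docSpree: Regex = """2011(?:-\d{2}){2}""".r
  val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"

  // Fixed English abbreviations, so the result does not depend on the default locale.
  val months = Vector("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

  // Unconditional replacement.
  val redacted: String = date.replaceAllIn(dates, "XXXX-XX-XX")

  // Replacement computed from each Match.
  val yearsOnly: String = date.replaceAllIn(dates, (m: Regex.Match) => m.group(1))

  // `date` produced the Match, so matching against it reuses the existing groups.
  val reformatted: String = date.replaceAllIn(dates, (md: Regex.Match) => md match {
    case date(y, mo, d) => s"${months(mo.toInt - 1)} $d, $y"
    case other          => other.matched
  })

  // `docSpree` did not produce the Match, so it is applied to the matched text.
  val docView: String = date.replaceAllIn(dates, (md: Regex.Match) => md match {
    case docSpree() => "Historic doc spree!"
    case _          => "Something else happened"
  })

  def main(args: Array[String]): Unit =
    List(redacted, yearsOnly, reformatted, docView).foreach(println)
}
```

Note that the strings returned by a replacer are still interpreted as replacement strings by the underlying `Matcher`, so none of the replacements above contain `$` or `\`.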
@@ -338,22 +353,22 @@ class Regex private[matching](val pattern: Pattern, groupNames: String*) extends
* {{{
* val hat = "hat[^a]+".r
* val hathaway = "hathatthattthatttt"
- * val hats = (hat findAllIn hathaway).toList // List(hath, hattth)
- * val pos = (hat findAllMatchIn hathaway map (_.start)).toList // List(0, 7)
+ * val hats = hat.findAllIn(hathaway).toList // List(hath, hattth)
+ * val pos = hat.findAllMatchIn(hathaway).map(_.start).toList // List(0, 7)
* }}}
*
* To return overlapping matches, it is possible to formulate a regular expression
* with lookahead (`?=`) that does not consume the overlapping region.
*
* {{{
* val madhatter = "(h)(?=(at[^a]+))".r
- * val madhats = (madhatter findAllMatchIn hathaway map {
+ * val madhats = madhatter.findAllMatchIn(hathaway).map {
* case madhatter(x,y) => s"$x$y"
- * }).toList // List(hath, hatth, hattth, hatttt)
+ * }.toList // List(hath, hatth, hattth, hatttt)
* }}}
*
- * Attempting to retrieve match information before performing the first match
- * or after exhausting the iterator results in [[java.lang.IllegalStateException]].
+ * Attempting to retrieve match information after exhausting the iterator
+ * results in [[java.lang.IllegalStateException]].
* See [[scala.util.matching.Regex.MatchIterator]] for details.
*
* @param source The text to match against.
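The non-overlapping and lookahead examples, plus the exhausted-iterator case, as a runnable sketch; `OverlapDemo` and `afterExhaustion` are illustrative names, and `Try` is used only to capture the expected `IllegalStateException`.

```scala
import scala.util.Try
import scala.util.matching.Regex

// Illustrative only: OverlapDemo is a hypothetical name.
object OverlapDemo {
  val hathaway = "hathatthattthatttt"

  // Each match consumes the text it matches, so findAllIn never returns overlapping matches.
  val hat: Regex = "hat[^a]+".r
  val hats: List[String] = hat.findAllIn(hathaway).toList
  val pos: List[Int] = hat.findAllMatchIn(hathaway).map(_.start).toList

  // Only "h" is consumed; the lookahead captures the rest in group 2 without
  // consuming it, so the reported regions can overlap.
  val madhatter: Regex = "(h)(?=(at[^a]+))".r
  val madhats: List[String] = madhatter.findAllMatchIn(hathaway).map {
    case madhatter(x, y) => s"$x$y"
  }.toList

  // Querying match data after the iterator is exhausted fails.
  val afterExhaustion: Try[Int] = {
    val mi = hat.findAllIn(hathaway)
    while (mi.hasNext) mi.next()
    Try(mi.start)
  }

  def main(args: Array[String]): Unit = {
    println(hats)            // List(hath, hattth)
    println(pos)             // List(0, 7)
    println(madhats)         // List(hath, hatth, hattth, hatttt)
    println(afterExhaustion) // a Failure holding an IllegalStateException
  }
}
```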
@@ -743,49 +758,76 @@ object Regex {
/** A class to step through a sequence of regex matches.
*
- * All methods inherited from [[scala.util.matching.Regex.MatchData]] will throw
- * a [[java.lang.IllegalStateException]] until the matcher is initialized. The
- * matcher can be initialized by calling `hasNext` or `next()` or causing these
- * methods to be called, such as by invoking `toString` or iterating through
- * the iterator's elements.
+ * This is an iterator that returns the matched strings.
+ *
+ * Queries about match data pertain to the current state of the underlying
+ * matcher, which is advanced by calling `hasNext` or `next`.
+ *
+ * When matches are exhausted, queries about match data will throw
+ * [[java.lang.IllegalStateException]].
*
* @see [[java.util.regex.Matcher]]
*/
class MatchIterator(val source: CharSequence, val regex: Regex, val groupNames: Seq[String])
extends AbstractIterator[String] with Iterator[String] with MatchData { self =>

protected[Regex] val matcher = regex.pattern.matcher(source)
- private var nextSeen = false

- /** Is there another match? */
+ // 0 = not yet matched, 1 = matched, 2 = advanced to match, 3 = no more matches
+ private[this] var nextSeen = 0
+
+ /** Return true if `next` will find a match.
+ * As a side effect, advance the underlying matcher if necessary;
+ * queries about the current match data pertain to the underlying matcher.
+ */
def hasNext: Boolean = {
- if (!nextSeen) nextSeen = matcher.find()
- nextSeen
+ nextSeen match {
+ case 0 => nextSeen = if (matcher.find()) 1 else 3
+ case 1 => ()
+ case 2 => nextSeen = 0; hasNext
+ case 3 => ()
+ }
+ nextSeen == 1 // otherwise, 3
}

- /** The next matched substring of `source`. */
+ /** The next matched substring of `source`.
+ * As a side effect, advance the underlying matcher if necessary.
+ */
def next(): String = {
- if (!hasNext) throw new NoSuchElementException
- nextSeen = false
+ nextSeen match {
+ case 0 => if (!hasNext) throw new NoSuchElementException; next()
+ case 1 => nextSeen = 2
+ case 2 => nextSeen = 0; next()
+ case 3 => throw new NoSuchElementException
+ }
matcher.group
}

+ /** Report emptiness. */
override def toString = super[AbstractIterator].toString

+ // ensure we're at a match
+ private[this] def ensure(): Unit = nextSeen match {
+ case 0 => if (!hasNext) throw new IllegalStateException
+ case 1 => ()
+ case 2 => ()
+ case 3 => throw new IllegalStateException
+ }
+
/** The index of the first matched character. */
- def start: Int = matcher.start
+ def start: Int = { ensure(); matcher.start }

/** The index of the first matched character in group `i`. */
- def start(i: Int): Int = matcher.start(i)
+ def start(i: Int): Int = { ensure(); matcher.start(i) }

/** The index following the last matched character. */
- def end: Int = matcher.end
+ def end: Int = { ensure(); matcher.end }

/** The index following the last matched character in group `i`. */
- def end(i: Int): Int = matcher.end(i)
+ def end(i: Int): Int = { ensure(); matcher.end(i) }

/** The number of subgroups. */
- def groupCount = matcher.groupCount
+ def groupCount = { ensure(); matcher.groupCount }

/** Convert to an iterator that yields MatchData elements instead of Strings. */
def matchData: Iterator[Match] = new AbstractIterator[Match] {
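The four-state bookkeeping introduced for `MatchIterator` can be studied in isolation. The class below, `SteppingMatches`, is a hypothetical stand-alone sketch written only against `java.util.regex`; it is not the library's `MatchIterator`. It keeps the same state encoding but replaces the recursive calls in `hasNext` and `next` with straight-line updates, which should produce the same transitions.

```scala
import java.util.regex.{Matcher, Pattern}

// Hypothetical re-implementation for illustration; not scala.util.matching.Regex.MatchIterator.
final class SteppingMatches(pattern: Pattern, source: CharSequence) extends Iterator[String] {
  private val matcher: Matcher = pattern.matcher(source)

  // Same encoding as the diff: 0 = not yet matched, 1 = matched,
  // 2 = advanced to match (already returned by next), 3 = no more matches.
  private var state = 0

  def hasNext: Boolean = {
    if (state == 2) state = 0                            // current match consumed: search again
    if (state == 0) state = if (matcher.find()) 1 else 3 // at most one find() per call
    state == 1
  }

  def next(): String = {
    if (!hasNext) throw new NoSuchElementException("no more matches")
    state = 2
    matcher.group()
  }

  // Match data is only available while positioned at a match (states 1 and 2).
  private def ensure(): Unit = {
    if (state == 0) hasNext // find the first match lazily
    if (state == 3) throw new IllegalStateException("no match available")
  }

  def start: Int = { ensure(); matcher.start() }
  def end: Int = { ensure(); matcher.end() }
}

object SteppingDemo {
  // Trace of \d+ over "1 2 3", recording each observation as a string.
  def trace(): List[String] = {
    val ns = new SteppingMatches(Pattern.compile("""\d+"""), "1 2 3")
    List(
      ns.start.toString,   // "0": first match found lazily
      ns.hasNext.toString, // "true": does not skip the match not yet returned
      ns.start.toString,   // "0"
      ns.next(),           // "1"
      ns.start.toString,   // "0": still the match just returned
      ns.hasNext.toString, // "true": now advances to "2"
      ns.start.toString    // "2"
    )
  }

  def main(args: Array[String]): Unit = println(trace())
}
```

With this design, `hasNext` never skips a match that `next` has not yet returned, so match data queried after `hasNext` describes the match that `next` will return.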