Commit 2f75e05

Merge pull request scala#5261 from som-snytt/issue/9827
SI-9827 MatchIterator advances itself
2 parents 31db427 + 905b526 commit 2f75e05

2 files changed: +161 -51 lines changed

src/library/scala/util/matching/Regex.scala

Lines changed: 92 additions & 50 deletions
@@ -11,21 +11,14 @@
1111
* with the main goal of pulling out information from those matches, or replacing
1212
* them with something else.
1313
*
14-
* There are four classes and three objects, with most of them being members of
15-
* Regex companion object. [[scala.util.matching.Regex]] is the class users instantiate
16-
* to do regular expression matching.
14+
* [[scala.util.matching.Regex]] is the class users instantiate to do regular expression matching.
1715
*
18-
* The remaining classes and objects in the package are used in the following way:
19-
*
20-
* * The companion object to [[scala.util.matching.Regex]] just contains the other members.
16+
* The companion object to [[scala.util.matching.Regex]] contains supporting members:
2117
* * [[scala.util.matching.Regex.Match]] makes more information about a match available.
22-
* * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over multiple matches.
18+
* * [[scala.util.matching.Regex.MatchIterator]] is used to iterate over matched strings.
2319
* * [[scala.util.matching.Regex.MatchData]] is just a base trait for the above classes.
2420
* * [[scala.util.matching.Regex.Groups]] extracts group from a [[scala.util.matching.Regex.Match]]
2521
* without recomputing the match.
26-
* * [[scala.util.matching.Regex.Match]] converts a [[scala.util.matching.Regex.Match]]
27-
* into a [[java.lang.String]].
28-
*
2922
*/
3023
package scala.util.matching
3124

@@ -35,6 +28,7 @@ import java.util.regex.{ Pattern, Matcher }
3528
/** A regular expression is used to determine whether a string matches a pattern
3629
* and, if it does, to extract or transform the parts that match.
3730
*
31+
* === Usage ===
3832
* This class delegates to the [[java.util.regex]] package of the Java Platform.
3933
* See the documentation for [[java.util.regex.Pattern]] for details about
4034
* the regular expression syntax for pattern strings.
@@ -53,6 +47,7 @@ import java.util.regex.{ Pattern, Matcher }
5347
* Since escapes are not processed in multi-line string literals, using triple quotes
5448
* avoids having to escape the backslash character, so that `"\\d"` can be written `"""\d"""`.
5549
*
50+
* === Extraction ===
5651
* To extract the capturing groups when a `Regex` is matched, use it as
5752
* an extractor in a pattern match:
5853
*
@@ -92,48 +87,68 @@ import java.util.regex.{ Pattern, Matcher }
9287
* }
9388
* }}}
9489
*
90+
* === Find Matches ===
9591
* To find or replace matches of the pattern, use the various find and replace methods.
96-
* There is a flavor of each method that produces matched strings and
97-
* another that produces `Match` objects.
92+
* For each method, there is a version for working with matched strings and
93+
* another for working with `Match` objects.
9894
*
9995
* For example, pattern matching with an unanchored `Regex`, as in the previous example,
100-
* is the same as using `findFirstMatchIn`, except that the findFirst methods return an `Option`,
101-
* or `None` for no match:
96+
* can also be accomplished using `findFirstMatchIn`. The `findFirst` methods return an `Option`
97+
* which is non-empty if a match is found, or `None` for no match:
10298
*
10399
* {{{
104100
* val dates = "Important dates in history: 2004-01-20, 1958-09-05, 2010-10-06, 2011-07-15"
105-
* val firstDate = date findFirstIn dates getOrElse "No date found."
106-
* val firstYear = for (m <- date findFirstMatchIn dates) yield m group 1
101+
* val firstDate = date.findFirstIn(dates).getOrElse("No date found.")
102+
* val firstYear = for (m <- date.findFirstMatchIn(dates)) yield m.group(1)
107103
* }}}
108104
*
109105
* To find all matches:
110106
*
111107
* {{{
112-
* val allYears = for (m <- date findAllMatchIn dates) yield m group 1
108+
* val allYears = for (m <- date.findAllMatchIn(dates)) yield m.group(1)
113109
* }}}
114110
*
115-
* But `findAllIn` returns a special iterator of strings that can be queried for the `MatchData`
116-
* of the last match:
111+
* To iterate over the matched strings, use `findAllIn`, which returns a special iterator
112+
* that can be queried for the `MatchData` of the last match:
117113
*
118114
* {{{
119-
* val mi = date findAllIn dates
120-
* val oldies = mi filter (_ => (mi group 1).toInt < 1960) map (s => s"$s: An oldie but goodie.")
115+
* val mi = date.findAllIn(dates)
116+
* while (mi.hasNext) {
117+
* val d = mi.next
118+
* if (mi.group(1).toInt < 1960) println(s"$d: An oldie but goodie.")
121119
* }}}
122120
*
123121
* Note that `findAllIn` finds matches that don't overlap. (See [[findAllIn]] for more examples.)
124122
*
125123
* {{{
126124
* val num = """(\d+)""".r
127-
* val all = (num findAllIn "123").toList // List("123"), not List("123", "23", "3")
125+
* val all = num.findAllIn("123").toList // List("123"), not List("123", "23", "3")
126+
* }}}
127+
*
128+
* Also, the "current match" of a `MatchIterator` may be advanced by either `hasNext` or `next`.
129+
* By comparison, the `Iterator[Match]` returned by `findAllMatchIn` or `findAllIn.matchData`
130+
* produces `Match` objects that remain valid after the iterator is advanced.
131+
*
132+
* {{{
133+
* val ns = num.findAllIn("1 2 3")
134+
* ns.start // 0
135+
* ns.hasNext // true
136+
* ns.start // 2
137+
* val ms = num.findAllMatchIn("1 2 3")
138+
* val m = ms.next()
139+
* m.start // 0
140+
* ms.hasNext // true
141+
* m.start // still 0
128142
* }}}
129143
*
144+
* === Replace Text ===
130145
* Text replacement can be performed unconditionally or as a function of the current match:
131146
*
132147
* {{{
133-
* val redacted = date replaceAllIn (dates, "XXXX-XX-XX")
134-
* val yearsOnly = date replaceAllIn (dates, m => m group 1)
135-
* val months = (0 to 11) map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
136-
* val reformatted = date replaceAllIn (dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
148+
* val redacted = date.replaceAllIn(dates, "XXXX-XX-XX")
149+
* val yearsOnly = date.replaceAllIn(dates, m => m.group(1))
150+
* val months = (0 to 11).map { i => val c = Calendar.getInstance; c.set(2014, i, 1); f"$c%tb" }
151+
* val reformatted = date.replaceAllIn(dates, _ match { case date(y,m,d) => f"${months(m.toInt - 1)} $d, $y" })
137152
* }}}
138153
*
139154
* Pattern matching the `Match` against the `Regex` that created it does not reapply the `Regex`.
@@ -142,7 +157,7 @@ import java.util.regex.{ Pattern, Matcher }
142157
*
143158
* {{{
144159
* val docSpree = """2011(?:-\d{2}){2}""".r
145-
* val docView = date replaceAllIn (dates, _ match {
160+
* val docView = date.replaceAllIn(dates, _ match {
146161
* case docSpree() => "Historic doc spree!"
147162
* case _ => "Something else happened"
148163
* })
@@ -338,22 +353,22 @@ class Regex private[matching](val pattern: Pattern, groupNames: String*) extends
338353
* {{{
339354
* val hat = "hat[^a]+".r
340355
* val hathaway = "hathatthattthatttt"
341-
* val hats = (hat findAllIn hathaway).toList // List(hath, hattth)
342-
* val pos = (hat findAllMatchIn hathaway map (_.start)).toList // List(0, 7)
356+
* val hats = hat.findAllIn(hathaway).toList // List(hath, hattth)
357+
* val pos = hat.findAllMatchIn(hathaway).map(_.start).toList // List(0, 7)
343358
* }}}
344359
*
345360
* To return overlapping matches, it is possible to formulate a regular expression
346361
* with lookahead (`?=`) that does not consume the overlapping region.
347362
*
348363
* {{{
349364
* val madhatter = "(h)(?=(at[^a]+))".r
350-
* val madhats = (madhatter findAllMatchIn hathaway map {
365+
* val madhats = madhatter.findAllMatchIn(hathaway).map {
351366
* case madhatter(x,y) => s"$x$y"
352-
* }).toList // List(hath, hatth, hattth, hatttt)
367+
* }.toList // List(hath, hatth, hattth, hatttt)
353368
* }}}
354369
*
355-
* Attempting to retrieve match information before performing the first match
356-
* or after exhausting the iterator results in [[java.lang.IllegalStateException]].
370+
* Attempting to retrieve match information after exhausting the iterator
371+
* results in [[java.lang.IllegalStateException]].
357372
* See [[scala.util.matching.Regex.MatchIterator]] for details.
358373
*
359374
* @param source The text to match against.
@@ -743,49 +758,76 @@ object Regex {
743758

744759
/** A class to step through a sequence of regex matches.
745760
*
746-
* All methods inherited from [[scala.util.matching.Regex.MatchData]] will throw
747-
* a [[java.lang.IllegalStateException]] until the matcher is initialized. The
748-
* matcher can be initialized by calling `hasNext` or `next()` or causing these
749-
* methods to be called, such as by invoking `toString` or iterating through
750-
* the iterator's elements.
761+
* This is an iterator that returns the matched strings.
762+
*
763+
* Queries about match data pertain to the current state of the underlying
764+
* matcher, which is advanced by calling `hasNext` or `next`.
765+
*
766+
* When matches are exhausted, queries about match data will throw
767+
* [[java.lang.IllegalStateException]].
751768
*
752769
* @see [[java.util.regex.Matcher]]
753770
*/
754771
class MatchIterator(val source: CharSequence, val regex: Regex, val groupNames: Seq[String])
755772
extends AbstractIterator[String] with Iterator[String] with MatchData { self =>
756773

757774
protected[Regex] val matcher = regex.pattern.matcher(source)
758-
private var nextSeen = false
759775

760-
/** Is there another match? */
776+
// 0 = not yet matched, 1 = matched, 2 = advanced to match, 3 = no more matches
777+
private[this] var nextSeen = 0
778+
779+
/** Return true if `next` will find a match.
780+
* As a side effect, advance the underlying matcher if necessary;
781+
* queries about the current match data pertain to the underlying matcher.
782+
*/
761783
def hasNext: Boolean = {
762-
if (!nextSeen) nextSeen = matcher.find()
763-
nextSeen
784+
nextSeen match {
785+
case 0 => nextSeen = if (matcher.find()) 1 else 3
786+
case 1 => ()
787+
case 2 => nextSeen = 0 ; hasNext
788+
case 3 => ()
789+
}
790+
nextSeen == 1 // otherwise, 3
764791
}
765792

766-
/** The next matched substring of `source`. */
793+
/** The next matched substring of `source`.
794+
* As a side effect, advance the underlying matcher if necessary.
795+
*/
767796
def next(): String = {
768-
if (!hasNext) throw new NoSuchElementException
769-
nextSeen = false
797+
nextSeen match {
798+
case 0 => if (!hasNext) throw new NoSuchElementException ; next()
799+
case 1 => nextSeen = 2
800+
case 2 => nextSeen = 0 ; next()
801+
case 3 => throw new NoSuchElementException
802+
}
770803
matcher.group
771804
}
772805

806+
/** Report emptiness. */
773807
override def toString = super[AbstractIterator].toString
774808

809+
// ensure we're at a match
810+
private[this] def ensure(): Unit = nextSeen match {
811+
case 0 => if (!hasNext) throw new IllegalStateException
812+
case 1 => ()
813+
case 2 => ()
814+
case 3 => throw new IllegalStateException
815+
}
816+
775817
/** The index of the first matched character. */
776-
def start: Int = matcher.start
818+
def start: Int = { ensure() ; matcher.start }
777819

778820
/** The index of the first matched character in group `i`. */
779-
def start(i: Int): Int = matcher.start(i)
821+
def start(i: Int): Int = { ensure() ; matcher.start(i) }
780822

781823
/** The index of the last matched character. */
782-
def end: Int = matcher.end
824+
def end: Int = { ensure() ; matcher.end }
783825

784826
/** The index following the last matched character in group `i`. */
785-
def end(i: Int): Int = matcher.end(i)
827+
def end(i: Int): Int = { ensure() ; matcher.end(i) }
786828

787829
/** The number of subgroups. */
788-
def groupCount = matcher.groupCount
830+
def groupCount = { ensure() ; matcher.groupCount }
789831

790832
/** Convert to an iterator that yields MatchData elements instead of Strings. */
791833
def matchData: Iterator[Match] = new AbstractIterator[Match] {
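
The integer codes stored in `nextSeen` (0 to 3) can be read as a small state machine. The following is a minimal standalone sketch of that logic, not the library's code: `Cursor`, `SketchIterator`, and `SketchDemo` are hypothetical names, the states are spelled out as case objects instead of integers, and only `start` is guarded. It works directly on `java.util.regex.Matcher`.

```scala
import java.util.regex.Pattern

// Hypothetical names throughout; this is an illustration, not Regex.scala.
sealed trait Cursor
case object Fresh extends Cursor      // 0: find() has not been called for the next match
case object Ready extends Cursor      // 1: at a match that next() has not returned yet
case object Consumed extends Cursor   // 2: next() has returned the current match
case object Exhausted extends Cursor  // 3: find() returned false

final class SketchIterator(source: CharSequence, pattern: Pattern) extends Iterator[String] {
  private val matcher = pattern.matcher(source)
  private var state: Cursor = Fresh

  // Advances the matcher only when no unconsumed match is pending.
  def hasNext: Boolean = {
    state match {
      case Fresh | Consumed  => state = if (matcher.find()) Ready else Exhausted
      case Ready | Exhausted => ()
    }
    state == Ready
  }

  // Returns the pending match, finding it first if necessary.
  def next(): String = {
    if (!hasNext) throw new NoSuchElementException
    state = Consumed
    matcher.group()
  }

  // Match data is only available while the matcher sits on a match.
  private def ensure(): Unit = {
    val atMatch = state match {
      case Fresh            => hasNext
      case Ready | Consumed => true
      case Exhausted        => false
    }
    if (!atMatch) throw new IllegalStateException("no current match")
  }

  def start: Int = { ensure(); matcher.start() }
}

object SketchDemo {
  val it = new SketchIterator("xxxabcdyyyabcdzzz", Pattern.compile("(ab)(cd)"))
  val first: String = it.next()          // consumes the first match
  val startAfterNext: Int = it.start     // still describes the first match
  val more: Boolean = it.hasNext         // moves the matcher to the second match
  val startAfterHasNext: Int = it.start  // now describes the second match
}
```

Naming the states makes the key transition explicit: `hasNext` calls `find()` only from the `Fresh` or `Consumed` state, so repeated `hasNext` calls never skip a match.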

test/junit/scala/util/matching/RegexTest.scala

Lines changed: 69 additions & 1 deletion
@@ -85,8 +85,9 @@ class RegexTest {
8585
assertFalse(ms.hasNext)
8686
}
8787

88-
//type NoGroup = NoSuchElementException
8988
type NoGroup = IllegalArgumentException
89+
type NoMatch = NoSuchElementException
90+
type NoData = IllegalStateException
9091

9192
@Test def `SI-9666: throw on bad name`(): Unit = {
9293
assertThrows[NoGroup] {
@@ -108,4 +109,71 @@ class RegexTest {
108109
ms group "Bee"
109110
}
110111
}
112+
113+
@Test def `SI-9827 MatchIterator ergonomics`(): Unit = {
114+
val r = "(ab)(cd)".r
115+
val s = "xxxabcdyyyabcdzzz"
116+
assertEquals(3, r.findAllIn(s).start)
117+
assertEquals(5, r.findAllIn(s).start(2))
118+
locally {
119+
val mi = r.findAllIn(s)
120+
assertTrue(mi.hasNext)
121+
assertEquals(3, mi.start)
122+
assertEquals("abcd", mi.next())
123+
assertEquals(3, mi.start)
124+
assertTrue(mi.hasNext)
125+
assertEquals(10, mi.start)
126+
}
127+
locally {
128+
val mi = r.findAllIn(s)
129+
assertEquals("abcd", mi.next())
130+
assertEquals(3, mi.start)
131+
assertEquals("abcd", mi.next())
132+
assertEquals(10, mi.start)
133+
assertThrows[NoMatch] { mi.next() }
134+
assertThrows[NoData] { mi.start }
135+
}
136+
locally {
137+
val mi = r.findAllIn("")
138+
assertThrows[NoData] { mi.start }
139+
assertThrows[NoMatch] { mi.next() }
140+
}
141+
locally {
142+
val mi = r.findAllMatchIn(s)
143+
val x = mi.next()
144+
assertEquals("abcd", x.matched)
145+
assertEquals(3, x.start)
146+
val y = mi.next()
147+
assertEquals("abcd", y.matched)
148+
assertEquals(10, y.start)
149+
assertThrows[NoMatch] { mi.next() }
150+
assertEquals(3, x.start)
151+
assertEquals(10, y.start)
152+
}
153+
locally {
154+
val regex = "(foo)-(.*)".r
155+
val s = "foo-abc-def"
156+
val result = regex.findAllIn(s)
157+
//result.toString // comment this line to make it not work
158+
val r = (result.group(1), result.group(2))
159+
assertEquals(("foo", "abc-def"), r)
160+
}
161+
locally {
162+
val t = "this is a test"
163+
val rx = " ".r
164+
val m = rx.findAllIn(t)
165+
assertEquals(5, rx.findAllIn(t).end)
166+
}
167+
locally {
168+
val data = "<a>aaaaa</a><b>bbbbbb</b><c>ccccccc</c>"
169+
val p = "^<a>(.+)</a><b>(.+)</b><c>(.+)</c>$".r
170+
val parts = p.findAllIn(data)
171+
val aes = parts.group(1)
172+
val bes = parts.group(2)
173+
val ces = parts.group(3)
174+
assertEquals("ccccccc", ces)
175+
assertEquals("bbbbbb", bes)
176+
assertEquals("aaaaa", aes)
177+
}
178+
}
111179
}
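
The test above distinguishes two failure modes once the matches run out: `next()` throws `NoSuchElementException`, while match-data queries such as `start` throw `IllegalStateException`. A short sketch of the same checks outside JUnit follows, assuming a Scala library that includes this change (2.12.0 or later is assumed here); `ExhaustionDemo` and its field names are hypothetical.

```scala
import scala.util.Try
import scala.util.matching.Regex

object ExhaustionDemo {
  val r: Regex = "(ab)(cd)".r
  val s = "xxxabcdyyyabcdzzz"

  // Consume both matches through the string iterator.
  val mi = r.findAllIn(s)
  val matched: List[String] = List(mi.next(), mi.next())

  // Once exhausted, the two kinds of query fail differently.
  val nextFailure: Option[Throwable] = Try(mi.next()).failed.toOption
  val dataFailure: Option[Throwable] = Try(mi.start).failed.toOption

  // Match objects from findAllMatchIn keep their own data after the iterator moves on.
  val ms = r.findAllMatchIn(s)
  val x = ms.next()
  val y = ms.next()
  val starts: (Int, Int) = (x.start, y.start)
}
```

Wrapping the calls in `Try` is only a convenience for inspecting the exception type; in ordinary code, guarding with `hasNext` or iterating with `findAllMatchIn` avoids both exceptions.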
