Fix offset of token EOF #5136

allanrenucci · 2018-09-21T13:15:40Z

Previously the offset was `in.length - 1`; now it is `in.length`.

Blaisorblade · 2018-09-21T13:21:25Z

compiler/src/dotty/tools/dotc/parsing/CharArrayReader.scala

    if (idx >= buf.length) {
      ch = SU
    } else {
      val c = buf(idx)
      ch = c
-      charOffset = idx + 1


Have you considered applying this fix to nextRawChar as well?

allanrenucci · 2018-09-21T16:21:36Z

This patch has a serious impact on the parser stability tests:

    tests/neg/parser-stability-11.scala failed
    tests/neg/parser-stability-10.scala failed
    tests/neg/i4373b.scala failed
    tests/neg/parser-stability-16.scala failed
    tests/neg/i2494.scala failed
    tests/neg/parser-stability-20.scala failed
    tests/neg/parser-stability-1.scala failed
    tests/neg/i4373c.scala failed
    tests/neg/parser-stability-21.scala failed
    tests/neg/i4373a.scala failed
    tests/neg/parser-stability-12.scala failed
    tests/neg/parser-stability-15.scala failed
    tests/neg/parser-stability-19.scala failed
    tests/neg/parser-stability.scala failed
    tests/neg/parser-stability-3.scala failed
    tests/neg/parser-stability-22.scala failed
    tests/neg/parser-stability-18.scala failed
    tests/neg/parser-stability-4.scala failed
    tests/neg/parser-stability-2.scala failed
    tests/neg/parser-stability-14.scala failed
    tests/neg/parser-stability-9.scala failed
    tests/neg/parser-stability-5.scala failed

Mostly because errors are now reported on different lines than before. I'll go through them one by one and check whether this patch improves the reported positions.

Blaisorblade · 2018-09-28T12:58:28Z

So this looks internally consistent to me, but the changes in the test files make me wonder. Files are supposed to end in a newline (newlines are terminators, not separators), but the new line does not exist for the users; so it seems that, for such files, the new behavior is worse. So maybe the old behavior was intentional and needs a comment?

So, what motivated this fix?

allanrenucci · 2018-09-28T14:11:32Z

> So, what motivated this fix?

There are many situations where the source does not end with a newline (for example, when you are writing code in the REPL). In these situations, the old offset can lead to trees with invalid positions. For example, the parser makes up a lot of synthetic trees when the input is incomplete:
https://github.com/lampepfl/dotty/blob/db2d5aaf7fa6bc62aeb7fd2d554c1e5f41d9343d/compiler/src/dotty/tools/dotc/parsing/Parsers.scala#L317

Here is an example:

Input: `def`, parsed tree: `DefDef(<error>,List(),List(),TypeTree,Literal(Constant(null)))`.

The position of `Literal(Constant(null))` is `[2..3]`. This is clearly wrong; it should be `[3..3]`. The cause is that the offset of the last token is `in.length - 1 = 3 - 1 = 2`.
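
A minimal sketch of that arithmetic, using a toy reader rather than the real `CharArrayReader`. The names `ToyReader`, `alwaysAdvance`, and `eofOffset` are hypothetical, and taking the EOF offset as `charOffset - 1` is an assumption of this model, not a statement about the scanner:

```scala
object EofOffsetSketch {
  // End-of-input marker character (ASCII SUB).
  final val SU = '\u001A'

  /** Toy reader. With `alwaysAdvance = false` the offset is bumped only when a
    * character is actually read; with `alwaysAdvance = true` it is bumped on
    * every call, including the call that hits the end of the buffer.
    */
  final class ToyReader(buf: Array[Char], alwaysAdvance: Boolean) {
    var ch: Char = '\u0000'
    var charOffset: Int = 0

    def nextChar(): Unit = {
      val idx = charOffset
      if (alwaysAdvance) charOffset = idx + 1
      if (idx >= buf.length) ch = SU
      else {
        ch = buf(idx)
        if (!alwaysAdvance) charOffset = idx + 1
      }
    }
  }

  /** Offset this toy model assigns to EOF (assumption: `charOffset - 1`). */
  def eofOffset(input: String, alwaysAdvance: Boolean): Int = {
    val reader = new ToyReader(input.toCharArray, alwaysAdvance)
    reader.nextChar()
    while (reader.ch != SU) reader.nextChar()
    reader.charOffset - 1
  }
}
```

For the input `def`, `eofOffset("def", alwaysAdvance = false)` gives 2 (`in.length - 1`) and `eofOffset("def", alwaysAdvance = true)` gives 3 (`in.length`).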

It is very inconvenient not to be able to rely on the position of a tree. I had to work around this issue in the syntax highlighter: 01768ca.

As you pointed out, the new behavior can be seen as a regression when you report incomplete-input errors on files ending with a newline. The error used to be reported at position `in.length - 1`; with this patch it is reported at `in.length`. However, I believe reporting the error at `in.length - 1` is only OK when your file ends with a newline; otherwise it is wrong. Here is an example:

21 |class Foo {
   |          ^
   |          '}' expected, but eof found

and now with a file that ends with a newline:

21 |class Foo {
   |           ^
   |           '}' expected, but eof found
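
The caret columns in the two renderings can be reproduced with a small offset-to-column sketch. The helpers `column` and `caretLine` are hypothetical and are not the compiler's `SourceFile` API; they only assume that lines are separated by LF:

```scala
object CaretSketch {
  /** Zero-based column of `offset`, counted from the character after the last LF before it. */
  def column(content: String, offset: Int): Int = {
    val lineStart = content.lastIndexOf('\n', offset - 1) + 1
    offset - lineStart
  }

  /** A line of spaces followed by a caret under the given offset. */
  def caretLine(content: String, offset: Int): String =
    " " * column(content, offset) + "^"
}
```

With `class Foo {` and no trailing newline, offset `in.length - 1` is column 10 (under the `{`) and `in.length` is column 11. With a trailing newline, `in.length - 1` is column 11 on the same line, while `in.length` is column 0 of the following, empty line.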

Maybe when we report errors, we should take into account whether a file ends with a newline.

Blaisorblade · 2018-10-01T08:28:20Z

> Maybe when we report errors, we should take into account whether a file ends with a newline

Ah, interesting. Without having studied the code: if we normalized inputs to either always or never end in a newline, we'd avoid the issues you described. WDYT? See below for specifics.

Specifics

At least, changing the EOF position to the old one could be easy (untested):

 if (idx >= buf.length) {
   ch = SU
   // Pretend we "chomped" away any final newline in the position for EOF token.
   // Should work for LF and CR;LF
+  if (buf(idx - 1) == LF) charOffset -= 1
 } else {

(Though this mixes CharArrayReader with the *Scanners logic).

This leaves the actual newline in the input; removing it might be cleaner, but the existing code doesn't seem to mind it.
I guess one could also alter other places (say, the offset-to-line-and-column translation in SourceFile.offsetToLine/column), but that seems worse.

I'm trusting that nobody uses CR-only line endings these days, since they were used on Macs before OS X.
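
The "chomp" idea can also be sketched as a standalone function, independent of the reader. The name `eofReportOffset` is hypothetical, and the bounds guard for an empty buffer is an addition that the untested snippet above does not have:

```scala
object ChompSketch {
  final val LF = '\n'

  /** Offset to use for the EOF token: the buffer length, minus one if the
    * buffer ends in LF. For CR LF endings this lands on the LF, as in the
    * suggestion being sketched.
    */
  def eofReportOffset(buf: Array[Char]): Int = {
    val n = buf.length
    // Guard against an empty buffer before looking at the last character.
    if (n > 0 && buf(n - 1) == LF) n - 1 else n
  }
}
```

Under these assumptions, `def` (no trailing newline) yields 3, which matches `in.length`, while `class Foo {` followed by LF yields 11, which matches the old `in.length - 1`.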

smarter · 2018-10-01T08:36:02Z

> Maybe when we report errors, we should take into account whether a file ends with a newline

That makes sense to me, because we could also special-case the error message: "but eof found" is a bit cryptic; "but the end of the file was reached" would be better.

Blaisorblade reviewed Sep 21, 2018

allanrenucci force-pushed the fix-EOF-offset branch 2 times, most recently from f631bea to d840a87 Compare September 21, 2018 15:55

allanrenucci self-assigned this Sep 22, 2018

allanrenucci force-pushed the fix-EOF-offset branch from d840a87 to 36df834 Compare September 25, 2018 16:18

allanrenucci requested a review from Blaisorblade September 25, 2018 16:40

allanrenucci assigned Blaisorblade and unassigned allanrenucci Sep 25, 2018

allanrenucci added the stat:needs review label Sep 26, 2018

Fix offset of token EOF

47f4e3f

Previously was `in.length - 1`, now `in.length`

allanrenucci force-pushed the fix-EOF-offset branch from 36df834 to 47f4e3f Compare September 26, 2018 09:15

allanrenucci merged commit 776bf47 into scala:master Oct 1, 2018

allanrenucci deleted the fix-EOF-offset branch October 1, 2018 14:45
