Skip to content

Fix offset of token EOF #5136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 1, 2018
Merged

Conversation

allanrenucci
Copy link
Contributor

Previously was in.length - 1, now in.length

if (idx >= buf.length) {
ch = SU
} else {
val c = buf(idx)
ch = c
charOffset = idx + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered applying this fix to nextRawChar as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point

@allanrenucci allanrenucci force-pushed the fix-EOF-offset branch 2 times, most recently from f631bea to d840a87 Compare September 21, 2018 15:55
@allanrenucci
Copy link
Contributor Author

allanrenucci commented Sep 21, 2018

This patch has a serious impact on the parser stability tests:

    tests/neg/parser-stability-11.scala failed
    tests/neg/parser-stability-10.scala failed
    tests/neg/i4373b.scala failed
    tests/neg/parser-stability-16.scala failed
    tests/neg/i2494.scala failed
    tests/neg/parser-stability-20.scala failed
    tests/neg/parser-stability-1.scala failed
    tests/neg/i4373c.scala failed
    tests/neg/parser-stability-21.scala failed
    tests/neg/i4373a.scala failed
    tests/neg/parser-stability-12.scala failed
    tests/neg/parser-stability-15.scala failed
    tests/neg/parser-stability-19.scala failed
    tests/neg/parser-stability.scala failed
    tests/neg/parser-stability-3.scala failed
    tests/neg/parser-stability-22.scala failed
    tests/neg/parser-stability-18.scala failed
    tests/neg/parser-stability-4.scala failed
    tests/neg/parser-stability-2.scala failed
    tests/neg/parser-stability-14.scala failed
    tests/neg/parser-stability-9.scala failed
    tests/neg/parser-stability-5.scala failed

Mostly due to errors reported on different lines as before. I'll go through them one by one and see if this patch improves the reported positions

Previously was `in.length - 1`, now `in.length`
@Blaisorblade
Copy link
Contributor

So this looks internally consistent to me, but the changes in the test files make me wonder. Files are supposed to end in a newline (newlines are terminators, not separators), but the new line does not exist for the users; so it seems that, for such files, the new behavior is worse. So maybe the old behavior was intentional and needs a comment?

So, what motivated this fix?

@allanrenucci
Copy link
Contributor Author

allanrenucci commented Sep 28, 2018

So, what motivated this fix?

There are a lot of situations where the source does not end with a new line (as you are writing code in the REPL for example). In these situations, this can lead to trees with invalid positions. For example, the parser makes up a lot of synthetic trees when the input is incomplete:
https://github.com/lampepfl/dotty/blob/db2d5aaf7fa6bc62aeb7fd2d554c1e5f41d9343d/compiler/src/dotty/tools/dotc/parsing/Parsers.scala#L317

Here is an example:

Input: def, parsed tree: DefDef(<error>,List(),List(),TypeTree,Literal(Constant(null))).

Here is the position of Literal(Constant(null)): [2..3]. This is clearly wrong, it should be [3..3]. That's because the offset of the last token is in.length - 1 = 3 - 1 = 2.

It is very inconvenient not to be able to rely on the position of a tree. I had to workaround this issue in the syntax highlighter: 01768ca.

As you pointed out, the new behavior can be seen as a regression when you report incomplete input errors on files ending with a newline. The error used to be reported at the position in.length - 1 and with this patch at in.length. However, I believe reporting the error at in.length - 1 is only OK when your file ends with a newline, otherwise it is wrong. Here is an example:

21 |class Foo {
   |          ^
   |          '}' expected, but eof found

and now with a new line:

21 |class Foo {
   |           ^
   |           '}' expected, but eof found

Maybe when we report errors, we should take into if a file ends with a new line

@Blaisorblade
Copy link
Contributor

Maybe when we report errors, we should take into if a file ends with a new line

Ah, interesting. Without having studied at the code, if we normalized inputs to either always or never end in newline, we'd avoid the issues you described. WDYT? See below for specifics.

Specifics

At least, changing the EOF position to the old one could be easy (untested):

 if (idx >= buf.length) {
   ch = SU
   // Pretend we "chomped" away any final newline in the position for EOF token.
   // Should work for LF and CR;LF
+  if (buf(idx - 1) == LF) charOffset -= 1
 } else {

(Tho this mixes CharArrayReader with the *Scanners logic).

This leaves the actual newline in the input; removing might be cleaner, but the existing code doesn't seem to mind it.
I guess one could also alter other places (say, offset-to-line-and-column translation in SourceFile.offsetToLine/column), but that seems worse.

  • I'm trusting nobody uses CR only this days, since it was used in Macs before OS X.

@smarter
Copy link
Member

smarter commented Oct 1, 2018

Maybe when we report errors, we should take into if a file ends with a new line

That makes sense to me, because we could also special case the error message, "but eof found" is a bit cryptic, "but the end of the file was reached" would be better.

@allanrenucci allanrenucci merged commit 776bf47 into scala:master Oct 1, 2018
@allanrenucci allanrenucci deleted the fix-EOF-offset branch October 1, 2018 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants