-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix offset of token EOF #5136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix offset of token EOF #5136
Conversation
if (idx >= buf.length) { | ||
ch = SU | ||
} else { | ||
val c = buf(idx) | ||
ch = c | ||
charOffset = idx + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered applying this fix to nextRawChar as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
f631bea
to
d840a87
Compare
This patch has a serious impact on the parser stability tests:
Mostly due to errors reported on different lines as before. I'll go through them one by one and see if this patch improves the reported positions |
d840a87
to
36df834
Compare
Previously was `in.length - 1`, now `in.length`
36df834
to
47f4e3f
Compare
So this looks internally consistent to me, but the changes in the test files make me wonder. Files are supposed to end in a newline (newlines are terminators, not separators), but the new line does not exist for the users; so it seems that, for such files, the new behavior is worse. So maybe the old behavior was intentional and needs a comment? So, what motivated this fix? |
There are a lot of situations where the source does not end with a new line (as you are writing code in the REPL for example). In these situations, this can lead to trees with invalid positions. For example, the parser makes up a lot of synthetic trees when the input is incomplete: Here is an example: Input: Here is the position of It is very inconvenient not to be able to rely on the position of a tree. I had to workaround this issue in the syntax highlighter: 01768ca. As you pointed out, the new behavior can be seen as a regression when you report incomplete input errors on files ending with a newline. The error used to be reported at the position 21 |class Foo {
| ^
| '}' expected, but eof found and now with a new line: 21 |class Foo {
| ^
| '}' expected, but eof found Maybe when we report errors, we should take into if a file ends with a new line |
Ah, interesting. Without having studied at the code, if we normalized inputs to either always or never end in newline, we'd avoid the issues you described. WDYT? See below for specifics. SpecificsAt least, changing the EOF position to the old one could be easy (untested): if (idx >= buf.length) {
ch = SU
// Pretend we "chomped" away any final newline in the position for EOF token.
// Should work for LF and CR;LF
+ if (buf(idx - 1) == LF) charOffset -= 1
} else { (Tho this mixes CharArrayReader with the *Scanners logic). This leaves the actual newline in the input; removing might be cleaner, but the existing code doesn't seem to mind it.
|
That makes sense to me, because we could also special case the error message, "but eof found" is a bit cryptic, "but the end of the file was reached" would be better. |
Previously was
in.length - 1
, nowin.length