Skip to content

Fix letterOrDigitNoDollarSigns regex #194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 29, 2021

Conversation

MaximeKjaer
Copy link
Contributor

@MaximeKjaer MaximeKjaer commented Jan 28, 2021

On master, this regex evaluates to [[A-Z\\p{Lt}\\p{Lu}_a-z\\p{Lo}\\p{Nl}\\p{Ll}0-9]]. The way this regex should be parsed is that the first [ opens a character set which includes [. The first ] then closes the character set. The character set should then be followed by ]. In simpleInterpolatedVariable, the * then applies to the second ].

This PR fixes that to remove the duplicated [ and ].

Here's something weird though: I cannot find an example where the current version produces a wrong highlighting with this regex. The simpleInterpolatedVariable regex is currently [A-Z\\p{Lt}\\p{Lu}_a-z\\$\\p{Lo}\\p{Nl}\\p{Ll}][[A-Z\\p{Lt}\\p{Lu}_a-z\\p{Lo}\\p{Nl}\\p{Ll}0-9]]* on master. Running this in the command line confirms that the description of the regex I gave above is correct. However, when creating a test for this (and looking at the highlighting in VS Code), I cannot reproduce a bug. I would expect the following to fail on master, but it doesn't...

   s"$a[ $bc]"
// ^ keyword.interpolation.scala
//  ^ punctuation.definition.string.begin.scala
//   ^ punctuation.definition.template-expression.begin.scala
//    ^ meta.template.expression.scala
//     ^^ source.scala string.quoted.double.interpolated.scala
//       ^ punctuation.definition.template-expression.begin.scala
//        ^^ meta.template.expression.scala
//          ^ source.scala string.quoted.double.interpolated.scala
//           ^ punctuation.definition.string.end.scala

However, the above highlighting is wrong on GitHub, exactly in the way I would expect it to be based on the bug that this PR fixes. To see this in action:

s"test $test test $test] $te] $te]]]]]]"

image

@MaximeKjaer
Copy link
Contributor Author

Note that if you change letterOrDigitNoDollarSign to the following, all our tests still pass. I would expect at least some tests to fail.

const letterOrDigitNoDollarSign = `[[[[[${letterOrDigit.replace("\\$", "")}]]]]]`

@nicolasstucki nicolasstucki merged commit 8dee609 into scala:master Jan 29, 2021
@MaximeKjaer MaximeKjaer deleted the fix-regex branch January 29, 2021 14:03
@MaximeKjaer
Copy link
Contributor Author

I think I've found out why there is a difference in the output of VS Code and GitHub for this example.

The VS Code documentation for syntax highlighting mentions that they use Oniguruma regexes. In that documentation, in section 20.3 A-4, they mention that "[ must be escaped as an usual char in character class"; this is unlike other regex engines (e.g. RegExp in JS allows non-escaped [ in a character set).

My guess about that what happened here is:

  • GitHub are not using Oniguruma regexes, but are using another regex engine which allows non-escaped [ in a character set. This caused a wrong highlighting on GitHub, which this PR fixed.
  • Oniguruma treats [[x]] as a single character set, instead of a character set containing [ and x followed by ], hence there is no highlighting error in VS Code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants