Skip to content

Add support for all token types with the new semantic highlighting #2852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MartinGC94 opened this issue Aug 2, 2020 · 11 comments
Open

Add support for all token types with the new semantic highlighting #2852

MartinGC94 opened this issue Aug 2, 2020 · 11 comments
Labels

Comments

@MartinGC94
Copy link
Contributor

Summary of the new feature

Powershell has 20 tokentypes and it would be nice if the semantic highlighting supported all of them. See: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.pstokentype

The current semantic highlighting implementation is currently missing the following types:

  • Attribute
  • CommandArgument
  • GroupEnd/GroupStart
  • LineContinuation (backtick)
  • LoopLabel
  • StatementSeparator (Semicolon)

If/When these gets added I would suggest changing the following mappings:

  • Value in Using namespace <Value> should use CommandArgument scope. Currently it uses Function.
  • Attributes like [cmdletbinding()] should use Attribute scope. Currently it uses Type.
  • Names in functions and configurations should use CommandArgument scope. Currently they use the Function scope except functions with no dashes which has no semantic token type.
  • Unquoted parameter values like "C:\SomePath" in Get-ChildItem C:\SomePath should use CommandArgument scope. Currently they use the function scope.

The following screenshot hopefully shows what I mean:
Mapping issue

@ghost ghost added the Needs: Triage Maintainer attention needed! label Aug 2, 2020
@TylerLeonhardt
Copy link
Member

PSTokenType is the old type... It's a Windows PowerShell v2 relic...

Token is the new type:
https://docs.microsoft.com/dotnet/api/system.management.automation.language.token

Which has a TokenKind:
https://docs.microsoft.com/dotnet/api/system.management.automation.language.tokenkind

@MartinGC94
Copy link
Contributor Author

Interesting, I didn't know that. I still think it's worth taking inspiration from PSTokenType when mapping TokenKind to the VS code tokens because 20 tokens is a lot more manageable than 100+ and the old TokenType is basically logical groupings of the newer more granular TokenKind.
Personally I'm mostly interested in getting separate tokens for Attributes and unquoted parameters so I can make them stick out a bit more from everything else.

@TylerLeonhardt
Copy link
Member

TylerLeonhardt commented Aug 2, 2020

The best way to see if we can do anything about this is to run it through the PowerShell parser/tokenizer:

$tokens = $null
$errs = $null
[System.management.automation.language.Parser]::ParseInput($myScriptAsAStr, [ref] $tokens, [ref] $errs)

Then look at $tokens. If there's something distinguishing about Attributes or un-quoted parameters, we can do something!... But if there isn't, then we can't, I'm afraid.

(I didn't test that code, but it should be something along those lines... The docs: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.language.parser.parseinput?view=powershellsdk-7.0.0#System_Management_Automation_Language_Parser_ParseInput_System_String_System_Management_Automation_Language_Token____System_Management_Automation_Language_ParseError____)

@MartinGC94
Copy link
Contributor Author

Then you can do something :) Attributes have the "AttributeName" token flag which as far as I can tell is completely unique to attributes.
Unquoted parameters have the kind+TokenFlag combination of "Generic" + "None" which would affect the same things I suggested "CommandArgument" would affect in the original post above.

For convenience here's a table showing the relevant types and their Kind+TokenFlag combinations:

Token Kind TokenFlags
Command Generic CommandName
Unquoted Parameter Generic None
Type Identifier TypeName
Attribute Identifier TypeName,AttributeName

@TylerLeonhardt
Copy link
Member

@MartinGC94
Copy link
Contributor Author

I don't know how I would do it. Updating the if statements and/or switch would be simple enough, but none of the standard semantic token types seem to really fit with the Powershell tokens so I would have to define a new type/sub type like it describes here: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-classification

@TylerLeonhardt
Copy link
Member

Yeah I'd rather not add a new token type... Maybe use one of the token types we don't actually use.

@SydneyhSmith SydneyhSmith removed the Needs: Triage Maintainer attention needed! label Aug 4, 2020
@TylerLeonhardt
Copy link
Member

@SeeminglyScience
Copy link
Collaborator

Type is the most accurate (it is a type, but it is also an attribute). If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool. Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

@ghost ghost added the Needs: Maintainer Attention Maintainer attention needed! label Aug 4, 2020
@rjmholt
Copy link
Contributor

rjmholt commented Aug 5, 2020

If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool

The problem is that no popular theme will support it, and we've just increased complexity without fixing the issue. Most people will continue using their Dark+/Monokai/Solarized theme and see attributes and types the same. So it'll end up being like in #2856 (comment), where we correctly categorise the token differently and the colour is the same.

Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

This is the real problem with themes. The themes themselves tend to be hacks designed to achieve nice coloration in the face of bad token categorisation. Which in turns forces the token classifiers to lie. I seem to recall there was a contentious change in the EditorSyntax repo where tokens were categorised more correctly and people complained because highlighting got worse.

I think the ideal here would be for us to define a correct token type, and a hack fallback, like: attribute -> label. That way the ISE theme could colour things with high specificity but users of popular themes will still get the color differentiation they're looking for.

@TylerLeonhardt
Copy link
Member

@rjmholt if that's possible, I like that plan. I'll probably open an issue on vscode to start this discussion.

Also, I was skimming through the LSP docs and I noticed there's also a section for modifiers:

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

export enum SemanticTokenModifiers {
	declaration = 'declaration',
	definition = 'definition',
	readonly = 'readonly',
	static = 'static',
	deprecated = 'deprecated',
	abstract = 'abstract',
	async = 'async',
	modification = 'modification',
	documentation = 'documentation',
	defaultLibrary = 'defaultLibrary'
}

I believe these can be added on to the TokenType to provide additional context. At first glance nothing jumps at me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants