Fix keyword loading to use any whitespace as separator #6693

bhagman · 2017-09-04T05:07:40Z

Instead of forcing keywords.txt to use tabs, let library developers use spaces too.

bperrybap · 2017-10-15T01:00:46Z

So is this finally going to be fixed after years of neglect?

facchinm · 2017-10-16T07:19:04Z

This PR was merged in beta-1.9 branch as 13824d7 , if no user will report any problem we are going to proceed merging it into master soon.

cmaglie · 2017-10-20T10:01:03Z

merged in eed9e70

cmaglie · 2018-02-19T11:57:06Z

@bhagman @bperrybap
unfortunately it's not as easy as it seems, the "old" keyword.txt format uses \t as a separator (not as a whitespace), if a row contains two consecutive tabs like A\t\tB it means that there are 3 fields: "A", empty string "" and "B".

Some examples are:

operator	LITERAL1		RESERVED_WORD_2
enum	LITERAL1		RESERVED_WORD_2
delete	LITERAL1		RESERVED_WORD_2
bool	LITERAL1		RESERVED_WORD_2
double	LITERAL1		RESERVED_WORD_2
null	LITERAL1		RESERVED_WORD_2
NULL	LITERAL1		RESERVED_WORD_2
new	LITERAL1		RESERVED_WORD_2
private	LITERAL1		RESERVED_WORD_2
protected	LITERAL1		RESERVED_WORD_2
public	LITERAL1		RESERVED_WORD_2
short	LITERAL1		RESERVED_WORD_2
signed	LITERAL1		RESERVED_WORD_2
true	LITERAL1		LITERAL_BOOLEAN
unsigned	LITERAL1		RESERVED_WORD_2
word	LITERAL1		RESERVED_WORD_2
Keyboard	KEYWORD1		DATA_TYPE
Mouse	KEYWORD1		DATA_TYPE
override	KEYWORD3		RESERVED_WORD
final	KEYWORD3		RESERVED_WORD
goto	KEYWORD3		RESERVED_WORD
throw	KEYWORD3		RESERVED_WORD
try	KEYWORD3		RESERVED_WORD
export	KEYWORD3		RESERVED_WORD

this is the IDE before the patch:

and this is after the patch:

as you can see the color coding is not taken but instead is interpreted as a reference-link (3rd position instead of 4th) this is confirmed if I right-click on the operator word and select "Find in reference":

this failure is also reported by the PdeKeywordsTest test:

java.lang.AssertionError: expected:<IncrementCompound> but was:<null>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
->	at processing.app.syntax.PdeKeywordsTest.testKeywordsTxtParsing(PdeKeywordsTest.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:86)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:539)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:761)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:461)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:207)

I'm going to revert this patch, until we find a different solution.

At this point I don't know if a backward-compatible solution may be find, maybe the best way forward is to keep the current format and provide a better documentation?

This reverts commit eed9e70. See #6693

bperrybap · 2018-02-19T18:04:07Z

@cmaglie
I'm not understanding this:

if a row contains two consecutive tabs like A\t\tB it means that there are 3 fields: "A", empty string "" and "B"

Regardless of how the current code is parsing it, shouldn't a line like that actually mean two fields "A" and "B" ?

So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?

Off topic but, but your comments seem to suggest that there are fields that I'm not familiar with.
Is there any documentation on the keywords file?
I've not been able to find any.

Regardless of the number of fields in the keywords file,
it would seem better for the keywords file format to NOT support blank/null fields and use whitespace as a field separator.
Supporting null/blank fields is mess and forever precludes the ability to use whitespace as a field separator.

When using whitespace as a field seperator, you won't have empty fields anymore since whitespace means 1 or more of either or

It appears that the patch used didn't use whitespace as a separator but rather simply used a single space or a single tab which is not the same as whitespace.

cmaglie · 2018-02-19T18:15:22Z

So are you saying that the keywords file format intentionally supports the ability to skip fields or just that the current parsing code to parse it is broken?

Yes, there is the ability to skip a field using two tabs as in my A\t\tB example. This is how the parser works now, it's counter-intuitive I know, in fact I discovered it after merging this PR.

We can argue if this is nice or bad, but this is not something we can change now because hundreds of keywords.txt are based on that parsing, so we should support the old format.

Anyway let's continue the discussion in #7245 that looks more promising.

bperrybap · 2018-02-19T18:22:50Z

What is most important is the intended file format not the way the code may be parsing it.
Was the old format intentionally allowing blank fields? And that old format must be supported?
If so, then it is over since you can never use whitespace as a field separator.
IMO, supporting blank fields is a bad idea.

per1234 · 2018-02-19T18:26:21Z

The only documentation I know of is here:
https://github.com/arduino/Arduino/wiki/Arduino-IDE-1.5:-Library-specification#keywords
and a brief mention here:
https://www.arduino.cc/en/Hacking/LibraryTutorial

Note they both do mention a tab separator. After looking at hundreds of keywords.txt files I can tell you that the use of spaces instead of tabs is very common (~15% of libraries). Of the rest, the great majority use multiple tabs as a separator (even though the documentation doesn't specify that being supported) and disturbingly high number have a random mixture of tabs and spaces.

I don't understand the purpose of the multiple identifiers per keyword used in the IDE's global keywords.txt. If relevant to libraries, it would be nice if that usage was documented on the library specification page. I'm happy to do the writing but can't without understanding how it works.

bperrybap · 2018-02-19T19:18:04Z

It is really messy since depending on the version of the IDE the parsers were changed/modified which alters the expected format and what is supported.
I think the library spec page should be updated to reflect fields supported and the expected format, even if the format (field separators) actually supported under the hood is a bit more liberal.
Currently, I've not found any information other than Cristian's recent comment here about the file format and fields supported:
#7245 (comment)

References: - #1 - arduino/Arduino#6693

CieNTi · 2019-05-28T21:43:02Z

Thanks to @per1234 to point me to this issue.

I didn't realize about the fact that is completely mandatory, inside keywords file, to use tabulator as member separator. First by the fact that the IDE provides a poor variety of code editors capabilities, and as I'm used to Sublime Text, it add 0 work to see a function as a function, or a variable as a variable.

Being a tabulator a (extremely old) type of CSV file, and giving tons of headaches along lot and lot of years because it's a width-variable and non-visible character. Why is still in use, and moreover as a unique and a mandatory way to create keywords in Arduino?

As a CSV-like file (which means positional members with the error-prone that it directly inferes), Why is not at least selectable which character to use for fields separators? If keywords does not allow spaces, what is the sense of use a non-space character for that?

There is a lot of ways to define keywords, in use in top editors:

Not a single one use positional fields for that.

And just offtopic (as we're not talking about indentation), but nice to read:

Come on Arduino! Great devices, great software, great libraries ... but this as the unique way for syntax highlight of custom keywords?

My support to @per1234, hard work he does!

bperrybap · 2019-05-28T22:18:57Z

In a bit over 35 years of using s/w on many operating systems. The only tool and files that I have ever seen that have used such a rigid format for separators is make. makefiles are stupidly rigid and use a single tab field as separator for rules.

The format of the keyword file is even worse, in that not only is it using an invisible field separator, but the format of the file is not formally defined and its format has changed over the years making it impossible to create something that is fully compatible with all possible file formats from the past.

IMO, it is time to move to a new/extended format.
A format that is simple and mostly compatible with the existing format.

My suggestion would be to put a version #, key, or identifier on the first line so that the new format could be detected and then used.
If the identifier is not there, fall back to the older/current form of parsing.

The new format should be VERY liberal, and allow whitespace or a comma as field separators and NOT allow blank/missing/skipped fields.

That way, nearly everybody should be able to get something they like and get things to look "pretty" in the file.
single tab, multiple tabs, single space, multiple spaces, single comma, or any combination of those would be a single field separator.
i.e.

<tab><space><space><tab><comma>

would be a single field separator.

If you want/need to skip a field or have a null field, use commas and put nothing between the two commas for that field.
That way it is obvious that the field is not being used.

The point is to be very liberal in the format to support a few different styles of field separation all while keeping the parsing very simple.
But mainly to totally eliminate the use of hidden/invisible non used fields which is what is creating all the issues.

bperrybap · 2019-05-28T22:28:44Z

Oh and once there is a version identifier in the file, it would be easy to change formats in the future and or add support for additional formats.
The other thing to consider is that this file is not really intended for end user consumption so it doesn't necessarily have to be a simple field separated format like it is today.

Fix keyword loading to use any whitespace as separator

7f2e52a

Instead of forcing keywords.txt to use tabs, let library developers use spaces too.

bhagman mentioned this pull request Sep 4, 2017

Use correct separator in keywords.txt bhagman/Tone#6

Open

facchinm added the feature request A request to make an enhancement (not a bug fix) label Sep 4, 2017

facchinm added this to the Next milestone Sep 7, 2017

per1234 mentioned this pull request Oct 14, 2017

Multiple white spaces can not be used in keywords.txt #2435

Closed

cmaglie closed this Oct 20, 2017

cmaglie modified the milestones: Next, Release 1.8.6 Oct 20, 2017

cmaglie removed this from the Release 1.8.6 milestone Feb 19, 2018

cmaglie added a commit that referenced this pull request Feb 19, 2018

Revert "Fix keyword loading to use any whitespace as separator"

0175a4e

This reverts commit eed9e70. See #6693

bhagman mentioned this pull request Feb 19, 2018

Allow janky empty fields within double tabs in keywords.txt files #7245

Closed

mfalkvidd mentioned this pull request Apr 1, 2018

TinyGSM Gateway for MySensors mysensors/MySensors#1097

Merged

per1234 mentioned this pull request May 28, 2019

Use correct field separator in keywords.txt CieNTi/arduino-IoTesla-client#1

Closed

CieNTi added a commit to CieNTi/arduino-IoTesla-client that referenced this pull request May 28, 2019

Removing keywords.txt until TAB is not required anymore

5128a80

References: - #1 - arduino/Arduino#6693

per1234 added the Type: Invalid Off topic for this repository, or a bug report determined to not actually represent a bug label Mar 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix keyword loading to use any whitespace as separator #6693

Fix keyword loading to use any whitespace as separator #6693

bhagman commented Sep 4, 2017

bperrybap commented Oct 15, 2017

facchinm commented Oct 16, 2017

cmaglie commented Oct 20, 2017

cmaglie commented Feb 19, 2018

bperrybap commented Feb 19, 2018

cmaglie commented Feb 19, 2018

bperrybap commented Feb 19, 2018

per1234 commented Feb 19, 2018

bperrybap commented Feb 19, 2018

CieNTi commented May 28, 2019 •

edited

Loading

bperrybap commented May 28, 2019 •

edited

Loading

bperrybap commented May 28, 2019

Fix keyword loading to use any whitespace as separator #6693

Fix keyword loading to use any whitespace as separator #6693

Conversation

bhagman commented Sep 4, 2017

bperrybap commented Oct 15, 2017

facchinm commented Oct 16, 2017

cmaglie commented Oct 20, 2017

cmaglie commented Feb 19, 2018

bperrybap commented Feb 19, 2018

cmaglie commented Feb 19, 2018

bperrybap commented Feb 19, 2018

per1234 commented Feb 19, 2018

bperrybap commented Feb 19, 2018

CieNTi commented May 28, 2019 • edited Loading

bperrybap commented May 28, 2019 • edited Loading

bperrybap commented May 28, 2019

CieNTi commented May 28, 2019 •

edited

Loading

bperrybap commented May 28, 2019 •

edited

Loading