
Adding encoding support to serial monitor #4801


Closed


PaulMurrayCbr

This pull request adds encoding support to the serial monitor.

The core Serial class is changed to work primarily with Java bytes rather than Java chars. The existing methods are retained for backward compatibility.
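The following is a minimal sketch of the byte-first idea. It assumes a simplified port class: ByteFirstPort, RecordingPort and setTxCharset are hypothetical names and are not the actual Serial API in this PR. The byte[] method is the primitive. The String overload remains only for backward compatibility and encodes with the selected Tx charset.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical byte-first port: write(byte[]) is the primitive and the
// String overload is kept only for backward compatibility.
// These names are illustrative, not the actual Serial API from this PR.
abstract class ByteFirstPort {
    private Charset txCharset = StandardCharsets.US_ASCII;

    // Primitive operation: send raw bytes to the device.
    abstract void write(byte[] data);

    // Tx charset, as chosen in a (hypothetical) Tx combo box.
    void setTxCharset(Charset charset) {
        txCharset = charset;
    }

    // Legacy text API: encode with the current Tx charset, then send bytes.
    // String.getBytes replaces unmappable characters with the charset's
    // default replacement ('?' for US-ASCII).
    void write(String text) {
        write(text.getBytes(txCharset));
    }
}

// Test double that records what would have been sent over the wire.
class RecordingPort extends ByteFirstPort {
    final ByteArrayOutputStream sent = new ByteArrayOutputStream();

    @Override
    void write(byte[] data) {
        sent.write(data, 0, data.length);
    }
}
```

With the default US-ASCII charset, writing "é" records the single byte 0x3F ('?'). After setTxCharset(StandardCharsets.UTF_8), the same call records C3 A9.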

Combo boxes for Tx and Rx character sets are added. The serial monitor works in the IDE on my Mac and was tested with a simple echo sketch on an Arduino Uno. As expected, when accented characters are typed into the serial monitor and sent:

Tx: US-ASCII, Rx: US-ASCII, accented characters are replaced with question marks by the encoder
Tx: US-ASCII, Rx: UTF-8, accented characters are replaced with question marks by the encoder
Tx: UTF-8, Rx: US-ASCII, accented characters are received back as two bytes of rubbish
Tx: UTF-8, Rx: UTF-8, accented characters come through correctly
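The four results above follow from Java's default replacement behaviour, assuming the board echoes the bytes it receives unchanged. String.getBytes replaces unmappable characters with '?'. new String(byte[], Charset) replaces undecodable bytes with U+FFFD, which is what the "two bytes of rubbish" look like. The class and method names below are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Round-trips text through a Tx charset (IDE -> board) and an Rx charset
// (board -> IDE), assuming the board simply echoes the bytes it receives.
class EchoRoundTrip {
    static String echo(String typed, Charset tx, Charset rx) {
        byte[] onTheWire = typed.getBytes(tx); // unmappable chars become '?'
        return new String(onTheWire, rx);      // undecodable bytes become U+FFFD
    }

    public static void main(String[] args) {
        Charset[] sets = { StandardCharsets.US_ASCII, StandardCharsets.UTF_8 };
        for (Charset tx : sets)
            for (Charset rx : sets)
                System.out.println("Tx " + tx + ", Rx " + rx + ": " + echo("caf\u00e9", tx, rx));
    }
}
```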

Note that line breaks do odd things when, for instance, UTF-16 is selected as the encoding, and so does sending one character at a time when transmitting US-ASCII and receiving UTF-16. This is correct … ish. I did not implement a full encoding/decoding buffer: each message is assumed to be complete when it arrives. This may cause glitches when reading bulk UTF-8 output from the Arduino and may need to be addressed.
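One way to implement the missing decoding buffer is to keep a single CharsetDecoder per Rx charset and carry incomplete trailing bytes over to the next chunk. The class below is a hypothetical sketch, not code from this PR. It relies on CharsetDecoder.decode(in, out, false) leaving an incomplete multi-byte sequence unconsumed when more input is still expected.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical chunk decoder: one long-lived CharsetDecoder, plus any bytes
// that ended mid-character, carried over to the next chunk.
class ChunkDecoder {
    private final CharsetDecoder decoder;
    private ByteBuffer pending = ByteBuffer.allocate(0);

    ChunkDecoder(Charset charset) {
        decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    String feed(byte[] chunk) {
        // Prepend leftover bytes from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk);
        in.flip();
        // Sized for the worst case so the decoder should not overflow.
        CharBuffer out = CharBuffer.allocate(
                (int) Math.ceil(in.remaining() * decoder.maxCharsPerByte()) + 1);
        // endOfInput=false: an incomplete sequence at the end stays in 'in'.
        decoder.decode(in, out, false);
        pending = in; // unconsumed bytes wait for the next chunk
        out.flip();
        return out.toString();
    }
}
```

For example, if the UTF-8 bytes of "한글" arrive split after the fourth byte, the decoder returns "한" and then "글" rather than replacement characters.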

I don't know how to exercise the other classes - Network Monitor, Serial Plotter.

A difficulty is the use of the MessageConsumer interface. Messages are text, not bytes, but NetworkMonitor used that interface to do its job. I am not sure that my handling is correct, although I have tried to do the sane thing.

Missed a spot in NetworkMonitor.

A more general issue is that the Message interface talks in strings and should not be used among components that want to talk in bytes.
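This example shows why a String-based message interface is a poor fit for byte transports. Any bytes that are not valid in the chosen charset are replaced during decoding and cannot be recovered when the text is re-encoded. The sketch assumes UTF-8. ByteConsumer is a hypothetical interface, not part of the IDE.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LossyRoundTrip {
    // Hypothetical byte-level consumer: data is passed on without decoding.
    interface ByteConsumer {
        void message(byte[] data);
    }

    // What happens when bytes are forced through a String-based interface.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8); // invalid bytes -> U+FFFD
        return text.getBytes(StandardCharsets.UTF_8);          // U+FFFD -> EF BF BD
    }

    public static void main(String[] args) {
        byte[] raw = { 0x41, (byte) 0xFF, 0x42 };
        System.out.println(Arrays.toString(viaString(raw))); // [65, -17, -65, -67, 66]
        ByteConsumer forward = data -> System.out.println(Arrays.toString(data));
        forward.message(raw);                                 // [65, -1, 66]
    }
}
```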
@cousteaulecommandant
Contributor

I'd consider using Windows-1252 rather than ISO-8859-1, because it replaces the control characters from 128 to 159 with printing characters (all except 5 of them). That makes it mostly a superset of ISO-8859-1, not counting those control characters.
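
You can see the difference in the 128 to 159 range directly. This check assumes a JDK that ships windows-1252: the Java spec does not require it, although standard JDKs include it. The class name is illustrative. In the JDK's mapping, the five unassigned bytes decode to U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HighControlRange {
    static String decode(int b, Charset charset) {
        return new String(new byte[] { (byte) b }, charset);
    }

    // Counts bytes in 0x80..0x9F that the charset leaves undefined (decoded as U+FFFD).
    static int undefinedInC1Range(Charset charset) {
        int count = 0;
        for (int b = 0x80; b <= 0x9F; b++) {
            if (decode(b, charset).equals("\uFFFD")) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // windows-1252 is not guaranteed by the Java spec, so check first.
        if (!Charset.isSupported("windows-1252")) return;
        Charset cp1252 = Charset.forName("windows-1252");
        // ISO-8859-1 maps 0x80 to the C1 control U+0080; windows-1252 maps it to the euro sign.
        System.out.printf("ISO-8859-1:   U+%04X%n", (int) decode(0x80, StandardCharsets.ISO_8859_1).charAt(0));
        System.out.printf("windows-1252: U+%04X%n", (int) decode(0x80, cp1252).charAt(0));
        System.out.println(undefinedInC1Range(cp1252) + " undefined bytes in 0x80..0x9F");
    }
}
```
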
By the way, isn't "plain" UTF-16 just like UTF-16BE/LE but with a byte order mark at the beginning? Isn't it a bad idea to send a BOM every time you send a line?
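In Java, the "UTF-16" charset writes a big-endian BOM (FE FF) every time String.getBytes is called, while UTF-16BE writes none. If each line is encoded separately, every line therefore starts with its own BOM. A receiver decoding the stream as a whole would typically see the extra marks as stray U+FEFF characters. The following is a small illustrative check; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

class BomPerLine {
    static String hex(byte[] bytes) {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : bytes) joiner.add(String.format("%02X", b & 0xFF));
        return joiner.toString();
    }

    public static void main(String[] args) {
        for (String line : new String[] { "a\n", "b\n" }) {
            // "UTF-16" prepends a big-endian BOM on every call; "UTF-16BE" never does.
            System.out.println("UTF-16:   " + hex(line.getBytes(StandardCharsets.UTF_16)));
            System.out.println("UTF-16BE: " + hex(line.getBytes(StandardCharsets.UTF_16BE)));
        }
    }
}
```

For "a\n", this prints FE FF 00 61 00 0A for UTF-16 and 00 61 00 0A for UTF-16BE. "b\n" also gets its own FE FF.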

@PaulMurrayCbr
Author

Re: Windows-1252

ISO-8859-1 is defined in the Java spec as always being available: every JVM must support it, and that's why I pulled it out and put it at the top with the others.

See the Java 8 java.nio.charset.Charset docs.

I suppose it might be fair to check whether Windows-1252 is available and to put it at the top if it is, but then we have to start asking "so, how many others go at the top?" What we really want is to remember the user's preferred encoding in settings. That would take care of the "I have to scroll through this enormous list" problem, but it can be added in a separate merge once this is in.
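The following sketches the ordering described here, assuming the list is built from java.nio.charset.Charset. A remembered preference goes to the very top; the preferredName parameter stands in for a hypothetical settings value. The six charsets every JVM must support come next. windows-1252 is added only if Charset.isSupported reports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CharsetMenu {
    // preferredName: hypothetical value remembered in the IDE settings, or null.
    static List<Charset> menuOrder(String preferredName) {
        Set<Charset> order = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        if (preferredName != null && Charset.isSupported(preferredName)) {
            order.add(Charset.forName(preferredName));
        }
        // The six charsets every Java implementation is required to support.
        order.addAll(Arrays.asList(
                StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8,
                StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE, StandardCharsets.UTF_16));
        // Optional extra near the top, only if this JVM provides it.
        if (Charset.isSupported("windows-1252")) {
            order.add(Charset.forName("windows-1252"));
        }
        // Everything else, in the alphabetical order of availableCharsets().
        order.addAll(Charset.availableCharsets().values());
        return new ArrayList<>(order);
    }
}
```

A LinkedHashSet keeps the sketch short: Charset.equals compares canonical names, so a charset already placed near the top is not repeated further down.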

Additionally, encodings only matter when the Arduino is sending and receiving characters in a particular encoding. The only case I can think of where an Arduino would generate Windows-1252 is when someone is producing HTML pages meant to work nicely with Windows in particular.

Really, encoding isn't something that people need all that much. But it's a nice-to-have, it's currently missing, and people occasionally ask about accented characters over Serial. Having the encoding selectable is more about education than utility. But then again, I don't speak European or Asian :) . Maybe I'm wrong about how needed it is. Maybe it's going to be a life-saver for a lot of people.

@PaulMurrayCbr
Author

Re: byte order mark

Yes, it would be nice for the serial monitor dooverlacky to treat incoming and outgoing data more as a stream, with an encoder object that hangs around for as long as the encoding stays the same. I didn't do it that way because it was too hard, and also because it doesn't play nicely with the notion of line-ending marks.
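The following sketches the "encoder that hangs around" idea. It assumes one encoder instance per selected Tx charset, recreated whenever the user changes it; LineEncoder is a hypothetical name. Calling CharsetEncoder.encode(in, out, false) without reset() keeps the encoder's state. With the JDK's UTF-16 encoder, the BOM is therefore written only before the first line.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical long-lived encoder for the serial monitor's Tx side.
class LineEncoder {
    private final CharsetEncoder encoder;

    LineEncoder(Charset charset) {
        encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    byte[] encodeLine(String line) {
        CharBuffer in = CharBuffer.wrap(line);
        ByteBuffer out = ByteBuffer.allocate(16 + (int) Math.ceil(line.length() * encoder.maxBytesPerChar()));
        // endOfInput=false and no reset(): state such as "BOM already written" persists.
        // Simplification: an unpaired high surrogate at the very end of a line
        // stays in 'in' and is dropped rather than carried over.
        while (encoder.encode(in, out, false).isOverflow()) {
            ByteBuffer bigger = ByteBuffer.allocate(out.capacity() * 2);
            out.flip();
            bigger.put(out);
            out = bigger;
        }
        out.flip();
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }
}
```

For example, with UTF-16, encoding "a\n" and then "b\n" yields FE FF 00 61 00 0A followed by 00 62 00 0A. Line endings still work because each line is handed to the same encoder in order.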

If this were software sending megabytes down the pipe, then yes, it would be an issue. But that's not what the serial monitor is really for.

It might, however, be more of a concern in the other direction. A BOM is expected only at the start of a UTF-16 stream, but bytes read from an Arduino are grouped into blocks at the IDE end and treated as separate messages. This means that UTF-16 and other encodings that auto-detect byte order will not work properly when an Arduino sends an extended or bursty stream to the serial port.
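You can reproduce the byte-order problem without a board. Assume the Arduino sends little-endian UTF-16 with a BOM and the IDE receives it as two blocks. Decoding each block independently loses the byte order learned from the BOM, because Java's UTF-16 decoder defaults to big-endian when no BOM is present. A single persistent decoder keeps that byte order. The names below are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class BomAcrossBlocks {
    // Each block decoded on its own, as if it were a separate message.
    static String perBlock(byte[][] blocks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] block : blocks) sb.append(new String(block, StandardCharsets.UTF_16));
        return sb.toString();
    }

    // One decoder for the whole stream: the byte order learned from the BOM is kept.
    // Assumes no block splits a 2-byte code unit or a surrogate pair.
    static String persistent(byte[][] blocks) {
        CharsetDecoder decoder = StandardCharsets.UTF_16.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder sb = new StringBuilder();
        for (byte[] block : blocks) {
            CharBuffer out = CharBuffer.allocate(block.length + 1);
            decoder.decode(ByteBuffer.wrap(block), out, false);
            out.flip();
            sb.append(out);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "AB" in little-endian UTF-16 with a BOM, received as two blocks.
        byte[][] blocks = { { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 }, { 0x42, 0x00 } };
        System.out.println(persistent(blocks)); // AB
        System.out.printf("U+%04X%n", (int) perBlock(blocks).charAt(1)); // U+4200, not 'B'
    }
}
```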

But this doesn't break anything new. It could be made a known issue and put on the list.

@cmaglie
Member

cmaglie commented Apr 28, 2016

I tried this example (on an Arduino Due):

```cpp
void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println("한글");
  delay(1000);
}
```

but this PR doesn't fix the character-receiving encoding problem (from time to time I get garbage).
For reference, here is my analysis of the problem: #2430 (comment)
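The following reproduces this kind of garbage without hardware. It assumes the received bytes are decoded block by block and that a block boundary falls inside a multi-byte UTF-8 sequence. The split position is arbitrary, and SplitUtf8 is an illustrative name. This is not a claim about where the IDE actually splits its reads.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SplitUtf8 {
    // Decodes the two halves separately, as a per-block reader would.
    static String decodeInTwoBlocks(byte[] data, int split) {
        return new String(Arrays.copyOfRange(data, 0, split), StandardCharsets.UTF_8)
                + new String(Arrays.copyOfRange(data, split, data.length), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] line = "\uD55C\uAE00".getBytes(StandardCharsets.UTF_8); // ED 95 9C EA B8 80
        String whole = new String(line, StandardCharsets.UTF_8);
        String split = decodeInTwoBlocks(line, 4);          // boundary inside the second character
        System.out.println(whole.equals(split));            // false
        System.out.println(split.indexOf('\uFFFD') >= 0);   // true: replacement characters appear
        System.out.println(decodeInTwoBlocks(line, 3).equals(whole)); // true: boundary between characters
    }
}
```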

Before adding support for different encodings, we should really fix the RX issue, which is much more important. Otherwise, what's the purpose of setting a different encoding when the serial monitor is not able to decode it correctly?

@cmaglie
Member

cmaglie commented Apr 28, 2016

Besides the problem above, I'm wondering how wide the use case for this PR is.

IMHO it's not worth adding two big list boxes (with settings that are obscure to the majority of our users) to set the TX and RX encodings. In this case it seems more appropriate to use a proper serial terminal, like PuTTY, which supports character encoding along with many other options and features that will never fit into the small serial monitor tool of the Arduino IDE.

@cmaglie cmaglie added the Waiting for feedback More information must be provided before we can proceed label Apr 28, 2016
@agdl
Member

agdl commented Jul 12, 2016

I'm closing this due to lack of feedback and because conflicts are present. Please reopen it if needed.

@agdl agdl closed this Jul 12, 2016
@per1234 per1234 added Component: IDE Serial monitor Tools > Serial Monitor feature request A request to make an enhancement (not a bug fix) labels Sep 25, 2020

5 participants