Adding encoding support to serial monitor #4801
Missed a spot in NetworkMonitor. The more general issue is that the Message interface talks in strings, and should not be used between things that want to talk in bytes.
I'd consider using Windows-1252 rather than ISO-8859-1, since it replaces the control characters from 128 to 159 with printing characters (except for 5 of them), so it's mostly a superset of ISO-8859-1 (not counting the control characters). |
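To make the difference concrete, here is a small sketch that decodes the same byte values with both charsets. The class and method names are made up for illustration, and it assumes the running JVM ships `windows-1252`, which, unlike ISO-8859-1, is not guaranteed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR: compares how one byte decodes
// under ISO-8859-1 and windows-1252.
final class Latin1VsCp1252 {
    // Decode a single byte value (0..255) with the given charset.
    static String decodeByte(int value, Charset cs) {
        return new String(new byte[] { (byte) value }, cs);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available on this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        // 0x80, 0x85 and 0x99 sit in the 128..159 range discussed above;
        // 0xE9 (e with acute accent) is outside it and decodes identically.
        for (int b : new int[] { 0x80, 0x85, 0x99, 0xE9 }) {
            int latin1 = decodeByte(b, StandardCharsets.ISO_8859_1).codePointAt(0);
            int win = decodeByte(b, cp1252).codePointAt(0);
            System.out.printf("0x%02X  ISO-8859-1: U+%04X  windows-1252: U+%04X%n",
                    b, latin1, win);
        }
    }
}
```

In ISO-8859-1, byte 0x80 is the C1 control character U+0080; in Windows-1252 it is the euro sign U+20AC.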
Keeping branch up to date
Re: Windows-1252

ISO-8859-1 is defined in the Java spec as always being available. Every JVM must support it, and that's why I pulled it out and put it at the top with the others (see the Java 8 java.nio.charset.Charset docs). I suppose it might be fair to check whether Windows-1252 is available and to put it at the top if it is, but then we have to start asking "so, how many others go at the top?"

What we really want is to remember the user's preferred encoding in settings. That would take care of the "I have to scroll through this enormous list" problem. But that can be added as another merge once this is in.

Additionally, encodings are only important when the Arduino is sending and receiving characters in a particular encoding. The only case I can think of where an Arduino is generating Windows-1252 is when someone is producing HTML pages meant to work nicely with Windows in particular.

Really - encoding isn't something that people need all that much. But it's a nice-to-have, it's currently missing, and people ask questions occasionally about accented characters over Serial. Having encoding selectable is more about education than utility. But then again: I don't speak European or Asian :) . Maybe I'm wrong about how needed it is. Maybe it's going to be a life-saver for a lot of people. |
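One way the "guaranteed ones first, optional favourites next, everything else after" ordering could be built is sketched below. `CharsetMenu` and `orderedNames` are hypothetical names, not code from this PR; only the `java.nio.charset` calls are real API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how a combo-box model could be ordered.
final class CharsetMenu {
    // Returns canonical charset names: the six charsets every JVM must
    // support, then any requested favourites this JVM actually has, then
    // the rest of the installed charsets in their natural order.
    static List<String> orderedNames(String... optionalFavourites) {
        Set<String> names = new LinkedHashSet<>(); // keeps insertion order, drops repeats
        names.add(StandardCharsets.US_ASCII.name());
        names.add(StandardCharsets.ISO_8859_1.name());
        names.add(StandardCharsets.UTF_8.name());
        names.add(StandardCharsets.UTF_16BE.name());
        names.add(StandardCharsets.UTF_16LE.name());
        names.add(StandardCharsets.UTF_16.name());
        for (String favourite : optionalFavourites) {
            // Only promote a favourite if this JVM supports it.
            if (Charset.isSupported(favourite)) {
                names.add(Charset.forName(favourite).name());
            }
        }
        names.addAll(Charset.availableCharsets().keySet());
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        List<String> names = orderedNames("windows-1252");
        System.out.println(names.subList(0, Math.min(8, names.size())));
    }
}
```

This does not answer the "how many others go at the top?" question; it only shows that the availability check itself is cheap.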
Re: byte order mark

Yes, it would be nice for the serial monitor dooverlacky to treat the incoming and outgoing data more as a stream, with an encoder object that hangs around for as long as the encoding stays the same. I didn't do it that way because it was too hard, and also because it doesn't play nicely with the notion of line-ending marks. If this was software sending megabytes down the pipe, then yes, it would be an issue. But that's not what the serial monitor is really for.

It might, however, be a bit of a concern the other way around. The BOM is required at the start of a UTF-16 stream, but bytes read from an Arduino are put into blocks at the IDE end and treated as separate messages. This means that UTF-16 and other encodings that auto-detect will not work properly when an Arduino sends an extended or bursty stream to the serial port. But this doesn't break anything new. It could be made a known issue and put on the list. |
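Both halves of the BOM concern can be reproduced with plain `String` conversions. The sketch below is illustrative only (the class and method names are invented, and the chunk boundary is an assumed example): it encodes one message with Java's `UTF-16` charset, then decodes a little-endian stream that has been split into two blocks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of per-message UTF-16 handling; not code from the PR.
final class Utf16Chunks {
    // Encode one outgoing message on its own, as a per-message encoder would.
    static byte[] encodeMessage(String message) {
        return message.getBytes(StandardCharsets.UTF_16);
    }

    // Decode one incoming block on its own, with no memory of earlier blocks.
    static String decodeChunk(byte[] chunk) {
        return new String(chunk, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Every separately encoded message starts with its own BOM (FE FF).
        for (byte b : encodeMessage("A")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();

        // A little-endian stream "AB" with a BOM, split after the first character.
        byte[] first = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        byte[] second = { 0x42, 0x00 };
        System.out.println(decodeChunk(first));                      // BOM seen: "A"
        System.out.printf("U+%04X%n", (int) decodeChunk(second).charAt(0)); // no BOM: big-endian assumed
    }
}
```

The second block has no BOM, so Java's `UTF-16` decoder falls back to big-endian and yields U+4200 instead of "B".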
I tried this example (on an Arduino Due):

    void setup() { Serial.begin(9600); }
    void loop() { Serial.println("한글"); delay(1000); }

but this PR doesn't fix the problem of decoding received characters (from time to time I get garbage). Before adding support for different types of encodings, we should really fix the rx issue, which is much more important; otherwise, what's the purpose of setting a different encoding when the serial monitor is not able to decode it correctly? |
Besides the problem above, I'm wondering how wide the use case for this PR is. IMHO it's not worth adding two big list boxes (with some settings that are obscure to the majority of our users) to set the encoding for TX and RX. In this case it seems more appropriate to use a proper serial terminal, like PuTTY, where character encoding is supported together with a lot of other options and features that will never fit into the small serial monitor tool of the Arduino IDE. |
I'm closing this for lack of feedback and since conflicts are present. Please reopen this if needed |
This pull request adds encoding support to the serial monitor.
The core Serial class is altered to work primarily with Java bytes rather than Java chars. However, the existing methods are retained for backward compatibility.
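The general shape of "bytes first, string methods kept as wrappers" might look like the sketch below. This is not the PR's Serial class: `BytePort`, its methods and the in-memory buffer standing in for the port are all hypothetical, chosen only to show the delegation pattern.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a serial port wrapper; the real Serial class differs.
final class BytePort {
    // Assumption: an in-memory buffer plays the role of the wire.
    private final ByteArrayOutputStream wire = new ByteArrayOutputStream();
    private Charset txCharset = StandardCharsets.UTF_8;

    void setTxCharset(Charset cs) {
        txCharset = cs;
    }

    // Primary, byte-oriented entry point: nothing is re-encoded here.
    void write(byte[] bytes) {
        wire.write(bytes, 0, bytes.length);
    }

    // Legacy string entry point kept for existing callers; it encodes with
    // the selected Tx charset and delegates to the byte-oriented method.
    void write(String text) {
        write(text.getBytes(txCharset));
    }

    // Everything written so far, for inspection.
    byte[] sent() {
        return wire.toByteArray();
    }
}
```

Keeping the encoding step in one wrapper method means the Tx charset is applied in exactly one place.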
Combo boxes for Tx and Rx character sets are added. The serial monitor works in the IDE on my Mac, and was tested with a simple echo sketch on an Arduino Uno. As expected, when accented characters are typed into the serial monitor and sent:
Tx: US-ASCII, Rx: US-ASCII, accented characters are replaced with question marks by the encoder
Tx: US-ASCII, Rx: UTF-8, accented characters are replaced with question marks by the encoder
Tx: UTF-8, Rx: US-ASCII, accented characters are received back as two bytes of rubbish
Tx: UTF-8, Rx: UTF-8, accented characters come through correctly
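The four cases above can be reproduced without hardware by treating the echo sketch as an identity function on bytes. The snippet below is a sketch under that assumption; `EchoMatrix` and `roundTrip` are illustrative names, not part of the PR.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation of the echo test: encode with the Tx charset,
// pretend the Arduino echoes the bytes unchanged, decode with the Rx charset.
final class EchoMatrix {
    static String roundTrip(String typed, Charset tx, Charset rx) {
        byte[] onTheWire = typed.getBytes(tx); // unmappable chars become '?'
        return new String(onTheWire, rx);      // undecodable bytes become U+FFFD
    }

    public static void main(String[] args) {
        String typed = "\u00e9"; // e with acute accent
        Charset[] charsets = { StandardCharsets.US_ASCII, StandardCharsets.UTF_8 };
        for (Charset tx : charsets) {
            for (Charset rx : charsets) {
                System.out.printf("Tx: %s, Rx: %s -> %s%n", tx, rx, roundTrip(typed, tx, rx));
            }
        }
    }
}
```

In the Tx: UTF-8, Rx: US-ASCII case the two bytes 0xC3 0xA9 each decode to the replacement character U+FFFD, which matches the "two bytes of rubbish" observation.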
Note that line breaks do odd things when, for instance, UTF-16 is selected as the encoding, as does sending one character at a time when transmitting US-ASCII and receiving UTF-16. This is correct … ish. I did not implement a full encoding/decoding buffer. The messages are assumed to be complete as they arrive. This may cause glitches when reading bulk UTF-8 output from the Arduino, and may need to be addressed.
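For reference, a decoding buffer of the kind not implemented here could be built on `java.nio.charset.CharsetDecoder`, which keeps its state between calls. The sketch below is a minimal illustration, not the PR's code: `ChunkDecoder` and `feed` are hypothetical names, and it ignores flushing at end of stream.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical stateful decoder: bytes of an incomplete character are held
// back until the next chunk arrives instead of being turned into rubbish.
final class ChunkDecoder {
    private final CharsetDecoder decoder;
    private ByteBuffer pending = ByteBuffer.allocate(64); // kept in "write" mode

    ChunkDecoder(Charset charset) {
        decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Appends a chunk and returns only the characters that are complete so far.
    String feed(byte[] chunk) {
        if (pending.remaining() < chunk.length) {
            // Grow the buffer, carrying over any bytes still waiting.
            ByteBuffer bigger = ByteBuffer.allocate(pending.position() + chunk.length);
            pending.flip();
            bigger.put(pending);
            pending = bigger;
        }
        pending.put(chunk);
        pending.flip(); // switch to "read" mode for the decoder
        int capacity = (int) Math.ceil(pending.remaining() * decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);
        // endOfInput=false: a trailing partial sequence is left unconsumed.
        decoder.decode(pending, out, false);
        pending.compact(); // keep leftovers, back to "write" mode
        out.flip();
        return out.toString();
    }
}
```

As a usage example, the UTF-8 bytes of "한글" are ED 95 9C EA B8 80. Feeding ED 95 returns an empty string, and feeding 9C EA B8 80 afterwards returns both characters intact, whereas decoding each block independently would produce replacement characters for the split sequence.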
I don't know how to exercise the other classes - Network Monitor, Serial Plotter.
A difficulty is the use of the MessageConsumer interface. Messages are text, not bytes, but NetworkMonitor used that interface to do its job. I am not sure that my handling is correct, although I have tried to do the sane thing.