Input Method to compose complex characters #2430
Comments
Hmm, I wonder if this is something that the Arduino code can influence, or if we're just dependent on Java to do the right thing here... |
I believe both problems lie in the source code that handles the Java Input Method, so they should not affect the Arduino code. |
This should be fixed with the new editor, available with the latest hourly build http://www.arduino.cc/en/Main/Software#hourly |
Could you copy and paste your sketch here? |
Never mind, I reproduced (more or less) the issue with this sketch on an Arduino Due:

    void setup() { Serial.begin(9600); }
    void loop() { Serial.println("한글"); delay(1000); }

Before raising false expectations, let me say that the string functions in Arduino are designed to work with plain ASCII characters. UTF-8 characters may work in simple cases, but you may encounter random faulty behaviour in more complex sketches, for example when concatenating two strings or extracting a substring from a larger one.

That said, the above sketch works if I connect to the serial port with an external terminal program like PuTTY, but it prints random garbage in the serial monitor of the Arduino IDE. So my conclusion is that something weird is happening in the Arduino Serial Monitor. My guess is that the issue is in how the incoming chars are buffered here:

    byte[] buf = port.readBytes(serialEvent.getEventValue());
    if (buf.length > 0) {
      String msg = new String(buf);
      char[] chars = msg.toCharArray();
      message(chars, chars.length);
    }

A UTF-8 char may be composed of several bytes, and the String constructor can extract the correct char only if all of its bytes arrive in one single read. If a multi-byte UTF-8 char is fragmented across two reads, the two consecutive calls to the String constructor cannot build the correct character.

This is a tricky issue, because JSSC doesn't implement the InputStream interface but instead has this weird readBytes() method that returns an array of bytes. See https://github.com/scream3r/java-simple-serial-connector/issues/17

The best fix would be to implement an InputStream interface in JSSC and feed the InputStream into an InputStreamReader or a BufferedReader, which will do all the correct buffering and decoding. An alternative is to write an anonymous InputStream wrapper around JSSC's Serial object to obtain the same result. |
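The fragmentation described above can be reproduced without any serial hardware. The following is only a sketch under stated assumptions: the class `SplitUtf8Demo` and its two helper methods are hypothetical names, the serial reads are simulated with plain byte arrays, and nothing here is JSSC or Arduino IDE code. It compares decoding each chunk on its own with decoding the same bytes through an `InputStreamReader`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SplitUtf8Demo {
    // Decodes every chunk independently, mirroring one String constructor call per read.
    static String decodePerChunk(byte[][] chunks) {
        StringBuilder sb = new StringBuilder();
        for (byte[] chunk : chunks) {
            sb.append(new String(chunk, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Presents the same chunks as one continuous stream and lets a Reader do the decoding.
    static String decodeAsStream(byte[][] chunks) {
        List<InputStream> parts = new ArrayList<>();
        for (byte[] chunk : chunks) {
            parts.add(new ByteArrayInputStream(chunk));
        }
        InputStream in = new SequenceInputStream(Collections.enumeration(parts));
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two Hangul syllables from the sketch (U+D55C, U+AE00), 3 bytes each in UTF-8.
        String text = "\uD55C\uAE00";
        byte[] all = text.getBytes(StandardCharsets.UTF_8);
        // Simulate a read boundary that falls inside the first character.
        byte[][] chunks = {
            Arrays.copyOfRange(all, 0, 2),
            Arrays.copyOfRange(all, 2, all.length)
        };
        System.out.println(decodePerChunk(chunks).equals(text)); // false
        System.out.println(decodeAsStream(chunks).equals(text)); // true
    }
}
```

The per-chunk variant substitutes U+FFFD for the torn bytes, which matches the "garbage" seen in the monitor, while the stream variant keeps the partial character buffered until its remaining bytes arrive.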
Here is the sketch:
|
With UTF-8 it is possible to detect whether a "chunk" of bytes ends in a single-byte (ASCII) character or a multi-byte sequence, and it is relatively easy to manually check whether this multi-byte sequence is complete or not (also the number of bytes that are in this chunk and the number of bytes that are missing). Therefore if a chunk ends in an incomplete multi-byte sequence, this sequence could be stripped and "saved for later", either "pushed back" with something like C's This involves the serial monitor being a bit smart though; plus the fix I'm mentioning is specific to UTF-8. If the InputStream solution is easy to implement and already takes care of this, it's probably a better solution. |
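The manual check described above could look roughly like the following. This is a sketch, not code from the IDE: `Utf8Tail`, `incompleteTailLength` and `feed` are hypothetical names, and the logic only inspects the end of the chunk, leaving any other malformed bytes to the normal decoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Tail {
    // Bytes held back from the previous chunk because their character was incomplete.
    private byte[] pending = new byte[0];

    // Returns how many bytes at the end of buf[0..len) belong to an incomplete
    // UTF-8 sequence, or 0 if the chunk ends on a character boundary.
    static int incompleteTailLength(byte[] buf, int len) {
        int i = len - 1;
        int seen = 0;
        // Step back over at most 3 continuation bytes (bit pattern 10xxxxxx).
        while (i >= 0 && seen < 3 && (buf[i] & 0xC0) == 0x80) {
            i--;
            seen++;
        }
        if (i < 0) {
            return 0; // no lead byte in this chunk; nothing to hold back
        }
        int lead = buf[i] & 0xFF;
        int expected;
        if ((lead & 0x80) == 0x00) {
            expected = 1; // ASCII
        } else if ((lead & 0xE0) == 0xC0) {
            expected = 2;
        } else if ((lead & 0xF0) == 0xE0) {
            expected = 3;
        } else if ((lead & 0xF8) == 0xF0) {
            expected = 4;
        } else {
            return 0; // not a valid lead byte; let the decoder report it
        }
        int have = len - i;
        return have < expected ? have : 0;
    }

    // Decodes the complete part of a chunk and saves the incomplete tail for later.
    String feed(byte[] chunk) {
        byte[] all = new byte[pending.length + chunk.length];
        System.arraycopy(pending, 0, all, 0, pending.length);
        System.arraycopy(chunk, 0, all, pending.length, chunk.length);
        int tail = incompleteTailLength(all, all.length);
        pending = Arrays.copyOfRange(all, all.length - tail, all.length);
        return new String(all, 0, all.length - tail, StandardCharsets.UTF_8);
    }
}
```

Feeding the first two bytes of a three-byte character returns an empty string, and the character appears once the next chunk supplies the missing byte. As noted in the comment, this approach is specific to UTF-8.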
Yeah - the issue is a design one. The serial monitor uses this "message" interface that works with strings, because when sending stuff via the monitor you type something and then hit return. But this doesn't work for receiving bytes. The "message" model is inappropriate for the serial monitor altogether. |
As noted over at 4452: The String-constructor documentation advises to use a I think that's good advice. This would give control over the encoding used, which is the point of arduino/arduino-ide#1728 . Clean UTF-8 decoding even in the split character case is also a feature included in So using this would be an easy fix, with no need to completely redo the "message" model. (FWIW: I think that model is not that bad a choice, actually.) |
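One standard-library way to get explicit control over the encoding and to survive split characters is `java.nio.charset.CharsetDecoder`. Whether that is the exact class the comment above refers to is an assumption, since the name did not survive in the text. The wrapper class `StreamingUtf8Decoder` and its `decode` method below are hypothetical names for illustration, not part of the IDE.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StreamingUtf8Decoder {
    // Malformed input is replaced with U+FFFD rather than aborting the decode.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Holds bytes not yet decoded; kept in "write mode" between calls.
    private ByteBuffer buf = ByteBuffer.allocate(64);

    // Decodes one chunk as it arrives; an incomplete trailing sequence stays buffered.
    String decode(byte[] chunk) {
        if (buf.remaining() < chunk.length) {
            // Grow the buffer, preserving any bytes carried over from earlier chunks.
            ByteBuffer bigger = ByteBuffer.allocate(buf.position() + chunk.length);
            buf.flip();
            bigger.put(buf);
            buf = bigger;
        }
        buf.put(chunk);
        buf.flip();
        // The UTF-8 decoder never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(buf.remaining());
        // endOfInput = false: more bytes may follow, so a partial character is left unread.
        decoder.decode(buf, out, false);
        buf.compact(); // move the unread tail to the front for the next call
        out.flip();
        return out.toString();
    }
}
```

In the serial monitor, an object like this would replace the per-read `new String(buf)` call, with each result passed on to the existing "message" interface unchanged.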
There are two bugs when using Korean characters: