Skip to content

Serial monitor output cut short with multibyte (UTF-8) data #9808

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthijskooijman opened this issue Feb 26, 2020 · 1 comment · Fixed by #9809
Closed

Serial monitor output cut short with multibyte (UTF-8) data #9808

matthijskooijman opened this issue Feb 26, 2020 · 1 comment · Fixed by #9809

Comments

@matthijskooijman
Copy link
Collaborator

I noticed an issue with the serial monitor, where some data might get stuck and not be displayed in the monitor as expected.

To reproduce, run the following sketch (probably works on any native USB Arduino, I tried on an STM32 board which I had at hand):

void setup() {
  Serial.begin(9600);
  for (uint8_t i = 0; i < 12; ++i) {
    Serial.print("0123456789");
  }
  Serial.print("°0123456789");
  delay(10000);
  Serial.print("X");
}

void loop() { }

This prints just over 128 bytes of data, with a multi-byte character in the first 128 bytes. Then it waits a bit, and prints one more byte.

When running this, the serial monitor will first print:

012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789°0123456

Then, after 10 seconds, it adds the missing part plus the X:

789X

For this to occur, the Arduino must send its data quite fast (hence the native USB requirement). I reproduced this once on a Uno at 2Mbps, but could not reliably do that.

The problem is in the code that converts data from bytes to chars:

byte[] buf = port.readBytes(serialEvent.getEventValue());
int next = 0;
while(next < buf.length) {
while(next < buf.length && outToMessage.hasRemaining()) {
int spaceInIn = inFromSerial.remaining();
int copyNow = buf.length - next < spaceInIn ? buf.length - next : spaceInIn;
inFromSerial.put(buf, next, copyNow);
next += copyNow;
inFromSerial.flip();
bytesToStrings.decode(inFromSerial, outToMessage, false);
inFromSerial.compact();
}
outToMessage.flip();
if(outToMessage.hasRemaining()) {
char[] chars = new char[outToMessage.remaining()];
outToMessage.get(chars);
message(chars, chars.length);
}
outToMessage.clear();

This code reads an arbitrary amount of data using port.readBytes into buf. It then puts as much bytes as possible into inFromSerial, which is then converted to chars which are stored into outToMessage, continuing until outToMessage is full, which is then passed on to the serial monitor, and the entire thing is continued until buf is fully read.

However, in the case simulated with this sketch:

  • buf contains 132 bytes (12 x 01234567890 + °0123456789, not that ° is 2 bytes).
  • On the first inner loop, 128 bytes are read into inFromSerial, which are decoded into 127 characters in outToMessage.
  • On the second inner loop, the remaining 4 bytes are read into inFromSerial, but just one is decoded into outToMessage because then outToMessage is full.
  • These first 128 characters are sent to the serial monitor, as shown above. The last three bytes/characters (789) are still in inFromSerial.
  • The outer loop checks next < buf.length, which is false (since everything has been read from buf, so processing stops.
  • When later the X is sent, that is copied into inFromSerial as well, where the 789 are still waiting, and everything together is decoded and displayed.

To fix this, processing should probably just continue until inFromSerial is empty (or actually until no bytes can be read anymore, since there might be a partial UTF-8 character left in inFromSerial that must be left there until the rest is received).

matthijskooijman added a commit to matthijskooijman/Arduino that referenced this issue Feb 26, 2020
This fixes a problem with the Serial UTF-8 decoder. This decoding moves
data from char[] buf, into a ByteBuffer inFromSerial, then decodes them
into a CharBuffer outToMessage and converts to a char[] to pass on.

When the buf read contained just over a full buffer worth of bytes and
contained some multi-byte characters, a situation could arise where two
decodes were needed to fill up outToMessage, leaving some data in
inFromSerial. If in this case no data would be left in buf, decoding
would stop until more data came in from serial.

This commit fixes this problem by:
 - Changing the outer loop to continue running when buf is empty, but
   inFromSerial is not.
 - Changing the inner loop to run at least once (so it runs when buf is
   empty, but inFromSerial is no).
 - Breaking out of the outer loop when no characters were produced (this
   handles the case where only an incomplete UTF-8 character remains in
   inFromSerial, which would otherwise prevent the loop from
   terminating.
 - Removes a `if (outToMessage.hasRemaining()` check that is now
   necessarily true if the break was not done.

This fixes arduino#9808.
@matthijskooijman
Copy link
Collaborator Author

The same sketch can be used to reproduce this on an MKRZERO as well.

I've submitted a PR to fix this.

cmaglie pushed a commit to matthijskooijman/Arduino that referenced this issue Mar 24, 2020
This fixes a problem with the Serial UTF-8 decoder. This decoding moves
data from char[] buf, into a ByteBuffer inFromSerial, then decodes them
into a CharBuffer outToMessage and converts to a char[] to pass on.

When the buf read contained just over a full buffer worth of bytes and
contained some multi-byte characters, a situation could arise where two
decodes were needed to fill up outToMessage, leaving some data in
inFromSerial. If in this case no data would be left in buf, decoding
would stop until more data came in from serial.

This commit fixes this problem by:
 - Changing the outer loop to continue running when buf is empty, but
   inFromSerial is not.
 - Changing the inner loop to run at least once (so it runs when buf is
   empty, but inFromSerial is no).
 - Breaking out of the outer loop when no characters were produced (this
   handles the case where only an incomplete UTF-8 character remains in
   inFromSerial, which would otherwise prevent the loop from
   terminating.
 - Removes a `if (outToMessage.hasRemaining()` check that is now
   necessarily true if the break was not done.

This fixes arduino#9808.
@cmaglie cmaglie added this to the Release 1.8.13 milestone Mar 24, 2020
cmaglie pushed a commit that referenced this issue Mar 24, 2020
This fixes a problem with the Serial UTF-8 decoder. This decoding moves
data from char[] buf, into a ByteBuffer inFromSerial, then decodes them
into a CharBuffer outToMessage and converts to a char[] to pass on.

When the buf read contained just over a full buffer worth of bytes and
contained some multi-byte characters, a situation could arise where two
decodes were needed to fill up outToMessage, leaving some data in
inFromSerial. If in this case no data would be left in buf, decoding
would stop until more data came in from serial.

This commit fixes this problem by:
 - Changing the outer loop to continue running when buf is empty, but
   inFromSerial is not.
 - Changing the inner loop to run at least once (so it runs when buf is
   empty, but inFromSerial is no).
 - Breaking out of the outer loop when no characters were produced (this
   handles the case where only an incomplete UTF-8 character remains in
   inFromSerial, which would otherwise prevent the loop from
   terminating.
 - Removes a `if (outToMessage.hasRemaining()` check that is now
   necessarily true if the break was not done.

This fixes #9808.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants