-
-
Notifications
You must be signed in to change notification settings - Fork 86
Serial[x].print/write - Could use some major performance enhancement. #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Well, that's a bummer. Thank you @KurtE for finding and reporting this. @facchinm take a look here. There is in fact a safe ring buffer object for transmission but it's not used inside the driver ... |
Mmmh I don't think I'm understanding the issue... |
Sorry, but I believe that is completely wrong. All of the output functions, like Serial1.write(char), Serial1.write(buffer, cnt), Serial1.print(), Serial1.println(), all get funneled to the two virtual methods. That is:
Where Print class has:
If you look at your AVR version of the simple virtual function that outputs one char, it starts off like:
This method, more or less just stuffs the character into software fifo queue... It has some extra stuff, that if the queue is empty and the hardware port output register is empty, it will simply output the data directly to the register... Plus this code will detect if the queue is full and wait until there is room to put that character into the queue. Also it includes the detection that the call was done while interrupts were disabled and deal with it... Last part is another open issue I mentioned earlier. Note: to help a sketch to not have the SerialX.write call to not have to wait, the Stream class has another virtual method: If the user wanted to wait until the output was complete, there is another method: flush(), which should do that.
The first is that the current code is not using the txBuffer. Sorry I am just learning this code base and hardware. But I believe that with all of these layers, you would need to know if you are then into the TX_COMPLETE state...
Note: I am not sure you are going to want to process those two states in the same way. That is maybe in both cases if your FIFO is not empty you would want to queue up new data to the hardware. But for Flush you only want to return after the
will break. Currently some of these issues are masked by other issues. Hope this makes sense, and that I was not too verbose. |
Quick update: I did some more hacking on the Serial.h/cpp files to get a better understanding of what is going on. Also learned a few things: For example: I naively thought that UART1 would be SCI0 and UART2 would be SCI1 and as such would have FIFO queues. Nope: found out that they use SCI2 and SCI9... So no FIFO... Also found that my fix for UART::Write(const ... really showed up problems..
Characters were being tossed left and right. Or more particularly that a character or two might be lossed per Serial.print/write.. For my current running, I have completely neutered the UART::write calls (at least for these). And did similar to your writeRaw call, except instead of wait for each write to complete, I wait at the start for the TDRE condition...
Flush was still broken, so I hacked that as well:
Note the code changes completely don't use interrupts, so won't hang if called in an ISR. The loop code is only slightly changed to add a flush()
Which appears to work. These current things are up in my own fork/branch(SerialX_fixes) if anyone wishes to play along. Next up see how to use your TX Queue. And then decide if I will do a PR... Still uncomfortable to sign any legal document, which states that this is my Original work... As none of this is overly original... But that is a different topic. |
KurtE makes very good points about the UART performance. Having worked on many different boards, keep it simple, I would neglect the availability of the FIFO in SC0 and SC1 (just disable it) and always use a TX ring buffer approach. That is what I see many other board libraries do as well: keep clean, easy to maintain, and straightforward. |
@paulvha - thanks for the feedback. I also have worked on lots of different boards as well and have implemented or updated or suggested updates, to the Serial/UART processing in several of them including Teensy, Trossen, Robotis... I have a Draft PR (#59) which so far, I have only touched the TX side, and it is now using the ring buffer that was already built into the __ UART __ objects. I still try to make use of the FIFO if available. In the right places, it should considerably cut down on the number of interrupts needed by the device. That is for example, if the user code outputs, several lines of output, like 256 characters, without the FIFO, the device will generate something like 256 interrupts. With the fifo maybe only something like 16, which is a good win. And on the RX side, I like having FIFO in place as to reduce the likelihood of losing RX data. On the WIFI board only the connection between the board and the ESP32 for Serial2 (Wifi and BT) is on SCI1, which has the FIFO, and hard to experiment as much on that connection. But I can create a UART object on pins 18,19, which is on SCI0. Which so far my changes appear to be working. I have put some of my work(play) on hold, waiting for some feedback on things like is it allowed to bypass the FSP code? If so should I try to totally bypass it for the UART code, or is it OK to pick and choose. But i the mean time I might experiment, with talking to Dynamixel servos at 1MBS maybe higher if supported. Thanks again for you input. |
Fully agree that a TX-buffer approach should be applied. I understand that you prefer to continue to use the HW FIFO, still I wonder how much time benefit that will bring and rather go for simplicity and a common UART approach... but it is a choice :-) Changing Serial (D1 = P302 & D0 = P301) to use SCI0 instead of SCI2 does not require a change in FSP code. It is defined in the file : variants/UNOWIFIR4/pinmux.inc
and line 111 change
|
Thanks for the feedback:
To me this is similar to the Teensy 3.x boards, for example T3.2. The first two USARTS have fifo capability, others do not. These are used for Serial1 and Serial2 (USB is its own sub-system). And it is far better to use these two Serial objects for high speed data. Like the Robotis servos which I was testing yesterday at 1MBS. In some cases, I have bumped up the servo baud rates to 2 or 3mbs, I have tried up to 3.5.
I could be wrong: But I think which SCI channel is used is dictated by the hardware.
For what it is worth, my excel document I mentioned is up at https://github.com/KurtE/UNOR4-stuff/blob/main/arduino_uno_r4.xlsx Which I used Excel to extract from the Renesas RA4M1 Datasheet Table 1.7 pages 23-26 |
Hi Kurt I had seen (and downloaded) your spreadsheet already. Thanks for making and sharing that !!! The way it works is that in serial.cpp, when you do begin(), it will call for cfg_pins(). Here based on the provided pins (D0, D1), the real processor port & pin will be obtained for a structure defined in variant.cpp in the variants/UNOWIFIR4-folder. It will also obtain a list of pin options (defined in pinmux.inc in the same variants folder). That channel will then be used in begin() to initialize and use as SCI-channel. regards, |
I am going to correct myself. Having studied the datasheet a bit more in detail, SCI0 can only be applied to a limited number of port / pin (PSEL as part of PSF-register) SCI0 -RX : P101 (D18 / A4/SDA), P104 (D2), P206 (Led Matrix), P410 (NC) The combination of the pins D18/D19 as you used before will work, as well as using D2/D11. So the earlier mentioned change to pinmux.inc will not work. regards, |
This issue is a bug, not a potential "enhancement". And a rather hefty one IMO. |
It appears like any calls to Serial.print(), including Serial.println and likewise Serial.write(buffer, cnt), will remain in these calls, until the last byte is queued to the underlying UART.
Here is a simple sketch:
Note, I instrumented the UART::write methods in Serial.cpp:
Likewise the callback method:
Here are some images of the Logic analyzer output.

Observations:
it appears like in all cases, the write method: size_t UART::write(uint8_t c)
is being called, I would have expected the version: size_t UART::write(uint8_t* c, size_t len)
to be called for the print of a string as well as the simple write of a buffer.
I would have expected that there would be some form of software FIFO queue, like is on most of the other platforms, and only have to wait if you are writing more data than will fit. Would also expect that availableForWrite)_ would be implemented to give you a clue of how much I can write.
Hardware FIFO? I believe the first two UARTS have 16 byte queues? If so why does it look like only one bytes at a type is sent...
Maybe part of 1).
Why are the write methods waiting at the end at all? Especially when they are waiting for an ISR to be called to set the tx_done?
The only wait that should happen is if the hardware and/or software queues are full at the start of the call, should they wait for space to become available. And this waiting code should understand and deal with if the code calling write has a higher priority than the Serial UART ISR handler, but that is a different issue Serial.print() - from Interrupt callbacks hangs the processor. #38
Maybe more as I try to better understand all of the layers of this code.
The text was updated successfully, but these errors were encountered: