-
Notifications
You must be signed in to change notification settings - Fork 74
No recovery after Timed out waiting for SPI char #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thank you for the example! I'm hoping to look at the ESP32SPI stuff today. I'm reworking requests a bit and need to test them together. |
Note: I'm reworking a lot of these APIs. I talk the changes over on my stream here: https://www.youtube.com/watch?v=jfRjlu3oCDA (Newer streams will show more progress.) |
FYI, there's a post which happens to mention |
And a question about how to handle/recover in an application: Adafruit Forums: How to gracefully handle "Timed out waiting for SPI char"?. |
I'm hitting the same issue while using the adafruit_minimqtt library:
Attempting to to catch the exception and doing wifi.reset() followed by mqtt.reconnect() does Looking a the code: Adafruit_CircuitPython_ESP32SPI/adafruit_esp32spi/adafruit_esp32spi.py Lines 233 to 238 in fce466b
I wonder how it came up with 1 second being the max time we should wait for. The overall timeout value in mqtt is 30 seconds, so I would think that waiting more than a sec should not be a big deal. @tannewt : Is there a particular log/info I should try to fetch in order to see how we can fix this issue? I also wonder if the Thanks, -- flaviof |
Not that I can think of. There is a lot of complexity here between the ESP32SPI driver and the nina firmware that I don't know how to simplify. I think your idea to be more resilient is a good one. Networking can cause errors at any time and that's tricky to handle well. |
Ack. I opened an issue in PyPortal : adafruit/Adafruit_CircuitPython_PyPortal#98 . I hope to be able to solve this from that angle. I see you are also very active there, so any help will be much appreciated. :) Thanks! |
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
@kevinjwalters @tannewt ^^ ;) |
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wrapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets as part of _wait_for_msg(), together with all other MQTT messages; - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wrapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets via _wait_for_msg(); - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
What's the status of this issue? Is there anything I can do to move this forward? |
AFAIK, this was actually an issue in dafruit_CircuitPython_MiniMQTT, where receive code was expecting
I could be out of context here if this is not related to CircuitPython_MiniMQTT. Apologies, if that it the case. |
Got it. I installed from master. Now I just need to find out how to fix |
You may want to upgrade your ESP32 Airlift's firmware. |
@brentru that worked! Thanks! |
What's the workaround for this? Currently having this issue as well - sending multiple GET requests often leaves me with |
Update all of your libraries, install the CircuitPython beta, and update your ESP32 firmware. |
Sorry, the symptoms are very slightly different but the test code above still fails in the same way. The only difference I can see is that first timeout on the bad URL produces the message "ESP32 not responding" but the stack is still broken and unrecoverable. Have changes gone into the beta yet? The underlying network stack doesn't seem to be anymore reliable either. My application still needs to reboot every few minutes to deal with this (I'm polling internal data endpoints every 10s). |
Are you sure it's unrecoverable? What happens if you try to catch it and retry? |
Yes, I am. My production code catches and retries 5 times and then gives up and reboots. Can you please have a look at the example I posted when I raised the issue. Why does the second request fail? |
@flavio-fernandes I am right in understanding that you closed this issue because a defect in the mqtt library was fixed? |
same with me not using the mqtt i use PyPortal fetch |
Yes, kinda. ;) From my understanding, the changes on how SPI handles receive in PR112 affected CircuitPython_MiniMQTT. Moreover, @tannewt convinced me that the handling due to that change is best if done at a higher layer, which ultimately became the issue 54 in CircuitPython_MiniMQTT that I worked on. I apologize for the confusion this may have caused. It sounds to me that there are still other libraries/code that remains broken due to that change. Please reopen this bug if that is the case. I hope this gets resolved for all cases. |
I can't reopen it. |
I have been plagued by the 'Timed out waiting for SPI char' exception for a long time, both on a Matrix Portal and PyPortal. To try to understand what was going on, I added code to ESP_SPIcontrol._wait_spi_char to increase the timeout to 1 second and to record the times for each request. Here is what I found: Running for 8 hours with a simple application on a PyPortal, talking via a socket to another system, in 8 hours there were 2.2 million calls to _wait_spi_char. They were essentially all from my application's calls to socket.connected and socket.available, waiting for messages. Of those 2.2 million calls, 99.5% of them completed in less than 1 ms (average was actually less than 5 us). 10,000 of them took between 1 and 10 ms. But 30 of them took more than 100ms (which is the timeout value coded in _wait_spi_char). All of those would have generated the exception. The longest of those 30 was just 176 ms. Since I had increased the timeout to 1000 ms, none of them failed and the application had no problems. Of interest is that there were none in the range between 10ms and 100ms. Something, (garbage collect perhaps) occasionally causes a relatively long wait. Running with the timeout at 1 second has eliminated this as a problem for me. |
@mikejc58 Which nina-fw version is running on the ESP32? |
It is Nina-fw-1.7.1 And circuitpython 6.1.0 BTW, I am doing some more experiments, and will let you know when done (a few days). But, I had an error in my comment of 2 days ago. I said the average time for calls to wait_spi_char that were less than 1ms was less than 5 microseconds. That was completely wrong (silly math mistake), the actual average of those requests seems to be about 115 microseconds. |
I've concluded my experiments and have attached a file here with details. In short, my conclusion is that all of the 'Timed out waiting for SPI Char' exceptions that I've gotten are due to garbage collect. I believe the best course of action is to abandon timing these requests and substitute a simple retry counter. For the millions of requests that I've tracked there were none that failed on the first call to _read_byte(spi) but worked on a subsequent call. Any call to _read_byte(spi) that failed the first time, failed for the next 50 tries before finally timing out. So a simple retry loop eliminates the issues with timing (garbage collect and the use of time.monotonic() for short duration timeouts) |
@mikejc58 good sciencing :) want to submit a PR? |
Sure. I am going to run with the code that I will push for a while first though. |
Never mind! I wasn't running the latest version of the library. I will send another comment here if I am still experiencing this error, but consider this resolved. |
Should this be closed now that #132 is merged? |
Sure! |
This change addresses a few issues in the handling of the MQTT messages that caused the library to become unstable: - Add wrapper for socket.recv() so that an exact number of bytes are read into the buffer before attempting to parse the MQTT message; - Fix handling of ping response packets via _wait_for_msg(); - Fix disconnect so it can gracefully handle cases when socket writes are not possible. Also re-init _subscribed_topics as an empty list instead of None. Related-to adafruit/Adafruit_CircuitPython_ESP32SPI#102 Fixes adafruit/Adafruit_CircuitPython_PyPortal#98 Fixes adafruit#54 Signed-off-by: Flavio Fernandes <[email protected]>
I have an application that polls a couple of endpoints, with adafruit_requests, every few seconds and makes a nice display. Every few minutes adafruit_requests fails with: "Timed out waiting for SPI char" in adafruit_esp32spi.py line 271. Once this happens the stack becomes permanently borked with all subsequent calls failing with the same error. The only way to recover that I've found is to reboot.
This happens with 2 pyportals, 2 physically different networks and multiple USB cables.
Firmware: 1.2.2 and 1.6.1 (current stable)
Circuit python: 4.something and 5.3.0 (current stable)
Libraries: 4.something and 5.x (current stable)
The following code reproduces the problem about 90% of the time with an occasional success in the second GET and occasionally a different error.
The text was updated successfully, but these errors were encountered: