Skip to content

I2C/Wire library: changes for issue #42 #107

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed

I2C/Wire library: changes for issue #42 #107

wants to merge 17 commits into from

Conversation

greyltc
Copy link
Contributor

@greyltc greyltc commented Sep 16, 2019

I2C comms currently seem unusable because the Arduino master may get stuck in one of several while loops during normal usage. This locks up execution with no way to recover (besides having a watchdog timer), see issue #42.
This PR uses @wmarkow's approach of putting timers in all the places we might get stuck while handling the I2C bus, then resetting the interface if one of those timers expires.

I've modified @wmarkow's approach in the following ways:

  • it now takes timeout values in microseconds instead of milliseconds
  • it doesn't forget about the user's previous calls to setClock() or begin() (with an address argument)

I've tested this code in my application and it works fine there and prevents my code from locking up (with the stock Wire library it would lock up about once per 200,000 I2C messages).

This adds one new user-facing function to the Wire library: setTimeoutInMicros() which takes a uint16 microsecond value. If this function is not called, or called with 0, Wire acts like it previously did, otherwise the I2C interface will be reset if it spends more than the given number of microseconds waiting for some change from an ISR.

This timeout approach might be incompatible with clock stretching and a multimaster bus, but it sure is better than an unusable I2C master. Of course, any previous Wire library functionality is maintained if setTimeoutInMicros() is not called.

@greyltc
Copy link
Contributor Author

greyltc commented Sep 18, 2019

Some very strange thing just happened, did a bunch of commits to the master branch just vaporize? Now this PR has a bunch of commits in it that I have nothing to do with. Do I need to rebase this now?

@lxrobotics Force-pushing to this repo feels pretty dangerous, if these commits were bad, why weren't they properly reverted instead of rewriting history?

@greyltc
Copy link
Contributor Author

greyltc commented Sep 18, 2019

Hang on, what branch should I even be making this PR for?

@aentinger
Copy link
Contributor

Sorry for the mixup 😳 Everything should be restored now.

@greyltc
Copy link
Contributor Author

greyltc commented Sep 18, 2019

@lxrobotics all good! :-) Thanks for fixing it up 👍

@sraillard
Copy link

Any idea when this PR will be included in the Arduino IDE distribution? Having an I2C timeout is mandatory for robust operation.

@sraillard
Copy link

Any news regarding when this pull request will be merged?

@greyltc
Copy link
Contributor Author

greyltc commented Jan 21, 2020

Has anyone besides me run this code in practice? Would be great to get some feedback from testers (besides myself) before it's merged. Especially nice would be testers using clock stretching and multimaster buses.

Copy link
Collaborator

@matthijskooijman matthijskooijman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good change overall, but not ready to merge. I left some comments inline, including some minor improvements but also some bugs and a possible ommission.

There's one more loop that did not get a timeout in twi_readFrom():

do {
  TWDR = twi_slarw;
} while(TWCR & _BV(TWWC));

Did you intentionally leave that one out? Is there some guarantee that that does not lock up?

Also, in my version I also handled arbitration errors by resetting the hardware, since if those occur due to noise rather than a second master, the hardware ends up waiting for a second master's stop condition indefinitely. Of course, unconditionally resetting on arbitration error is not ok, since that breaks multi-master setups. I suspect that no special handling is really needed: If an arbitration error occurs, this returns an error right now. In the background, the hardware will end up (invisibly :-S) locked, waiting for a stop condition, but that should be detected on the next transaction (which will run into a timeout on the start condition and reset the hardware). This would result in two failed transactions, but at least no lockup. Did you happen to test this scenario?

@greyltc
Copy link
Contributor Author

greyltc commented May 22, 2020

Hi @facchinm, I think the setTimeout name is already taken by the stream library, right? That's why the word Wire ended up in there.

Also, before it gets merged, would you be okay with me modifying the Wire library usage examples to enable the timeout?

And I would be very keen to hear from matthijskooijman before this gets merged, because he's been very involved in its development.

@facchinm
Copy link
Member

@greyltc overloading (but also shadowing) it is ok I think, since it applies to a situation that cannot happen with I2C (it's normally needed by APIs like readStringUntil which need an async medium to work, while in i2c the master is always in control of the communication)

I tested this patch on the top of yours calling Wire.setTimeout(40000); and it behaves correctly

diff --git a/libraries/Wire/src/Wire.cpp b/libraries/Wire/src/Wire.cpp
index b446f9a..63062ea 100644
--- a/libraries/Wire/src/Wire.cpp
+++ b/libraries/Wire/src/Wire.cpp
@@ -94,7 +94,7 @@ void TwoWire::setClock(uint32_t clock)
  * @param reset_with_timeout if true then TWI interface will be automatically reset on timeout
  *                           if false then TWI interface will not be reset on timeout
  */
-void TwoWire::setWireTimeoutUs(uint32_t timeout, bool reset_with_timeout){
+void TwoWire::setTimeout(uint32_t timeout, bool reset_with_timeout){
   twi_setTimeoutInMicros(timeout, reset_with_timeout);
 }
 
@@ -103,14 +103,14 @@ void TwoWire::setWireTimeoutUs(uint32_t timeout, bool reset_with_timeout){
  *
  * @return true if timeout has occured
  */
-bool TwoWire::getWireTimeoutFlag(void){
+bool TwoWire::getTimeoutFlag(void){
   return(twi_manageTimeoutFlag(false));
 }
 
 /***
  * Clears the TWI timeout flag.
  */
-void TwoWire::clearWireTimeoutFlag(void){
+void TwoWire::clearTimeoutFlag(void){
   twi_manageTimeoutFlag(true);
 }
 
diff --git a/libraries/Wire/src/Wire.h b/libraries/Wire/src/Wire.h
index c8b9a3e..64e09d3 100644
--- a/libraries/Wire/src/Wire.h
+++ b/libraries/Wire/src/Wire.h
@@ -55,9 +55,9 @@ class TwoWire : public Stream
     void begin(int);
     void end();
     void setClock(uint32_t);
-    void setWireTimeoutUs(uint32_t, bool);
-    bool getWireTimeoutFlag(void);
-    void clearWireTimeoutFlag(void);
+    void setTimeout(uint32_t timeout = 25000, bool reset_with_timeout = false);
+    bool getTimeoutFlag(void);
+    void clearTimeoutFlag(void);
     void beginTransmission(uint8_t);
     void beginTransmission(int);
     uint8_t endTransmission(void);

  */
-void TwoWire::clearWireTimeoutFlag(void){
+void TwoWire::clearTimeoutFlag(void){
   twi_manageTimeoutFlag(true);
 }
 
diff --git a/libraries/Wire/src/Wire.h b/libraries/Wire/src/Wire.h
index c8b9a3e..64e09d3 100644
--- a/libraries/Wire/src/Wire.h
+++ b/libraries/Wire/src/Wire.h
@@ -55,9 +55,9 @@ class TwoWire : public Stream
     void begin(int);
     void end();
     void setClock(uint32_t);
-    void setWireTimeoutUs(uint32_t, bool);
-    bool getWireTimeoutFlag(void);
-    void clearWireTimeoutFlag(void);
+    void setTimeout(uint32_t timeout = 25000, bool reset_with_timeout = false);
+    bool getTimeoutFlag(void);
+    void clearTimeoutFlag(void);
     void beginTransmission(uint8_t);
     void beginTransmission(int);
     uint8_t endTransmission(void);

@wmarkow
Copy link

wmarkow commented May 22, 2020

But does the Wire.setTimeout() - which comes from the Stream - assumes that the passed argument should be in milliseconds while for I2C we would like to have it as microseconds? Would that be a bit confusing?

@facchinm please also take a look at this comment:
https://github.com/arduino/ArduinoCore-avr/pull/107/files/3fc5fb88280789753eacc4e099ce1814f34c76d0#r399780104

There is a conversation between @matthijskooijman, @greyltc and me about the Wire.setTimeout() I do not know how to put a direct link to this here on github. It is in the line I posted above and related to the libraries/Wire/src/Wire.h file.

Copy link
Collaborator

@matthijskooijman matthijskooijman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I would be very keen to hear from matthijskooijman before this gets merged, because he's been very involved in its development.

Sorry, had been a bit busy :-)

I've left some inline remarks and suggestions (mostly docs, one code improvement that I did not actually test) on the PR, I'll respond to the rest of the discussion separately.

In the latest commit, da7d85a, I took the liberty of fixing some of what I thought were the more severe syntax issues especially in the ISR in twi.c, I hope that's alright.

I think the changes themselves are good, but I would really prefer them to be in a separate commit, to make this change easier to review. In that regard, it would probably be good to rebase this commit on top of latest master and squash it, leaving just one commit for the formatting fixes and one commit for the reset.

/***
* Returns the TWI timeout flag.
*
* @return true if timeout has occured
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @return true if timeout has occured
* @return true if timeout has occured since the flag was last cleared.

Comment on lines +91 to +95
* Sets the TWI timeout.
*
* @param timeout a timeout value in microseconds, if zero then timeout checking is disabled
* @param reset_with_timeout if true then TWI interface will be automatically reset on timeout
* if false then TWI interface will not be reset on timeout
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Sets the TWI timeout.
*
* @param timeout a timeout value in microseconds, if zero then timeout checking is disabled
* @param reset_with_timeout if true then TWI interface will be automatically reset on timeout
* if false then TWI interface will not be reset on timeout
* Sets the TWI timeout.
*
* This limits the maximum time to wait for the TWI hardware. If more time passes, the bus is assumed
* to have locked up (e.g. due to noise-induced glitches or faulty slaves) and the transaction is aborted.
* Optionally, the TWI hardware is also reset, which can be required to allow subsequent transactions to
* succeed in some cases (in particular when noise has made the TWI hardware think there is a second
* master that has claimed the bus).
*
* When a timeout is triggered, a flag is set that can be queried with `getWireTimeoutFlag()` and is cleared
* when `clearWireTimeoutFlag()` or `setWireTimeoutUs()` is called.
*
* Note that this timeout can also trigger while waiting for clock stretching or waiting for a second master
* to complete its transaction. So make sure to adapt the timeout to accomodate for those cases if needed.
* A typical timeout would be 25ms (which is the maximum clock stretching allowed by the SMBus protocol),
* but (much) shorter values will usually also work.
*
* In the future, a timeout will be enabled by default, so if you require the timeout to be disabled, it is
* recommended you disable it by default using `setWireTimeoutUs(0)`, even though that is currently
* the default.
*
* @param timeout a timeout value in microseconds, if zero then timeout checking is disabled
* @param reset_with_timeout if true then TWI interface will be automatically reset on timeout
* if false then TWI interface will not be reset on timeout

@@ -373,8 +413,17 @@ void twi_stop(void)

// wait for stop condition to be exectued on bus
// TWINT is not set after a stop condition!
volatile uint32_t counter = twi_timeout_us/10ul; // approximate the timeout
Copy link
Collaborator

@matthijskooijman matthijskooijman May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
volatile uint32_t counter = twi_timeout_us/10ul; // approximate the timeout
// We cannot use micros() from an ISR, so approximate the timeout with cycle-counted delays
const uint8_t us_per_loop = 8;
uint32_t counter = (twi_timeout_us + us_per_loop - 1)/us_per_loop; // Round up

No need to make this volatile AFAICS, there will not be an ISR interrupting this code and writing to counter. Also, round up, to make sure the timeout is never shorter than configured. And also use a constant, to prevent duplicating the 10.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use 8 as a value for us_per_loop variable? Is division by 8 simple to perform as it is mostly like shifting the register by three bits?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, seems like a good suggestion to me. Suggestion updated.

continue;
if(twi_timeout_us > 0ul){
if (counter > 0ul){
_delay_us(10);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_delay_us(10);
_delay_us(us_per_loop);

* Input reset: true causes this function to reset the twi hardware interface
* Output none
*/
void twi_handleTimeout(bool reset){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a reset parameter here? It seems it is always passed twi_do_reset_on_timeout, so might as well use that global variable directly here?

twi_state = TWI_READY;
}
break;
if (twi_sendStop){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a space before the { I think.

@matthijskooijman
Copy link
Collaborator

@greyltc overloading (but also shadowing) it is ok I think, since it applies to a situation that cannot happen with I2C (it's normally needed by APIs like readStringUntil which need an async medium to work, while in i2c the master is always in control of the communication)

Please don't do that (shadowing one API with another), I'm afraid that might cause confusion and needlessly entangles these two APIs. It might not cause problems, but consider some code like this:

Stream wireStream = Wire;
... then sometime later
wireStream.setTimeout(123);

Rather than setting the Wire timeout, as a user might expect, this would set the Stream timeout (which is unused, so effectively this is just silently throwing a way the timeout). Without shadowing, this would properly produce a compile error.

After we manage to get this merged, I'll submit another PR that picks a timeout value and enables it by default, we can discuss if/what that timeout should be and if the timeout should cause the interface to be reset there.

That seems like a good approach to me too.

Also, an important part of this, I think, is also to:

  • Properly document this timeout for the Wire library (though I'm afraid library docs do not live in a github repo yet), and recommend everyone to enable the timeout.
  • Document that some timeout will be enabled by default in the future, so sketches that do not need the timeout (e.g. multi-master setup, indefinite clock stretching) should disable it explicitly or set a sufficiently long timeout.

In my review, I've added some suggestions for more documentation in function comments, which can later be moved into the reference docs and should handle the above points.

Also, before it gets merged, would you be okay with me modifying the Wire library usage examples to enable the timeout?

Yes, please. This would also help with the "documentation" stuff I mentioned above.

For the Us suffix, it didn't bother me much, but I've no strong opinion, it's fine either way.

I like explicitness in this sense, making code easier to read, but maybe it's more consistent with other code without the Us suffix. No strong preference from my end.

@wmarkow
Copy link

wmarkow commented May 25, 2020

The solution from @greyltc branch works nice for me.

@functionpointer
Copy link

Any idea when this will be merged? The branch works well for me, too.

@facchinm
Copy link
Member

Aaaand, it is finally merged 🙂 Thanks everyone for the involvement; we'll be updating the reference documentation in the next days, applying the same approach on other architectures too.

@facchinm facchinm closed this Jun 11, 2020
@matthijskooijman
Copy link
Collaborator

For future reference, relevant commits are deea929, 38ff552 and f1fe5e8.

AFAICS @facchinm has squashed all commits of this PR together, renamed setWireTimeoutUs to setWireTimeout and added default parameter values and added my suggested comments, loop counter improvement and @wmarkow's divide by 8 suggestion.

@facchinm you missed this change:

-        _delay_us(10);
+        _delay_us(us_per_loop);

(making the timeout actually slightly off now, since us_per_loop is 8, not 10).

@facchinm
Copy link
Member

Ah right, I totally missed that suggestion 😬 Do you want to push a commit on top with that change?

@matthijskooijman
Copy link
Collaborator

I'm in a different project now, so if you could pick it up, that would be great.

@facchinm
Copy link
Member

Sure, on it

@bperrybap
Copy link

While it is nice to have API functions to set/get a timeout, there is no way for the application code to know that they exist.
This is needed for some applications that must work on multiple IDE versions and cores which may or may not have support for these new functions.
A #define macro is needed to indicate to the application that these API functions exist so it can adjust accordingly if necessary.
Something similar to the WIRE_HAS_END macro would be useful.
Perhaps WIRE_HAS_TIMEOUT which would indicate the existence of:

    void setWireTimeout(uint32_t timeout = 25000, bool reset_with_timeout = false);
    bool getWireTimeoutFlag(void);
    void clearWireTimeoutFlag(void);

@matthijskooijman
Copy link
Collaborator

Good plan. Adding LIBRARY_HAS_FOO macros seems to be the way we've done this in the past (WIRE_HAS_END as you suggest, or SPI_HAS_TRANSACTION). If we add this, the comments should probably be specific about which functions need to exist when it is defined, for third-party cores. Also, the reference docs should also indicate the macro and recommend checking the macro to make sketches and libraries portable.

Also, the reference docs (https://github.com/arduino/reference-en) still need to be updated (which can probably mostly use the comments from the source files, I think).

Anyone care to make these 2 pullrequests?

@bperrybap
Copy link

On the documentation, should this be written from the perspective of the user or as information for future core developers?
I assume it is kind of for both audiences.

I'm assuming the documentation for a WIRE_HAS_TIMEOUT macro would go in constants.adoc ?

There are many constants that are not currently in there like:
WIRE_HAS_END, SPI_HAS_TRANSACTION, LED_BUILTIN (that it is a macro not a const int) , NUM_DIGITAL_PINS, NUM_ANALOG_INPUTS, SDA, SCL, PIN_An, etc....
And all of these must be macros and should not be const int values.

Would it be better to create a separate documentation issue that dealt with all of these so they get fixed together as part of the same PR to fix the documentation for WIRE macro?

I've been wanting to get all these macros documented for years as I've worked with several 3rd party platform developers have either left them out, used different names or even implemented them.
One popular 3rd party platform has refused to implement them since in the owner/developer's words they are not documented.

I'll be happy to do it. I'd just like some guidance as to the best way to proceed with fixing this issue and getting the documentation updated in terms of what the PR(s) should cover.
i.e. is it one PR to fix the code to add the WIRE macro, and another PR to update the documentation for the WIRE macro or a single PR that does both?

@paynterf
Copy link

paynterf commented Jul 4, 2020

Just yesterday (literally) I received yet another email from someone new to Arduino/I2C who was experiencing the I2C lockup problem, asking for my help due to one of my blog posts from two years ago. As I was helping him, we both discovered that the new version (1.8.13) now contains the timeout feature (thanks again greyltc). This is wonderful news, but the fact that it isn't implemented by default doesn't help anyone - new users will still spend an infinite number of hours chasing what appears to be a hardware problem, only to discover (again) that the issue is a well-known bug in the Wire library.

FWIW, I have been using the SBWire library for the last two years in all my I2C projects, and I have yet to experience any lockups, and I was getting them daily (and sometimes hourly) beforehand. The SBWire library uses a simple loopcount designed to produce an approximaely 100 uSec timeout, and it has been working fine for me in several different configurations.

While I commend everyone involved for finally driving a stake through the heart of this 'problem-that-would-not-die', please understand that without a default timeout all your efforts will have been in vain, except possibly for the few experts who know enough to enable the timeout (and they have all moved to SBWire by now anyway).

Frank

@utis82
Copy link

utis82 commented Aug 1, 2020

Hello everyone,
First I want to say that I'm very new at arduino. For one of my project i'm using a MPU6050 giro and LCD screen and times to times I get freeze and I need to reboot my program with a whatchdog. But i'm not really happy with it.
So I want to say THANK YOU, for all the work you've done on the wire library , I read some poeple have waited almost 9 years, ..lucky me only 2 months.

I 've started to try your solution, I put Wire.setWireTimeout(250,true) in the setup loop, then in the main loop, I also changed the time from 25 to 2500, but the communication with the MPU6050 is not working anymore ,( but LCD works well). So I wonder if i'm not using it in the wrong way, or maybe it's not compatible with the MPU6050.
For the giro I use the library of Jeff Rowberg "MPU6050_6Axis_MotionApps20.h".

Do you have any idea of what should I try ?

Anyway, than you again for all the work you are doing.

@wmarkow
Copy link

wmarkow commented Aug 1, 2020

@utis82 , maybe your MPU6050 chip is the cause of the freeze? At least now with the fix your program can go on - however communication with MPU6050 doesn't work. Try to do software reset MPU6050 after freeze - maybe this will help:

  • in your setup() put Wire.setWireTimeout(250, false) this configure the Wire library to not to automatically reset the TWI interface
  • in your loop() do something like:
if(Wire.getWireTimeoutFlag())
{
    Wire.clearWireTimeoutFlag();
    Wire.end();
    Wire.begin();
    // put the code to reinit MPU6050 
}

If this doesn't work then I think you need to do a hardware reset of MPU6050 by pulling its power supply off and on. You ca use a MOSFET transistor to do that.

@paynterf
Copy link

paynterf commented Aug 1, 2020

FWIW, I also have an MPU6050 module in my wall-following robot project, along with a host of other I2C devices, and they all seem to be perking along happily with the new Wire library.

As always when dealing with hardware/software issues, it is important to simplify the problem to the point where the cause of the problem becomes obvious (often easier to say than do, I know).

I would suggest setting up another Arduino controller with just the minimum software needed to communicate with the MPU6050 (not necessarily even JRowbergs stuff - maybe just an example program from wherever you got the MPU6050 module) and no other hardware at all. Keep this configuration until you have either solved the problem or convinced yourself that the MPU6050 isn't the source of the problem. Then change JUST ONE THING! and see what happens.

Here are a couple of posts to give you the idea:

https://www.fpaynter.com/2019/10/basic-arduino-mpu6050-gy-521-test/
https://www.fpaynter.com/2018/06/integrating-time-memory-and-heading-capability-part-ii/

dodo5522 added a commit to dodo5522/vl53l0x-arduino that referenced this pull request Dec 7, 2020
last_status is reasonable solution to get the last status of i2c
trnasmittion, but in some cases, the last_status is overwritten by
calling read/write register API more than once. For instance, init(),
startContinuous(), etc.

Especially on Arduino IDE 1.8.13, new error code `5` (i2c timeout) is
added to Wire.endTransmission() by merging this PR
arduino/ArduinoCore-avr#107 .

You can distinguish what error occurs and use them depending on the
situation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.