Skip to content

I2C/TWI library: Race Condition with Repeated STARTs #2173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
earlyprogrammer opened this issue Jul 9, 2014 · 3 comments
Closed

I2C/TWI library: Race Condition with Repeated STARTs #2173

earlyprogrammer opened this issue Jul 9, 2014 · 3 comments
Assignees
Labels
Library: Wire The Wire Arduino library
Milestone

Comments

@earlyprogrammer
Copy link

I believe I've identified a race condition with repeated STARTS in the TWI library.
(As Master, doesn't seem related to the TWI slave reports I found searching online)
I'm using a modified version of Wire and TWI, I'll discuss my changes later.

The Flow:
Instead of sending a stop, send a repeated start. Note: TWIE is set to 0, so the ISR should not get called when the hardware flags that sending the START has completed.
(ISR)

if (twi_sendStop)
    twi_stop();
else {
  twi_inRepStart = true;    // we're gonna send the START
  // don't enable the interrupt. We'll generate the start, but we 
  // avoid handling the interrupt until we're in the next transaction,
  // at the point where we would normally issue the start.
  TWCR = _BV(TWINT) | _BV(TWSTA)| _BV(TWEN) ;
  twi_state = TWI_READY;
}

Next, the flow skips the wait loop, believing that the state is READY.
(readFrom or writeTo)

while(TWI_READY != twi_state){
    continue;
  }
//various prep variable assignments

Finally, put the address in TWDR, then initiate the transmission and reenable the interrupts
(readFrom or writeTo)

if (true == twi_inRepStart) {
    // if we're in the repeated start state, then we've already sent the start,
    // (@@@ we hope), and the TWI statemachine is just waiting for the address byte.
    // We need to remove ourselves from the repeated start state before we enable interrupts,
    // since the ISR is ASYNC, and we could get confused if we hit the ISR before cleaning
    // up. Also, don't enable the START interrupt. There may be one pending from the 
    // repeated start that we sent outselves, and that would really confuse things.
    twi_inRepStart = false;         // remember, we're dealing with an ASYNC ISR
    TWDR = twi_slarw;               
    TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);  // enable INTs, but not START
  }

CASE 1: intended

The hardware finishes the repeated start before TWDR = twi_slarw;

Hardware sets TWINT to 0, but because TWIE is disabled, ISR will not be called.
Address will be put in the TWDR, then the hardware will be initiated by setting TWINT to 1.

CASE 2: works

The hardware finishes the repeated start after TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);

The attempt to TWDR = twi_slarw; broke because TWINT was still 1, flagging TWWC.
The TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE); was presumably ignored because the hardware was already working on a request.
Note: TWIE has been set.
When the hardware returns, the ISR will be called because interrupts were enabled.
leading to...

    case TW_REP_START: // sent repeated start condition
      // copy device address and r/w bit to output register and ack
      TWDR = twi_slarw;
      twi_reply(1);
      break;

twi_reply(1) == TWCR = _BV(TWEN) | _BV(TWIE) | _BV(TWINT) | _BV(TWEA);
meaning the address is put in TWDR, then hardware initiated, just like CASE 1.
This will also clear TWWC, meaning no error is noticed.
It works, so that's fine.

CASE 3: broken

The hardware finishes the repeated start between cases 1 and 2.

The attempt to TWDR = twi_slarw; broke because TWINT was still 1, flagging TWWC.
Hardware returns, and because ISR was disabled when the repeated start was constructed, the TWINT flag is ignored as it comes in.
Finally, TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE); runs, starting the hardware.
But the TWDR was never updated, so the wrong data gets sent (whatever was previously in the TWDR).
TWWC will also stay flagged.

My sniffer would notice an address nack once in about 50 repeat starts. (the address is the first byte sent after the repeated start, so that's the one that is sent incorrectly, and there was no slave at that wrong address.)

My Fix:
I ended up adding this...

twi_inRepStart = false;         // remember, we're dealing with an ASYNC ISR
do
  {
    TWDR = twi_slarw;
  } while (TWCR & _BV(TWWC));               
TWCR = _BV(TWINT) | _BV(TWEA) | _BV(TWEN) | _BV(TWIE);  // enable INTs, but not START

polling TWWC until it accepts the TWDR change, effectively removing cases 2 and 3.
(I tried polling TWINT with while (TWCR & _BV(TWINT)); before updating the data register, but I kept hitting an infinite loop. Not sure why, not completely relevant, but if anyone knows, I am curious.)
This fix doesn't quite fit the nice interrupt driven design, but I was having trouble fixing it in a good way.

The end, I'd love to hear from you, hope this was understandable and accurate.

PS: Why I was exploring the TWI and Wire libraries.
We wanted to fit PMBus protocol, meaning we needed 256 byte block write/read. changing the static buffers in Wire and TWI to 256 increases the static memory load from 32_5 to 256_5 > 1250 bytes, a big hit while using the uno.
We don't intend to fully use the asynchronous capabilities of the TWI library, and Wire blocks anyway, so I hacked out the buffers, made TWI write to a buffer created by the application. I also removed most of the slave code (using as demo board for other products, only master is needed.)
I'll attach those files as soon as I figure out how. The simplified versions might be necessary to consistently recreate the issue. I'm hoping a theoretical conversation will be useful either way though.

@earlyprogrammer
Copy link
Author

Here's the modified files.
https://gist.github.com/earlyprogrammer/f0261eed3e812e25dbd0

edit: right, so if you try to recreate this, please remove the fix. You won't get the issue if you don't.
(do...while loop)

@sandeepmistry
Copy link
Contributor

@earlyprogrammer thanks for the explanation and suggestion!

I was able to reproduce and test out the patch using a slightly modified Master sketch, as is Slave sketch from: http://forum.arduino.cc/index.php?topic=172296.msg1289288#msg1289288

I'll prepare a pull request with your suggestion shortly.

Master

// TEST_i2c_Master
// Arduino 1.0.5
// http://forum.arduino.cc/index.php?topic=172296.0

#include <Wire.h>

void setup()
{
  Serial.begin(115200);
  Serial.println("Start");
  Wire.begin();
}  

unsigned long okcount=0, errcount=0;

void loop() 
{
  Wire.beginTransmission(0x2B);
  Wire.write(0x11); // for the (simulated) register address.

  byte error = Wire.endTransmission(false); // do not stop, to trigger repeated start

  if( error != 0) {
    Serial.print("endTransmission error = ");
    Serial.print(error);
    errcount++;
  } else {
    Wire.requestFrom( 0x2B, 4, true);  // release bus after this
    byte avail = Wire.available();
    if (avail != 4) {
      Serial.print("avail = ");
      Serial.print( avail);
      errcount++;
    } else {
      char buffer[10];
      buffer[0] = Wire.read();
      buffer[1] = Wire.read();
      buffer[2] = Wire.read();
      buffer[3] = Wire.read();
      if( buffer[0] != 'A' || buffer[1] != 'B' || buffer[2] != 'C' || buffer[3] != 'D') {
        Serial.print("bad data : ");
        buffer[4] = '\0';
        Serial.print(buffer);
        errcount++;
      } else {
        Serial.print("good response");
        okcount++;
      }
    }
  }
  Serial.print(", ok=");
  Serial.print(okcount);
  Serial.print(", err=");
  Serial.print(errcount);
  Serial.println();
}

sandeepmistry added a commit to sandeepmistry/Arduino that referenced this issue Aug 28, 2015
as suggested by @earlyprogrammer in arduino#2173, to ensure TWDR value is set
if there is a write collision
sandeepmistry added a commit that referenced this issue Oct 22, 2015
as suggested by @earlyprogrammer in #2173, to ensure TWDR value is set
if there is a write collision
@sandeepmistry
Copy link
Contributor

Closed via #3742

@ffissore ffissore modified the milestone: Release 1.6.6 Oct 26, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Library: Wire The Wire Arduino library
Projects
None yet
Development

No branches or pull requests

5 participants