Skip to content

Double I2C read in one transaction skips a clock pulse (#5528) (based on I2C reduce iRAM) #6592

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 27 commits into from

Conversation

TD-er
Copy link
Contributor

@TD-er TD-er commented Oct 3, 2019

See #5528 and the more elaborate description
where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.

I also noticed some code duplication, which I moved to separate functions.

According to this documentation on I2C clock stretching it is not allowed to have some slave keep the clock low during transmission of a byte, only after an ack.
So that's why it is not done in the while loop.
But I wondered if there should be an extra delay between the sequence of pulses and the extra added clock "valey". See my comment in the code.

Have not tested it myself, this is just the long awaited PR.

N.B. This PR was suggested by @Jason2866 here: #6579 (comment)
It is based on this PR by @earlephilhower : #6326
This PR is almost the same as #6579 but now following the changes made in #6326

earlephilhower and others added 18 commits July 21, 2019 17:04
The I2C code takes a large chunk of IRAM space.  Attempt to reduce the
size of the routines without impacting functionality.

First, remove the `static` classifier on the sda/scl variables in the
event handlers.  The first instructions in the routines overwrite the
last value stored in them, anyway, and their addresses are never taken.
Where it doesn't make a functional difference, make global variables
ints and not unit8_t.  Bytewide updates and extracts require multiple
instructions and hence increase IRAM usage as well as runtime.
Sketch uses 270855 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27940 bytes (34%) of dynamic memory, leaving 53980 bytes for local variables. Maximum is 81920 bytes.
./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t  -j .text1 /tmp/arduino_build_9615/*elf | sort -k1 | head -20
401000cc l     F .text1	00000014 twi_delay
401000ec l     F .text1	00000020 twi_reply$part$1
4010010c g     F .text1	00000035 twi_reply
4010014c g     F .text1	00000052 twi_stop
401001a0 g     F .text1	0000003b twi_releaseBus
40100204 g     F .text1	000001e6 twi_onTwipEvent
40100404 l     F .text1	000001f7 onSdaChange
40100608 l     F .text1	000002fd onSclChange
40100908 l     F .text1	0000003b onTimer
If SCL is low then all branches of the case are no-ops, so factor that
portion outo to remove some redundant logic in each case.

Sketch uses 270843 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.

401000cc l     F .text1	00000014 twi_delay
401000ec l     F .text1	00000020 twi_reply$part$1
4010010c g     F .text1	00000035 twi_reply
4010014c g     F .text1	00000052 twi_stop
401001a0 g     F .text1	0000003b twi_releaseBus
40100204 g     F .text1	000001e6 twi_onTwipEvent
40100404 l     F .text1	000001e7 onSdaChange
401005f8 l     F .text1	000002fd onSclChange
401008f8 l     F .text1	0000003b onTimer

0x0000000040107468                _text_end = ABSOLUTE (.)
twi_reply is a chunk of code that can be inlined and actually save IRAM
space because certain conditions acan be statically evaluated by gcc.

Sketch uses 270823 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.

401000cc l     F .text1	00000014 twi_delay
401000f4 g     F .text1	00000052 twi_stop
40100148 g     F .text1	0000003b twi_releaseBus
401001b0 g     F .text1	00000206 twi_onTwipEvent
401003d0 l     F .text1	000001e7 onSdaChange
401005c4 l     F .text1	000002fd onSclChange
401008c4 l     F .text1	0000003b onTimer
40100918 g     F .text1	00000085 millis
401009a0 g     F .text1	0000000f micros
401009b0 g     F .text1	00000022 micros64
401009d8 g     F .text1	00000013 delayMicroseconds
401009f0 g     F .text1	00000034 __digitalRead
401009f0  w    F .text1	00000034 digitalRead
40100a3c g     F .text1	000000e4 interrupt_handler
40100b20 g     F .text1	0000000f vPortFree

0x0000000040107434                _text_end = ABSOLUTE (.)
Sketch uses 270799 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.

401000cc l     F .text1	00000014 twi_delay
401000f4  w    F .text1	0000003b twi_releaseBus
4010015c g     F .text1	00000246 twi_onTwipEvent
401003bc l     F .text1	000001e7 onSdaChange
401005b0 l     F .text1	000002f9 onSclChange
401008ac l     F .text1	0000003b onTimer

0x000000004010741c                _text_end = ABSOLUTE (.)
GCC won't use a lookup table for the TWI state machine, so it ends up
using a series of straight line compare-jump, compare-jumps to figure
out which branch of code to execute for each state.  For branches that
have multiple states that call them, this can expand to a lot of code.

Short-circuit the whole thing by converting the FSM to a 1-hot encoding
while executing it, and then just and-ing the 1-hot state with the
bitmask of states with the same code.

Sketch uses 270719 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes.

401000cc l     F .text1	00000014 twi_delay
401000f4  w    F .text1	0000003b twi_releaseBus
4010015c g     F .text1	00000246 twi_onTwipEvent
401003c0 l     F .text1	000001b1 onSdaChange
40100580 l     F .text1	000002da onSclChange
4010085c l     F .text1	0000003b onTimer

0x00000000401073cc                _text_end = ABSOLUTE (.)

Saves 228 bytes of IRAM vs. master, uses 32 additional bytes of heap.
twi_status is set immediately before  an event handler is called,
resulting in lots of duplicated code.  Set the twi_status flag inside
the handler itself.

Saves an add'l ~100 bytes of IRAM from prior changes, for a total of
~340 bytes.

earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t  -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20

401000cc l     F .text1	00000014 twi_delay
401000f4  w    F .text1	0000003b twi_releaseBus
40100160 g     F .text1	0000024e twi_onTwipEvent
401003c8 l     F .text1	00000181 onSdaChange
40100558 l     F .text1	00000297 onSclChange
Thanks to the suggestion from @mhightower83, move all global objects
into a struct.  This lets a single base pointer register to be used in
place of constantly reloading the address of each individual variable.

This might be better expressed by moving this to a real C++
implementaion based on a class object (the twi.xxxx would go back to the
old xxx-only naming for vars), but there would then need to be API
wrappers since the functionality is exposed through a plain C API.

Saves 168 additional code bytes, for a grand total of 550 bytes IRAM.

earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t  -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20

401000cc l     F .text1	00000014 twi_delay
401000e8  w    F .text1	00000032 twi_releaseBus
40100128 g     F .text1	00000217 twi_onTwipEvent
4010034c l     F .text1	00000149 onSdaChange
4010049c l     F .text1	00000267 onSclChange
40100704 l     F .text1	00000028 onTimer
Make the TWI states enums and not #defines, in the hope that it will
allow GCC to more easily flag problems and general good code
organization.

401000cc l     F .text1	00000014 twi_delay
401000e8  w    F .text1	00000032 twi_releaseBus
40100128 g     F .text1	00000217 twi_onTwipEvent
4010034c l     F .text1	00000149 onSdaChange
4010049c l     F .text1	00000257 onSclChange
401006f4 l     F .text1	00000028 onTimer

Looks like another 16 bytes IRAM saved from the prior push.

Sketch uses 267079 bytes (25%) of program storage space. Maximum is 1044464 bytes.
Global variables use 27696 bytes (33%) of dynamic memory, leaving 54224 bytes for local variables. Maximum is 81920 bytes.
Convert the entire file into a C++ class (with C wrappers to preserve
the ABI).  This allows for setting individual values of the global
struct(class) in-situ instead of a cryptic list at the end of the struct
definition.  It also removes a lot of redundant `twi.`s from most class
members.

Clean up the code by converting from `#defines` to inline functions, get
rid of ternarys-as-ifs, use real enums, etc.

For slave_receiver.ino, the numbers are:
GIT Master IRAM: 0x723c
This push IRAM: 0x6fc0

For a savings of 636 total IRAM bytes (note, there may be a slight flash
text increase, but we have 1MB of flash to work with and only 32K of IRAM
so the tradeoff makes sense.
Since the C++ version has significant text differences anyway, now is a
good time to clean up the mess of spaces, tabs, and differing cuddles.
@TD-er TD-er changed the title Bugfix/issue5528 i2 c reduceiram Double I2C read in one transaction skips a clock pulse (#5528) (based on I2C reduce iRAM) Oct 3, 2019
@Jason2866
Copy link
Contributor

A quick test was successful. Tasmota does run with your I2C driver
image

@TD-er
Copy link
Contributor Author

TD-er commented Oct 3, 2019

Does it add much to the iram usage?

@Jason2866
Copy link
Contributor

Jason2866 commented Oct 3, 2019

Still nice iram usage. You can try if you want, use in platformio.ini as platform:
platform = https://github.com/Jason2866/platform-espressif8266/tree/newfeature/stage

There were multiple places where the code was waiting for a slave to
finish stretching the clock.  Factor them out to an *inline* function
to reduce code smell.
Add a new twi_setSlaveMode call which actually attached the interrupts
to the slave pin change code onSdaChenge/onSclChange.  Don't attach
interrupts in the main twi_begin.

Because slave mode is only useful should a onoReceive or onRequest
callback, call twi_setSlaveMode and attach interrupts on the Wire
setters.

This allows GCC to not link in slave code unless slave mode is used,
saving over 1,000 bytes of IRAM in the common, master-only case.
@TD-er
Copy link
Contributor Author

TD-er commented Oct 14, 2019

Apparently it does fix something as reported here: #6497 (comment)

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 14, 2019

Confirmation from #6497 (comment)

@TD-er patch is huge and we cannot read it.

I propose to switch it to allman-style like in

all="libraries/ESP8266mDNS"

Just modify it like this

all="
libraries/ESP8266mDNS
cores/esp8266/core_esp8266_si2c.cpp
"

run the script and also add it to your commit

@TD-er
Copy link
Contributor Author

TD-er commented Oct 14, 2019

@d-a-v I guess it has become unreadable because I had based it on an existing PR which later was updated.
I did take care not to modify any other line I did not touch.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 14, 2019

I did rebase it and removed the time stretch functions I added and @earlephilhower also implemented in his PR.
Apparently still need to run the formatting script you mentioned.
At least the last 2 commits should give you an indication of what I changed.

Edit:
They should look similar to my initial commit for this PR: 1c63358

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 14, 2019

I fixed the conflict

@Jason2866
Copy link
Contributor

Jason2866 commented Oct 14, 2019

@TD-er Updated platform = https://github.com/Jason2866/platform-espressif8266/tree/newfeature/stage
to latest changes

@TD-er
Copy link
Contributor Author

TD-er commented Oct 15, 2019

Should I rebase this PR on the master branch and force push it?

@d-a-v
Copy link
Collaborator

d-a-v commented Oct 15, 2019

No,
Here's what I just tried and works.

  • checkout your local master copy and make it up to date
  • checkout back your PR branch
  • update by hand test/restyle.sh, add libraries/Wire after cores/esp8266/core_esp8266_si2c.cpp
  • run the restyle script
  • commit the changes
  • now the magic command: cd <core-rootdir>; git merge-octopus master
  • your PR is now clean like below, just push it as is to your branch
diff --git a/cores/esp8266/core_esp8266_si2c.cpp b/cores/esp8266/core_esp8266_si2c.cpp
index 7200f631..2ad2016d 100644
--- a/cores/esp8266/core_esp8266_si2c.cpp
+++ b/cores/esp8266/core_esp8266_si2c.cpp
@@ -128,7 +128,8 @@ private:
         }
     }
 
-
+    // Generate a clock "valley" (at the end of a segment, just before a repeated start)
+    void twi_scl_valley(void);
 public:
     void setClock(unsigned int freq);
     void setClockStretchLimit(uint32_t limit);
@@ -386,13 +387,17 @@ unsigned char Twi::writeTo(unsigned char address, unsigned char * buf, unsigned
     {
         write_stop();
     }
+    else
+    {
+        twi_scl_valley();
+        twi_wait_clockStretchLimit();
+        // TD-er: Also twi_delay here?
+        // delay(twi_dcount);
+    }
     i = 0;
     while (!SDA_READ() && (i++) < 10)
     {
-        SCL_LOW();
-        busywait(twi_dcount);
-        SCL_HIGH();
-        WAIT_CLOCK_STRETCH();
+        twi_scl_valley();
         busywait(twi_dcount);
     }
     return 0;
[...]

(I just found out this octopus thing, after searching why my git diff -u looked so clean and git merge so garbaged)

@TD-er
Copy link
Contributor Author

TD-er commented Oct 15, 2019

Well I tried it, but it doesn't look like it is making it more readable.
I don't know what this octopus merge should do, but it was done with it even faster than a normal git status executes.
So I don't think it did any magic here.

@earlephilhower
Copy link
Collaborator

At stages like this, @TD-er, I usually give up on git and restart fresh from a new branch. :(

Your changes end about here. There's now a method "WAIT_CLOCK_STRETCH" which replaces your twi_wait_clockStretchLimit, and the rest of the changes look relatively simple to adjust to the class: readFrom/writeTo methods:

1c63358

@TD-er
Copy link
Contributor Author

TD-er commented Oct 18, 2019

OK, so should I make a new PR for it, or do you cut'n-paste it?

@earlephilhower
Copy link
Collaborator

It's your code, so suggest you do the new PR/branch. I'm not trying to take the git blame for it. :)

I'd start a new PR, make the 3 changes in the new branch, link it as "supersedes #6592", close this PR, and go on from there.

If that's a problem, I can make a PR from your changes, too. Just let me know.

@TD-er
Copy link
Contributor Author

TD-er commented Oct 19, 2019

Superseded by #6654

@TD-er TD-er closed this Oct 19, 2019
@TD-er TD-er deleted the bugfix/issue5528_I2C_reduceiram branch October 24, 2019 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants