-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Double I2C read in one transaction skips a clock pulse (#5528) (based on I2C reduce iRAM) #6592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The I2C code takes a large chunk of IRAM space. Attempt to reduce the size of the routines without impacting functionality. First, remove the `static` classifier on the sda/scl variables in the event handlers. The first instructions in the routines overwrite the last value stored in them, anyway, and their addresses are never taken.
Where it doesn't make a functional difference, make global variables ints and not unit8_t. Bytewide updates and extracts require multiple instructions and hence increase IRAM usage as well as runtime.
Sketch uses 270855 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27940 bytes (34%) of dynamic memory, leaving 53980 bytes for local variables. Maximum is 81920 bytes. ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_9615/*elf | sort -k1 | head -20 401000cc l F .text1 00000014 twi_delay 401000ec l F .text1 00000020 twi_reply$part$1 4010010c g F .text1 00000035 twi_reply 4010014c g F .text1 00000052 twi_stop 401001a0 g F .text1 0000003b twi_releaseBus 40100204 g F .text1 000001e6 twi_onTwipEvent 40100404 l F .text1 000001f7 onSdaChange 40100608 l F .text1 000002fd onSclChange 40100908 l F .text1 0000003b onTimer
If SCL is low then all branches of the case are no-ops, so factor that portion outo to remove some redundant logic in each case. Sketch uses 270843 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes. 401000cc l F .text1 00000014 twi_delay 401000ec l F .text1 00000020 twi_reply$part$1 4010010c g F .text1 00000035 twi_reply 4010014c g F .text1 00000052 twi_stop 401001a0 g F .text1 0000003b twi_releaseBus 40100204 g F .text1 000001e6 twi_onTwipEvent 40100404 l F .text1 000001e7 onSdaChange 401005f8 l F .text1 000002fd onSclChange 401008f8 l F .text1 0000003b onTimer 0x0000000040107468 _text_end = ABSOLUTE (.)
twi_reply is a chunk of code that can be inlined and actually save IRAM space because certain conditions acan be statically evaluated by gcc. Sketch uses 270823 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes. 401000cc l F .text1 00000014 twi_delay 401000f4 g F .text1 00000052 twi_stop 40100148 g F .text1 0000003b twi_releaseBus 401001b0 g F .text1 00000206 twi_onTwipEvent 401003d0 l F .text1 000001e7 onSdaChange 401005c4 l F .text1 000002fd onSclChange 401008c4 l F .text1 0000003b onTimer 40100918 g F .text1 00000085 millis 401009a0 g F .text1 0000000f micros 401009b0 g F .text1 00000022 micros64 401009d8 g F .text1 00000013 delayMicroseconds 401009f0 g F .text1 00000034 __digitalRead 401009f0 w F .text1 00000034 digitalRead 40100a3c g F .text1 000000e4 interrupt_handler 40100b20 g F .text1 0000000f vPortFree 0x0000000040107434 _text_end = ABSOLUTE (.)
Sketch uses 270799 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes. 401000cc l F .text1 00000014 twi_delay 401000f4 w F .text1 0000003b twi_releaseBus 4010015c g F .text1 00000246 twi_onTwipEvent 401003bc l F .text1 000001e7 onSdaChange 401005b0 l F .text1 000002f9 onSclChange 401008ac l F .text1 0000003b onTimer 0x000000004010741c _text_end = ABSOLUTE (.)
GCC won't use a lookup table for the TWI state machine, so it ends up using a series of straight line compare-jump, compare-jumps to figure out which branch of code to execute for each state. For branches that have multiple states that call them, this can expand to a lot of code. Short-circuit the whole thing by converting the FSM to a 1-hot encoding while executing it, and then just and-ing the 1-hot state with the bitmask of states with the same code. Sketch uses 270719 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27944 bytes (34%) of dynamic memory, leaving 53976 bytes for local variables. Maximum is 81920 bytes. 401000cc l F .text1 00000014 twi_delay 401000f4 w F .text1 0000003b twi_releaseBus 4010015c g F .text1 00000246 twi_onTwipEvent 401003c0 l F .text1 000001b1 onSdaChange 40100580 l F .text1 000002da onSclChange 4010085c l F .text1 0000003b onTimer 0x00000000401073cc _text_end = ABSOLUTE (.) Saves 228 bytes of IRAM vs. master, uses 32 additional bytes of heap.
twi_status is set immediately before an event handler is called, resulting in lots of duplicated code. Set the twi_status flag inside the handler itself. Saves an add'l ~100 bytes of IRAM from prior changes, for a total of ~340 bytes. earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20 401000cc l F .text1 00000014 twi_delay 401000f4 w F .text1 0000003b twi_releaseBus 40100160 g F .text1 0000024e twi_onTwipEvent 401003c8 l F .text1 00000181 onSdaChange 40100558 l F .text1 00000297 onSclChange
Thanks to the suggestion from @mhightower83, move all global objects into a struct. This lets a single base pointer register to be used in place of constantly reloading the address of each individual variable. This might be better expressed by moving this to a real C++ implementaion based on a class object (the twi.xxxx would go back to the old xxx-only naming for vars), but there would then need to be API wrappers since the functionality is exposed through a plain C API. Saves 168 additional code bytes, for a grand total of 550 bytes IRAM. earle@server:~/Arduino/hardware/esp8266com/esp8266/tools$ ./xtensa-lx106-elf/bin/xtensa-lx106-elf-objdump -t -j .text1 /tmp/arduino_build_849115/*elf | sort -k1 | head -20 401000cc l F .text1 00000014 twi_delay 401000e8 w F .text1 00000032 twi_releaseBus 40100128 g F .text1 00000217 twi_onTwipEvent 4010034c l F .text1 00000149 onSdaChange 4010049c l F .text1 00000267 onSclChange 40100704 l F .text1 00000028 onTimer
Make the TWI states enums and not #defines, in the hope that it will allow GCC to more easily flag problems and general good code organization. 401000cc l F .text1 00000014 twi_delay 401000e8 w F .text1 00000032 twi_releaseBus 40100128 g F .text1 00000217 twi_onTwipEvent 4010034c l F .text1 00000149 onSdaChange 4010049c l F .text1 00000257 onSclChange 401006f4 l F .text1 00000028 onTimer Looks like another 16 bytes IRAM saved from the prior push. Sketch uses 267079 bytes (25%) of program storage space. Maximum is 1044464 bytes. Global variables use 27696 bytes (33%) of dynamic memory, leaving 54224 bytes for local variables. Maximum is 81920 bytes.
Convert the entire file into a C++ class (with C wrappers to preserve the ABI). This allows for setting individual values of the global struct(class) in-situ instead of a cryptic list at the end of the struct definition. It also removes a lot of redundant `twi.`s from most class members. Clean up the code by converting from `#defines` to inline functions, get rid of ternarys-as-ifs, use real enums, etc. For slave_receiver.ino, the numbers are: GIT Master IRAM: 0x723c This push IRAM: 0x6fc0 For a savings of 636 total IRAM bytes (note, there may be a slight flash text increase, but we have 1MB of flash to work with and only 32K of IRAM so the tradeoff makes sense.
Since the C++ version has significant text differences anyway, now is a good time to clean up the mess of spaces, tabs, and differing cuddles.
See esp8266#5528 and the more elaborate [description](https://github.com/maarten-pennings/I2C-tool/blob/master/I2Ctest8266/README.md#how-to-fix) where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.
Does it add much to the iram usage? |
Still nice iram usage. You can try if you want, use in platformio.ini as platform: |
Per review comments
There were multiple places where the code was waiting for a slave to finish stretching the clock. Factor them out to an *inline* function to reduce code smell.
Add a new twi_setSlaveMode call which actually attached the interrupts to the slave pin change code onSdaChenge/onSclChange. Don't attach interrupts in the main twi_begin. Because slave mode is only useful should a onoReceive or onRequest callback, call twi_setSlaveMode and attach interrupts on the Wire setters. This allows GCC to not link in slave code unless slave mode is used, saving over 1,000 bytes of IRAM in the common, master-only case.
Apparently it does fix something as reported here: #6497 (comment) |
Confirmation from #6497 (comment) @TD-er patch is huge and we cannot read it. I propose to switch it to allman-style like in Line 12 in 122e870
Just modify it like this
run the script and also add it to your commit |
@d-a-v I guess it has become unreadable because I had based it on an existing PR which later was updated. |
See esp8266#5528 and the more elaborate [description](https://github.com/maarten-pennings/I2C-tool/blob/master/I2Ctest8266/README.md#how-to-fix) where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.
… into bugfix/issue5528_I2C_reduceiram
I did rebase it and removed the time stretch functions I added and @earlephilhower also implemented in his PR. Edit: |
I fixed the conflict |
@TD-er Updated |
Should I rebase this PR on the master branch and force push it? |
No,
(I just found out this octopus thing, after searching why my |
As suggested here: esp8266#6592 (comment)
Well I tried it, but it doesn't look like it is making it more readable. |
At stages like this, @TD-er, I usually give up on git and restart fresh from a new branch. :( Your changes end about here. There's now a method "WAIT_CLOCK_STRETCH" which replaces your twi_wait_clockStretchLimit, and the rest of the changes look relatively simple to adjust to the class: readFrom/writeTo methods: |
OK, so should I make a new PR for it, or do you cut'n-paste it? |
It's your code, so suggest you do the new PR/branch. I'm not trying to take the I'd start a new PR, make the 3 changes in the new branch, link it as "supersedes #6592", close this PR, and go on from there. If that's a problem, I can make a PR from your changes, too. Just let me know. |
Superseded by #6654 |
See #5528 and the more elaborate description
where @maarten-pennings did all the hard work, but as far as I could see, no PR was made.
I also noticed some code duplication, which I moved to separate functions.
According to this documentation on I2C clock stretching it is not allowed to have some slave keep the clock low during transmission of a byte, only after an ack.
So that's why it is not done in the while loop.
But I wondered if there should be an extra delay between the sequence of pulses and the extra added clock "valey". See my comment in the code.
Have not tested it myself, this is just the long awaited PR.
N.B. This PR was suggested by @Jason2866 here: #6579 (comment)
It is based on this PR by @earlephilhower : #6326
This PR is almost the same as #6579 but now following the changes made in #6326