Skip to content

I think we may need to add memory barrier to xt_rsil(). #6302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mhightower83 opened this issue Jul 14, 2019 · 3 comments
Closed

I think we may need to add memory barrier to xt_rsil(). #6302

mhightower83 opened this issue Jul 14, 2019 · 3 comments

Comments

@mhightower83
Copy link
Contributor

I think we may need to add memory barrier to xt_rsil().
Based on the comments I saw here:

From ESP8266_RTOS_SDK/components/esp8266/include/xtensa/xtruntime.h

/*  NOTE: these asm macros don't modify memory, but they are marked
 *  as such to act as memory access barriers to the compiler because
 *  these macros are sometimes used to delineate critical sections;
 *  function calls are natural barriers (the compiler does not know
 *  whether a function modifies memory) unless declared to be inlined.  */
# define XTOS_SET_INTLEVEL(intlevel)            ({ unsigned __tmp; \
                        __asm__ __volatile__(   "rsil   %0, " XTSTR(intlevel) "\n" \
                                                : "=a" (__tmp) : : "memory" ); \
                        __tmp;})

After trying to get a better understanding of the "memory" parameter. I think the current xt_rsil() in

#define xt_rsil(level) (__extension__({uint32_t state; __asm__ __volatile__("rsil %0," __STRINGIFY(level) : "=a" (state)); state;}))

needs it.

I wanted to see the results in assembly. So I did a quick compile to .S of this sample function with the old xt_rsil() and one with "memory" specified:

#include <Arduino.h>

#if 0
#undef xt_rsil
#define xt_rsil(level) (__extension__({uint32_t state; __asm__ __volatile__("rsil %0," __STRINGIFY(level) : "=a" (state)::"memory"); state;}))
#endif

int just4Fun(int *a, int *b) {
  uint32_t old_ps;
  int result = *a;
  old_ps = xt_rsil(15);
  *b = *a;
  xt_wsr_ps(old_ps);
  return result;
}

Using xt_rsil() without "memory" fence, trimmed .S results:

_Z8just4FunPiS_:
        l32i.n  a2, a2, 0
        rsil a4,15
        s32i.n  a2, a3, 0
        wsr a4,ps; isync
        ret.n

Using xt_rsil() with "memory" fence, trimmed .S results:

_Z8just4FunPiS_:
        l32i.n  a4, a2, 0
        rsil a5,15
        l32i.n  a2, a2, 0
        s32i.n  a2, a3, 0
        wsr a5,ps; isync
        mov.n   a2, a4
        ret.n

Notice the change.

  • The first example reads *a outside the critical area. Then saves that value to *b while in the critical area.
  • The second example reads a copy of *a outside the critical area. However, while in the critical area it reads *a again before saving it to *b.

Originally posted by @mhightower83 in #6274 (comment)

@devyte
Copy link
Collaborator

devyte commented Jul 14, 2019

This is an amazing find. Out of curiosity, what is the generated .S with optimizations off?

@devyte
Copy link
Collaborator

devyte commented Jul 14, 2019

#6301 is merged.

@mhightower83
Copy link
Contributor Author

Sorry, I am not sure of the question? The 1st .S is without the memory barrier/fence. The 2nd .S with the barrier/fence. And of course xt_wsr_ps() already had the "memory" fence parameter.

Based on what I see in the assembly, the temporary registers that are loaded with values from memory are not shared across memory barriers. I didn't fully grasp the significance of what I read until I saw how the code changed in assembly.

BTW: I assume some rules must exist on how to apply memory fences defined in a class constructor and destructor to the context of the scope. With the test function I built, using the InterruptLock class (with the "memory" parameter), the resulting .S appeared to be correct.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants