Skip to content

Strange behavior with Micros() and Queue #1160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Leicas opened this issue Mar 1, 2018 · 15 comments
Closed

Strange behavior with Micros() and Queue #1160

Leicas opened this issue Mar 1, 2018 · 15 comments

Comments

@Leicas
Copy link

Leicas commented Mar 1, 2018

Hardware:

Board: Adafruit Huzzah ESP32
Core Installation/update date: 20/feb/2018
IDE name: Arduino IDE
Flash Frequency: 40Mhz
Upload Speed: 115200

Description:

Hi,
I noticed some strange things happening when using Micros for timing my code.
The sketch below show a minimal exemple, if you comment the xQueueSend(queue, &j, 0); line, everything works fine and Micros and Millis show consistent values.

But with the xQueueSend(queue, &j, 0);, I cannot figure what Micros is returning, and it doesn't make sense to me why using a fifo queue would impact timing.

Did I miss something obvious ?

Thanks

Sketch:

unsigned long mesuretemps = 0;
QueueHandle_t queue;
int queueSize = 4096;
unsigned long oldmic = 0;
void setup(){
  Serial.begin(115200);
  mesuretemps = micros();
  queue = xQueueCreate( queueSize, sizeof( int ) ); 
  if(queue == NULL){
    Serial.println("Error creating the queue");
  }
  xTaskCreate(
                    consumerTask,     /* Task function. */
                    "Consumer",       /* String with name of task. */
                    10000,            /* Stack size in words. */
                    NULL,             /* Parameter passed as input of the task */
                    1,                /* Priority of the task. */
                    NULL);            /* Task handle. */
}
void loop(){
  for(int j=0;j<1000;j++) {
    xQueueSend(queue, &j, 0);
  }
 unsigned long ms = millis();
unsigned long us = micros();
unsigned long delta = ms - mesuretemps;
mesuretemps = ms;
unsigned long deltamic = us - oldmic;
oldmic = us;
  Serial.printf("done in %lu ms %lu mus\n", delta,deltamic);
}
void consumerTask( void * parameter)
{
    int element;
  while(true) {
    xQueueReceive(queue, &element, portMAX_DELAY);
    delayMicroseconds(125);
  }
}
@stickbreaker
Copy link
Contributor

@Leicas What is it doing that you don't like?
Post your output.

Chuck.

@Leicas
Copy link
Author

Leicas commented Mar 1, 2018

@stickbreaker yes, no problem,
with xQueueSend

done in 3 ms 35793660 mus
done in 2 ms 35793650 mus
done in 2 ms 35793651 mus
done in 2 ms 35793643 mus
done in 3 ms 35793659 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793659 mus
done in 2 ms 35793652 mus
done in 2 ms 35793651 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus
done in 2 ms 35793651 mus
done in 2 ms 35793652 mus
done in 3 ms 35793641 mus
done in 2 ms 35793659 mus
done in 2 ms 35793652 mus
done in 2 ms 35793651 mus
done in 3 ms 35793641 mus
done in 2 ms 35793660 mus

without

done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 1 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus

Leicas

@stickbreaker
Copy link
Contributor

stickbreaker commented Mar 1, 2018

@Leicas you are feeding a queue as fast as possible, you are draining the queue one element every 125us.
So, if the queue has 4095 elements and loop() starts adding 1000 more, each xQueueSend() will take 125us + task switch + queue work: So one execution of loop() will take 1000*(125us + task switch + xQueueSend() + xQueueRead())+ `loop() operations(Serial.Write() being the majority time waster)

Out of curiosity add

uint32_t count = uxQueueMessagesWaiting( QueueHandle_t xQueue ); // Return the number of messages stored in a queue.

and print the result in loop()

Chuck.

@Leicas
Copy link
Author

Leicas commented Mar 1, 2018

@stickbreaker Yeah, that's some leftover code, I used to have a function that would feed small amount of data to the queue, less than the queue size and at a much lower rate.

The problem is why does "micros()" stop counting microsecond when you use the xQueueSend()?
35793641micros is almost 36second and that doesn't reflect the time spent at all.

with the counter:

done in 4 ms 35794519 mus loop 4096
done in 3 ms 35794520 mus loop 4096
done in 3 ms 35794518 mus loop 4096
done in 3 ms 35794519 mus loop 4096
done in 3 ms 35794519 mus loop 4096
done in 3 ms 35794519 mus loop 4096

But by curiosity I removed the delayMicroseconds(125); inside the task, and the Micros() is making sense again.

@stickbreaker
Copy link
Contributor

@Leicas Well, it looks like delayMicroseconds() is not atomically handling its value.

your 3597419 is (2^32)/120 + actual us

add this to your println:

uint32_t netMic = deltaMic - ( UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);

For some reason the micros() is thinking a rollover has occurred?

Chuck.

@stickbreaker
Copy link
Contributor

@Leicas @me-no-dev I think I figured out what is happening. The micros() function grabs a cycle counter with: from esp32-hal-misc.c

    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );

This cycle count is a PER TASK count. So, depending on how long a task operates, its number changes.
If task 1 operates for 1500us then calls micros() which stores the 'current' microsecond tick(1500), then task 2 which has operated for 50us calls micros(), its tick is 50 which is less than 1500 so a ROLLOVER must have occurred!

To solve this each task needs a task local 'lastMicroSecondTick' and 'rolloverOffset'

Chuck.

@stickbreaker
Copy link
Contributor

@Leicas Here is a potential fix: add the __thread storage specifier to
micros() in esp32-hal-misc.c)

unsigned long IRAM_ATTR micros()
{
    static __thread unsigned long lccount = 0;
    static __thread unsigned long overflow = 0;
    unsigned long ccount;
    portENTER_CRITICAL_ISR(&microsMux);
    __asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
    if(ccount < lccount){
        overflow += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
    }
    lccount = ccount;
    portEXIT_CRITICAL_ISR(&microsMux);
    return overflow + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

This might not be the best way, because, it will reserve 8 bytes in every Task's Stack but it is simple!
The Pthread API might be a better long term solution:

Pthread API

The ESP-IDF provides the following pthread API to manage thtread local variables:

        pthread_key_create()
        pthread_key_delete()
        pthread_getspecific()
        pthread_setspecific()

Chuck.

@stickbreaker
Copy link
Contributor

@Leicas Well, __thread just caused Core Panics.

Chuck.

@stickbreaker
Copy link
Contributor

@Leicas Had to use pthread_keys:
This is what I changed micros() inside of cores\esp32\esp32-hal-misc.c

portMUX_TYPE microsMux = portMUX_INITIALIZER_UNLOCKED;
static pthread_key_t microsStore=NULL; //Thread local storage for rollover of micro() counter

void* microsStoreDelete(void * storage){  // called when a Task exits to release thread local storage
if(storage) free(storage);
}

unsigned long IRAM_ATTR micros(){

if (!microsStore){ // first Time Ever thread local not init'd
  portENTER_CRITICAL_ISR(&microsMux); // don't know if I really need it, but it only happens Once
  pthread_key_create(&microsStore,microsStoreDelete); // create initial holder
  portEXIT_CRITICAL_ISR(&microsMux);
  }
  
uint32_t *ptr;// [0] is lastCount, [1] is overFlow

ptr = pthread_getspecific(microsStore); // get address of storage
 
if(ptr == NULL){ // first time in this thread, allocate mem, init it.  
  portENTER_CRITICAL_ISR(&microsMux); // only happens once per thread (task)
  ptr = (uint32_t*)malloc(sizeof(uint32_t)*2);
  pthread_setspecific(microsStore,ptr); // store the pointer to this thread's values
  ptr[0] = 0; // lastCount value
  ptr[1] = 0; // overFlow
  portEXIT_CRITICAL_ISR(&microsMux);
  }  

unsigned long ccount;
portENTER_CRITICAL_ISR(&microsMux);
__asm__ __volatile__ ( "rsr     %0, ccount" : "=a" (ccount) );
if(ccount < ptr[0]){
  ptr[1] += UINT32_MAX / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ;
  }
ptr[0] = ccount;
portEXIT_CRITICAL_ISR(&microsMux);
return ptr[1] + (ccount / CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ);
}

Your original code now produces this:

done in 2 ms 1909 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1907 mus
done in 2 ms 1911 mus
done in 2 ms 1912 mus
done in 2 ms 1900 mus
done in 2 ms 1910 mus
done in 2 ms 1910 mus
done in 1 ms 1909 mus
done in 2 ms 1919 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1909 mus
done in 2 ms 1910 mus
done in 2 ms 1912 mus
done in 2 ms 1906 mus
done in 2 ms 1904 mus
done in 2 ms 1908 mus
done in 2 ms 1909 mus
done in 1 ms 1912 mus
done in 2 ms 1918 mus
done in 2 ms 1907 mus
done in 2 ms 1911 mus
done in 2 ms 1910 mus
done in 2 ms 1911 mus
done in 2 ms 1909 mus
done in 2 ms 1908 mus
done in 2 ms 1910 mus

@Leicas I had to change your consumerTask() to be more polite with other tasks:

void consumerTask( void * parameter){
int element;
uint32_t blockCount=0;
while(true) {
  if(!xQueueReceive(queue, &element, 0)){ // no data avail
    vTaskDelay(1); // let other task use this core
    failCount++;
    }
  else {
    delayMicroseconds(125);
    blockCount++;
    }
  if(blockCount>5000){ // be polite, let lower priority task have 1ms
    vTaskDelay(1); // task switch, wdt
    blockCount=0;
    }
  }
}

else I would get WDT timeouts on Core 0.


Task watchdog got triggered. The following tasks did not reset the watchdog in time:
 - IDLE (CPU 0)
Tasks currently running:
CPU 0: Consumer
CPU 1: loopTask

Chuck.

@Leicas
Copy link
Author

Leicas commented Mar 2, 2018 via email

@stickbreaker
Copy link
Contributor

@Leicas with more playing, it looks like 1ms is not enough time for the idle process to complete it's housekeeping, I kept getting 'out of stackspace' errors until I increased the vTaskDelay() to 10ms.

I don't know what the actual rules, recommendations are for sharing time with idle tasks. Just be aware if your loops are to tight you may need to design some vTaskDelay()'s into them.

Chuck.

@Leicas
Copy link
Author

Leicas commented Mar 5, 2018

@stickbreaker Everything is working with your last code changes.
Thank you a lot for your involvement !

@Leicas Leicas closed this as completed Mar 5, 2018
@Kasperdelasopa
Copy link

Hi together,

first, thanks for your work.

I still have problems with the delayMicroseconds() function.
I ported my code from an arduino uno to the esp32. On the Arduino i have implemented a function which played a melody by creating the sound frequencies with the delayMicroseconds() according to this tutorial:
https://www.arduino.cc/en/Tutorial/PlayMelody

This function worked fine for me on the Arduino.

But it looks to me that on the ESP using arduino-esp32 library the function to delay either block in total or it takes much longer than the expected microseconds to return.

FYI: im using PlatformIO for both projects (Arduino & ESP32) and i updated the arduino-esp32 lib there ti this branch.

Because i don't think it comes from this lib, because i cant be the first having this issue, i think i have setup my project wrong.

Does someone has an idea, where this behavior comes from?

@stickbreaker
Copy link
Contributor

The Esp32 is a multi tasking multi processor system. A call to delay() tells the OS to switch to another task and come back to this task later. You are going to have to change your program to us a Timer Interrupt.

Chuck.

@Kasperdelasopa
Copy link

Thx for your reply,

That makes totally sense to me.
I was able to solve it for me by using ledcwrite().

Thanks a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants