Skip to content

any timing is lost due interrupt disabling by wifi driver. Please, keep interrupt enabled (GIT8266O-186) #626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dsptech opened this issue Jun 12, 2019 · 9 comments

Comments

@dsptech
Copy link

dsptech commented Jun 12, 2019

Environment

  • Development Kit: NodeMCU
  • IDF version : 4c4bab2
  • Development Env: [Eclipse]
  • Operating System: [Windows]
  • Power Supply: [USB]

related to: #616

Problem Description

The wifi driver totally disable the level 1 interrupts for a great amount of time, up to 650 - 1500 ms (variable upon project configuration, test condition: STA mode, correct SSID, wrong password).
In such latency g_os_ticks receive one increment only, but esp_timer_get_time expect one increment every 1/CONFIG_FREERTOS_HZ time period instead, so, an incorrect time is returned.
Also freeRtos does not allow missing calls to xTaskIncrementTick() (see port.c), so, freeRtos loose the time coerency and any timed function work erroneusly.

EDIT: further tests are required.
Note: the Arduino framework (NONOS 2.2.1, 2.2.2 and pre 3.0.0) does not disable the interrupts (timer 1 intrrupt work correctly in any condition) so I think about a bug.

I'm still working on a workround to mitigate the timing issue, but I think that keep the level1 interrupt enabled is a better solution, due the wifi driver that already suspend the freertos scheduler via the right call.

The attached code detect the system latency using ccount, and show the coerency loss of esp_timer_get_time and xTaskGetTickCount. (I also checked the latency using a logic analyzer. The reported latency is true.)

#include <string.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "rom/ets_sys.h"
#include "esp_wifi.h"
#include "esp_event_loop.h"
#include "esp_log.h"
#include "esp_timer.h"
#include "nvs_flash.h"

//worst case for latency: correct SSID and wrong password.
#define EXAMPLE_ESP_WIFI_SSID      "correctSSID"
#define EXAMPLE_ESP_WIFI_PASS      "wrongPASS"

#ifdef CONFIG_ESP8266_DEFAULT_CPU_FREQ_160
#define CPU_MHZ 160
#else
#define CPU_MHZ 80
#endif


const char TAG1[] = "WiFiTest";

static esp_err_t event_handler(void *ctx, system_event_t *event) {
    switch (event->event_id) {
        case SYSTEM_EVENT_STA_START:
            esp_wifi_set_protocol(WIFI_IF_STA, WIFI_PROTOCOL_11B | WIFI_PROTOCOL_11G);
            esp_wifi_connect();
            break;
        case SYSTEM_EVENT_STA_GOT_IP:
            ESP_LOGI(TAG1, "got ip:%s", ip4addr_ntoa(&event->event_info.got_ip.ip_info.ip));
            break;
        case SYSTEM_EVENT_STA_DISCONNECTED:

            break;
        default:
            break;
    }
    return ESP_OK;
}


void app_main() {
    //Initialize NVS

    esp_err_t ret = nvs_flash_init();
    if (ret == ESP_ERR_NVS_NO_FREE_PAGES) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        ret = nvs_flash_init();
    }
    ESP_ERROR_CHECK(ret);

    //this task has the highest prority. Lower priority to the user level.
    vTaskPrioritySet(NULL, 1);

    ESP_LOGI(TAG1, "Start Wait 5s");
    vTaskDelay(pdMS_TO_TICKS(5000));
    ESP_LOGI(TAG1, "End Wait");

    tcpip_adapter_init();
    ESP_ERROR_CHECK(esp_event_loop_init(event_handler, NULL));

    //stop wifi if autostarted
    esp_wifi_set_storage(WIFI_STORAGE_RAM); //don't care about errors
    esp_wifi_deinit();

    vTaskDelay(pdMS_TO_TICKS(200));

    wifi_init_config_t cfg = WIFI_INIT_CONFIG_DEFAULT();

    ESP_ERROR_CHECK(esp_wifi_init(&cfg));

    ESP_ERROR_CHECK(esp_wifi_set_storage(WIFI_STORAGE_RAM)); //avoid delays by flash operations
    ESP_ERROR_CHECK(esp_wifi_set_mode(WIFI_MODE_STA));
    ESP_ERROR_CHECK(esp_wifi_set_ps(WIFI_PS_NONE));

    wifi_config_t wifi_config = {.sta = {.ssid = EXAMPLE_ESP_WIFI_SSID, .password = EXAMPLE_ESP_WIFI_PASS},};
    ESP_ERROR_CHECK(esp_wifi_set_config(ESP_IF_WIFI_STA, &wifi_config));

    ESP_ERROR_CHECK(esp_wifi_start());

    //required for latency detection
    uint32_t now_cc, elapsed_cc, lastCheck_cc; //from ccount
    int64_t now_us,elapsed_us, lastCheck_us; //from esp_timer_get_time()
    TickType_t now_ticks, elapsed_ticks, lastCheck_ticks; //from xTaskGetTickCount

    //initialize
    lastCheck_cc = soc_get_ticks();
    lastCheck_us = esp_timer_get_time();
    lastCheck_ticks = xTaskGetTickCount();

    //required for timestamps
    uint32_t startprint_cc,lastprint_cc;
    int64_t startprint_us;
    TickType_t startprint_ticks;

    //initialize
    startprint_cc = lastCheck_cc;
    lastprint_cc = startprint_cc - CPU_MHZ * 1000000; //force immediate printing
    startprint_us = lastCheck_us;
    startprint_ticks = lastCheck_ticks;

    for (;;) {
        vTaskDelay(1);

        //detect latency ---------------
        now_cc = soc_get_ticks();
        now_us = esp_timer_get_time();
        now_ticks = xTaskGetTickCount();

        elapsed_cc = now_cc - lastCheck_cc;
        elapsed_us = now_us - lastCheck_us;
        elapsed_ticks = now_ticks - lastCheck_ticks;

        if (elapsed_cc > 50 * (CPU_MHZ * 1000))
            ESP_LOGI(TAG1, "****** Latency detected by ccount :%u ms", (uint32_t)(elapsed_cc/(CPU_MHZ*1000)));
        if (elapsed_us > 50000)
            ESP_LOGI(TAG1, "****** Latency detected by esp_timer_get_time :%u ms", (uint32_t) elapsed_us/1000);
        if (elapsed_ticks > pdMS_TO_TICKS(50))
            ESP_LOGI(TAG1, "****** Latency detected by xTaskGetTickCount :%u ms", (uint32_t) (elapsed_ticks * 1000/configTICK_RATE_HZ));

        lastCheck_cc = now_cc;
        lastCheck_us = now_us;
        lastCheck_ticks = now_ticks;

        //print time--------------------

        if (now_cc - lastprint_cc >= CPU_MHZ * 1000000){ //wait 1sec from last print
            uint32_t timestamp_cc_ms, timestamp_us_ms, timestamp_ticks_ms;
            lastprint_cc = now_cc;
            timestamp_cc_ms = (now_cc - startprint_cc)/(CPU_MHZ*1000); //it rollback after 26secs (@160Mhz)
            timestamp_us_ms = (now_us - startprint_us)/1000; // conversion to millis from esp_timer_get_time micros;
            timestamp_ticks_ms = (now_ticks - startprint_ticks)*1000/configTICK_RATE_HZ;

            ESP_LOGI(TAG1, "Timestamps milliseconds ----------------------");
            ESP_LOGI(TAG1, "ccount: %u, esp_timer: %u, xTaskGetTickCount %u",
                timestamp_cc_ms,timestamp_us_ms,timestamp_ticks_ms);
            ESP_LOGI(TAG1, "esp_timer vs ccount: %d,  xTaskGetTickCount vs ccount:%d",
                (int32_t) (timestamp_us_ms - timestamp_cc_ms), (int32_t) (timestamp_ticks_ms - timestamp_cc_ms));
        }
    }
}

Debug Logs

(........)
I (351) reset_reason: RTC reset 2 wakeup 0 store 0, reason is 2
I (351) WiFiTest: Start Wait 5s
I (5355) WiFiTest: End Wait
I (5559) WiFiTest: Timestamps milliseconds ----------------------
I (5559) WiFiTest: ccount: 0, esp_timer: 0, xTaskGetTickCount 4
I (5559) WiFiTest: esp_timer vs ccount: 0,  xTaskGetTickCount vs ccount:4
I (6279) WiFiTest: ****** Latency detected by ccount :680 ms
I (6279) WiFiTest: Timestamps milliseconds ----------------------
I (6279) WiFiTest: ccount: 1399, esp_timer: 720, xTaskGetTickCount 724
I (6287) WiFiTest: esp_timer vs ccount: -679,  xTaskGetTickCount vs ccount:-675
I (7279) WiFiTest: Timestamps milliseconds ----------------------
I (7279) WiFiTest: ccount: 2400, esp_timer: 1720, xTaskGetTickCount 1724
I (7279) WiFiTest: esp_timer vs ccount: -680,  xTaskGetTickCount vs ccount:-676
I (8279) WiFiTest: Timestamps milliseconds ----------------------
I (8279) WiFiTest: ccount: 3400, esp_timer: 2720, xTaskGetTickCount 2724
I (8279) WiFiTest: esp_timer vs ccount: -680,  xTaskGetTickCount vs ccount:-676
(....)
@github-actions github-actions bot changed the title any timing is lost due interrupt disabling by wifi driver. Please, keep interrupt enabled any timing is lost due interrupt disabling by wifi driver. Please, keep interrupt enabled (GIT8266O-186) Jun 12, 2019
@DKrepsky
Copy link

I think this is related to #240.

@donghengqaz
Copy link
Collaborator

I am trying my best to resolve the problem, but this is so hard because we should modify so much code. When finishing this work, I will pin a note on the issues.

@dsptech
Copy link
Author

dsptech commented Jun 14, 2019

@donghengqaz
I've already rewritten the interrupt handler to keep correctly updated g_os_tick, g_cpu_tick and xTickCount even for long latency events (and correct also little drift on the ticks too).
This allow to get correct values from esp_timer_get_time(), xTaskGetTickCount() and remove some issue in the freertos timed functions.
I'm still testing it. If you consider it useful, I could share the handler in few hours.

@dsptech
Copy link
Author

dsptech commented Jun 14, 2019

Hi,
I tested my handler for 2h (@160Mhz, -Os, full cache). In a that time, one latency of 680ms was produced every 10 secs. Esp_timer_get_time() was always in averaged sync against CCOUNT (1us resolution) with a not correlated max deviation of -2 .. +2 us.
Also xTaskGetTickCount() has returned always sycronized values (4ms resolution due Freertos 250Hz frequency).
I attached my handler here. If it could be useful, feel free to use it (use it as replacement of the handler in port.c). Any feedback to improve it is appreciated.

void TASK_SW_ATTR xPortSysTickHandle(void *p) {
    //extern esp_tick_t g_cpu_ticks;
    //extern uint64_t g_os_ticks;
    uint32_t missed_ticks = 0;

    /* notes:
     *   A) it is expected no change of _xt_tick_divisor value inside the handler (by NMI)
     *   B) it is also expected that CCOMPARE is not updated elsewhere.
     *   TODO: check B against runtime CPU frequency changes (80 <- -> 160Mhz).
     *   TODO: check the handler with sleep condition.
     */

    uint32_t new_ccompare = soc_get_ccompare() + _xt_tick_divisor;
    for (;;) {
        //OK, even if NMI shot here!
        soc_set_ccompare(new_ccompare);

        //OK, even if NMI shot here!
        uint32_t cpu_ccount = soc_get_ccount();

        //OK, even if NMI shot here!
        if (((int32_t) (cpu_ccount - new_ccompare) < 0) || (soc_get_int_mask() & (1ul << ETS_MAX_INUM))) {
            /*
             * case 1 --- no missing ticks.
             *   A) 1st condition: new_compare is valid and not yet triggered
             *   B) 2nd condition: new_compare is valid and already triggered (extremely low probability). The caller function will call this function again.
             */
            break;
        } else if ((cpu_ccount - new_ccompare) < _xt_tick_divisor) {
            /*
             * case 2: --- single interrupt tick missed,
             * to manage situations like: https://github.com/espressif/ESP8266_RTOS_SDK/issues/240 using a small amount of cpu time.
             */
            missed_ticks += 1;
            new_ccompare += _xt_tick_divisor;  //it will be checked in the next loop            
        } else {
            /* case 3: bigger latency (actually expected only by esp_wifi_connect() ),
             * to manage situations like: https://github.com/espressif/ESP8266_RTOS_SDK/issues/626
             * it is expected that the execution will never return here
             * (if extremely long latencies by NMI cannot occur)
             */
            uint32_t ticks;
            uint32_t cpuTime;
            ticks = (cpu_ccount - new_ccompare) / _xt_tick_divisor + 1;
            missed_ticks += ticks;
            new_ccompare += _xt_tick_divisor * ticks;  //it will be checked in the next loop .
        }
    } //end for

    if (missed_ticks) {
        /* Apply correction to Freertos. Prerequisite: configUSE_TICKLESS_IDLE != 0
         * note: this correction could raise an assert if configASSERT is enabled. TODO: need investigation inside task.c
         */
        vTaskStepTick(missed_ticks); //also perform g_os_ticks+=missed_ticks. We will discard the update on g_cpu_ticks
    }

    g_os_ticks += 1;
    g_cpu_ticks = new_ccompare - _xt_tick_divisor; //esp_timer_get_time() expect exactly this value.

    //normal freertos duty
    if (xTaskIncrementTick() != pdFALSE) {
        vTaskSwitchContext();
    }
}

@donghengqaz
Copy link
Collaborator

Thanks for your generous sharing, and I am starting to fix the problem, I have planed to solve this at release/3.3.

@dsptech
Copy link
Author

dsptech commented Jul 20, 2019

Hi @donghengqaz,
I'm happy to hear that.
Thank you.

@dsptech
Copy link
Author

dsptech commented Nov 24, 2019

Hi @donghengqaz,
I updated my framework today, so, I seen the changes occurred in xPortSysTickHandle and esp_timer_get_time.

It seems to me that the new implementation might produces issues due:
1a) the update of g_esp_os_us occur in the interrupt handler
1b) a 64bit "not atomic" read of g_esp_os_us is performed in esp_timer_get_time() and esp_log_early_timestamp() with interrupts enabled
(EDIT: even if the probability is very low because the issue might be triggered one time every 4295secs).

  1. ccount is cleared every interrupt tick after a number of operation, so I think that the operation could produce a drift in the time.

What do you think about it ?

@clbr
Copy link

clbr commented Apr 3, 2020

Hi @donghengqaz,
Any updates on making the wifi not disable interrupts?

I need the GPIO interrupts to respond within 4 us, and everything works when the wifi is disconnected, but as soon as it tries to connect, massive delays. I tried additional waits up to 40us, still not enough, and then I find this ticket, saying the lag can be up to 900 ms (!).

@everyone, the OP post says NONOS pre-3.0.0 did not have this bug. Can anyone confirm whether the current NONOS SDK master has the bug? I think I could rewrite my app for the non-os SDK if it's known to work.

@dsptech
Copy link
Author

dsptech commented Apr 3, 2020

Hi @clbr,
regarding "NONOS pre-3.0.0...", I tested the Timer1 (default configuration) on Arduino enviroment only. Few time ago I discovered that non-OS sdk allow Timer 1 to work in NMI mode too (RTOS does not allow that), so I can't confirm my original sentence anymore. More investigation is required.
Sorry for the inconvenience. I will edit the original message immediately.
Regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants