g_cont to unused space in SYS, gives extra 4K free #4553

earlephilhower · 2018-03-23T14:31:54Z

Use a linker option to hardcode the address of the g_cont context
variable to in a region of RAM allocated for SYS use but which is not
presently used by the SDK.

Gives an additional 4144 bytes of heap to all applications.

Use a linker option to hardcode the address of the g_cont context variable to in a region of RAM allocated for SYS use but which is not presently used. Gives an additional 4K+ of heap to all apps.

earlephilhower · 2018-03-23T16:02:27Z

I'm sure there will be lively debate about this one since it's "borrowing" from a region the OS has reserved for itself, but which experimentally has been seen to be unused. Wanted to get it in before I forgot as it's so simple a change.

We can probably extend the stack to 5K for "free" with this as well since there's something like 5.2KB of unused space in the region.

d-a-v · 2018-03-23T16:58:30Z

We could setup a debug mode in which we install a signature in it and check it every now and then ?
Maybe we could also ask some well placed people one knows @ espressif to check if this information can be obtained from the dev team ? :-]
Anyway thanks again for all your work/findings around those quite underground features!

Adam5Wu · 2018-03-23T16:59:01Z

I think this one needs a bit more delicate considerations and/or configurations.
Particularly, there will be user supplied code executing inside the sys context.

For example:

Arduino/libraries/Ticker/Ticker.cpp

Line 56 in 559cb35

    
           os_timer_setfn(_timer, reinterpret_cast<ETSTimerFunc*>(callback), reinterpret_cast<void*>(arg));

The callback supplied by user will be executed in the timer callback, which will use the sys context.
And if user code put too much data on its stack, e.g. declare a 4KB buffer for flash sector caching, then it will shoot the g_cont in its foot...

devyte · 2018-03-23T17:39:41Z

@Adam5Wu you are right. However, declaring large objects like a 2KB array on the stack is against current programming guidelines.
Having said that, we need to have some canary for stack overflows.
What I would like to know is if there is any way to detect stack overflows, and do so in both stacks. If I remember correctly, it was proposed to detect a g_cont stack overflow by setting the single HW breakpoint at the end of the stack. Then, if the stack overflows, there is an access at that end address, and the breakpoint will jump. Is that viable?
So then here's another idea (sorry if it's stupid): if this breakpoint can be changed dynamically at runtime, and also set it at the beginning of the g_cont stack, shouldn't we be covered for detecting stack overflows in both cases?
Something like:
-on schedule, set it at the end of g_cont, so that if the g_cont stack overflows, it will complain
-on yield, set it at the beginning of g_cont, so that if the sys stack hits that point, it will complain

Adam5Wu · 2018-03-23T17:48:29Z

@devyte HW breakpoint sound like a great idea. I wasn't aware of this. :)
I guess I will dig more on lx106 features...

Adam5Wu · 2018-03-23T18:02:12Z

My current idea is, on yield, mark the g_cont range as protected. If any bytes were to be modified, a hardware exception will be triggered, and we can install a handler, which will remap (or copy) the data to a different memory region. And on schedule, we map (or copy) the g_cont back if it was remapped.

Remapping is only done on a case-by-case manner, that is, if sys cont did not overflow during a context switch, then no remapping is done so no performance penalty.

In this way, when sys context overflow happens, we can both get a notification, and (likely) keep everything continue to work, with a slight loss of performance.

Need to check the feasibility, though.

earlephilhower · 2018-03-23T18:15:35Z

Not sure how that's be possible without preallocating the move-to space and negating the purpose. From an exception handler you can't mess with the heap, no?

Maybe a simpler check to start:

On g_cont entrance move the breakpoint to the end of the stack to catch user code overflowing.
On g_cont exit/yield, move the breakpoint to the beginning of g_cont to catch SYS hitting it
Either one would be fatal, I think, but you'd be able to get a direct cause and call stack printed out for debugging.

devyte · 2018-03-23T18:28:26Z

My first thought wqs the same as @earlephilhower 's. Maybe we're misunderstanding your idea @Adam5Wu ?

@earlephilhower that's exactly what I said in my previous comment: move the hw breakpoint around on context switch. It does seem viable, right?

Adam5Wu · 2018-03-23T18:35:28Z

No, I think you guys are right in that we should start with a smaller and achievable goal.
My proposal is more complicated and should be done afterwards.

But my only concern is that, we don't really know for sure that the sys context memory region is just a stack. That is, overflow can only occur in one direction.

Either official confirmation from Espressif can clear up the understanding; or to be safe we need a region based hardware trap, not just a point based...

earlephilhower · 2018-03-23T19:55:52Z

@devyte Yup, my bad. We're on the same wavelength here. And, of course, if the exception can push values, we need to put the high water mark before the end/beginning to ensure that there's enough space to safely do whatever pushes needed in the exception handler + any functions it calls.

The topic at https://bbs.espressif.com/viewtopic.php?t=8879 has some experimental results on the stack size.

earlephilhower · 2018-03-23T19:58:14Z

I bet this will also break any GDB usage since there's only the one BP available in the core and my guess is GDB makes extensive use of it.

devyte · 2018-03-23T23:17:18Z

About gdb, if the breakpoint can be set dynamically, it'd be just a matteer of unsetting, or re-setting it, from inside our gdb stubs, when gdb connects

earlephilhower · 2018-03-24T02:05:20Z

I can't figure out how to install my own exception vector and have it jump to the standard NMI one, but the following code can set a data BP on an arbitrary address. For stack, 0x3f should probably be changed to be a 16-byte window match).

void SetDataBreakpoint(void *addr) {
  (void) addr; 
  __asm__ ( "wsr a2, 144" ); // 144 = DBREAKA
  __asm__ ( "movi a2, 0x3f" );
  __asm__ ( "movi a3, 0x03" );
  __asm__ ( "slli a3, a3, 30" );
  __asm__ ( "or a3, a3, a2");
  __asm__ ( "wsr a3, 160" ); // 160 == DBREAKC
}

extern uint32_t _DebugExceptionVector;
extern uint32_t _NMIExceptionVector;

void InstallBreakHandler()
{
  Serial.printf("NMI: %08x, Debug: %08x\n",_NMIExceptionVector, _NMIExceptionVector);
  // Copy the standard exception handler
  uint32_t *d = (uint32_t*)(_DebugExceptionVector);
  uint32_t *u = (uint32_t*)(_NMIExceptionVector);
  *(d++) = *(u++);
  *(d++) = *(u++);
  *(d++) = *(u++);
  *(d++) = *(u++);
}

void setup() {
  uint32_t valid[100];
  uint32_t fatal[1];
  Serial.begin(115200);
  InstallBreakHandler();
  Serial.printf("Accessing valid...");
  Serial.flush();
  memset(valid, 123, sizeof(valid));
  Serial.printf("OK\n");
  Serial.printf("Accessing fatal...");
  Serial.flush();
  memset(fatal, 123, sizeof(fatal));
  Serial.printf("OK\n");
  Serial.printf("Installing DBREAK on fatal...");
  Serial.flush();
  SetDataBreakpoint(fatal);
  Serial.printf("OK\n");
  Serial.printf("Accessing fatal (should crash)...");
  Serial.flush();
  memset(fatal, 123, sizeof(fatal));
  Serial.printf("OK\n");
}

void loop() {
  // put your main code here, to run repeatedly:

}

igrr · 2018-03-24T02:19:59Z

If you don't mind holding this until Monday, I'm going to check whether the region is really only used as stack.

earlephilhower · 2018-03-24T03:44:32Z

That would be great, thanks!

The breakpoint stuff, though, can be useful even w/o this to report corruption when it happens instead of when the values destroyed in the heap are used.

igrr · 2018-03-26T04:07:14Z

After looking at the code, my recommendation is: instead of hardcoding the address, allocate the continuation structure on the sys stack. This would mean that g_cont would become a pointer, rather than a value, but the impact seems to be limited to just two files.

To place something on the sys stack, we can wrap call_user_start or call_user_start_local using an LD flag, then allocate the structure in the wrapper function. Another (nicer) option is to set our own entry point in the LD script, and call call_user_start from there.

I.e. option 1:

// Add linker flag, -Wl,-wrap,call_user_start
extern void __real_call_user_start();
extern void ICACHE_RAM_ATTR __wrap_call_user_start();

void ICACHE_RAM_ATTR __wrap_call_user_start()
{
    cont_t cont;
    g_cont = &cont;
    __real_call_user_start();
}

Option 2:

extern void call_user_start();

void _entry()
{
    // don't do anything fancy here: BSS is not zeroed out, and exception vectors still point to the ROM code
    cont_t cont;
    g_cont = &cont;
    call_user_start();
}

// In linker flags, replace -u call_user_start with -u _entry

// In eagle.app.v6.common.ld, replace ENTRY(call_user_start) with ENTRY(_entry)

An additional watchpoint-based stack overflow check (re-configured on context switch) would also be a nice improvement, but can be done in a separate PR.

earlephilhower · 2018-03-26T16:09:22Z

@igrr I can do that simply enough, but are you sure that there's no return from call_user_start (so the stack local passed in isn't ever deallocated? I don't know enough about the startup to really trace it myself.

devyte · 2018-03-31T04:24:33Z

What's the status here? Do we understand what is to be done?

earlephilhower · 2018-04-02T14:43:18Z

@igrr I've not had any success with the wrapped functions for allocating g_cont on the SYS stack. Nothing but WDT resets without any debugging info. Are you sure the stack pointer register is even valid and pointing to the right spot at that point in device lifetime, and that it's not reset by the ROM/SDK code to the base later on (and therefore overwriting cont)? Thx!

igrr · 2018-04-03T04:34:36Z

Okay, i'll try to allocate some cycles and see what's wrong there.

igrr · 2018-04-07T06:47:28Z

@earlephilhower af09cb5 implements my proposal.

earlephilhower · 2018-04-07T17:54:13Z

Thanks, closing this one.

Adam5Wu · 2018-05-27T04:57:38Z

Found an issue with moving g_cont to sys stack -- SDK WPS function is broken.

It is because WPS involves pubkey crypto, which seems to rely on a sizable chunk of stack, which is taken away by g_cont. This results in using WPS always lead to wdt reset.

Other SDK functions that involves heavy crypto might also be affected (airkiss? smartconfig? mesh?)
Need more people and test cases to check...

d-a-v · 2018-07-05T10:12:19Z

@igrr @Adam5Wu @earlephilhower
Can you please have a look to this explanation (#4889/files) ?

edit: cont_t is not simply a struct and really contains the stack itself. I did not realized that before, sorry for the noise.
I was wondering if we made user's stack (in sys stack) start a little further (let's say a few tens or hundreds of bytes), then wps(+ticker+lwip?) would not overwrite users's stack, provided that they do not push big chunks of data on it ?
That would also allow cont_check() to verify whether stack guards are poisoned (by WPS or whatever else).

earlephilhower and others added 3 commits March 23, 2018 07:25

g_cont to unused space in SYS, gives extra 4K free

17bf98c

Use a linker option to hardcode the address of the g_cont context variable to in a region of RAM allocated for SYS use but which is not presently used. Gives an additional 4K+ of heap to all apps.

Merge branch 'master' into cont_to_sys

48c48ea

Merge branch 'master' into cont_to_sys

675e4de

Merge branch 'master' into cont_to_sys

107223d

earlephilhower closed this Apr 7, 2018

earlephilhower deleted the cont_to_sys branch April 7, 2018 17:54

igrr mentioned this pull request Apr 8, 2018

Move continuation stack from .bss onto sys stack #4622

Merged

Adam5Wu mentioned this pull request May 27, 2018

Manage 2nd stack allocation in-library w/refcnts earlephilhower/bearssl-esp8266#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

g_cont to unused space in SYS, gives extra 4K free #4553

g_cont to unused space in SYS, gives extra 4K free #4553

earlephilhower commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

d-a-v commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018 •

edited

Loading

devyte commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018 •

edited

Loading

earlephilhower commented Mar 23, 2018

devyte commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

devyte commented Mar 23, 2018

earlephilhower commented Mar 24, 2018

igrr commented Mar 24, 2018

earlephilhower commented Mar 24, 2018

igrr commented Mar 26, 2018 •

edited

Loading

earlephilhower commented Mar 26, 2018

devyte commented Mar 31, 2018

earlephilhower commented Apr 2, 2018

igrr commented Apr 3, 2018

igrr commented Apr 7, 2018

earlephilhower commented Apr 7, 2018

Adam5Wu commented May 27, 2018

d-a-v commented Jul 5, 2018 •

edited

Loading

g_cont to unused space in SYS, gives extra 4K free #4553

g_cont to unused space in SYS, gives extra 4K free #4553

Conversation

earlephilhower commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

d-a-v commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018 • edited Loading

devyte commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018 • edited Loading

earlephilhower commented Mar 23, 2018

devyte commented Mar 23, 2018

Adam5Wu commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

earlephilhower commented Mar 23, 2018

devyte commented Mar 23, 2018

earlephilhower commented Mar 24, 2018

igrr commented Mar 24, 2018

earlephilhower commented Mar 24, 2018

igrr commented Mar 26, 2018 • edited Loading

earlephilhower commented Mar 26, 2018

devyte commented Mar 31, 2018

earlephilhower commented Apr 2, 2018

igrr commented Apr 3, 2018

igrr commented Apr 7, 2018

earlephilhower commented Apr 7, 2018

Adam5Wu commented May 27, 2018

d-a-v commented Jul 5, 2018 • edited Loading

Adam5Wu commented Mar 23, 2018 •

edited

Loading

Adam5Wu commented Mar 23, 2018 •

edited

Loading

igrr commented Mar 26, 2018 •

edited

Loading

d-a-v commented Jul 5, 2018 •

edited

Loading