Skip to content

g_cont to unused space in SYS, gives extra 4K free #4553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

earlephilhower
Copy link
Collaborator

Use a linker option to hardcode the address of the g_cont context
variable to in a region of RAM allocated for SYS use but which is not
presently used by the SDK.

Gives an additional 4144 bytes of heap to all applications.

earlephilhower and others added 3 commits March 23, 2018 07:25
Use a linker option to hardcode the address of the g_cont context
variable to in a region of RAM allocated for SYS use but which is not
presently used.

Gives an additional 4K+ of heap to all apps.
@earlephilhower
Copy link
Collaborator Author

I'm sure there will be lively debate about this one since it's "borrowing" from a region the OS has reserved for itself, but which experimentally has been seen to be unused. Wanted to get it in before I forgot as it's so simple a change.

We can probably extend the stack to 5K for "free" with this as well since there's something like 5.2KB of unused space in the region.

@d-a-v
Copy link
Collaborator

d-a-v commented Mar 23, 2018

We could setup a debug mode in which we install a signature in it and check it every now and then ?
Maybe we could also ask some well placed people one knows @ espressif to check if this information can be obtained from the dev team ? :-]
Anyway thanks again for all your work/findings around those quite underground features!

@Adam5Wu
Copy link
Contributor

Adam5Wu commented Mar 23, 2018

I think this one needs a bit more delicate considerations and/or configurations.
Particularly, there will be user supplied code executing inside the sys context.

For example:

os_timer_setfn(_timer, reinterpret_cast<ETSTimerFunc*>(callback), reinterpret_cast<void*>(arg));

The callback supplied by user will be executed in the timer callback, which will use the sys context.
And if user code put too much data on its stack, e.g. declare a 4KB buffer for flash sector caching, then it will shoot the g_cont in its foot...

@devyte
Copy link
Collaborator

devyte commented Mar 23, 2018

@Adam5Wu you are right. However, declaring large objects like a 2KB array on the stack is against current programming guidelines.
Having said that, we need to have some canary for stack overflows.
What I would like to know is if there is any way to detect stack overflows, and do so in both stacks. If I remember correctly, it was proposed to detect a g_cont stack overflow by setting the single HW breakpoint at the end of the stack. Then, if the stack overflows, there is an access at that end address, and the breakpoint will jump. Is that viable?
So then here's another idea (sorry if it's stupid): if this breakpoint can be changed dynamically at runtime, and also set it at the beginning of the g_cont stack, shouldn't we be covered for detecting stack overflows in both cases?
Something like:
-on schedule, set it at the end of g_cont, so that if the g_cont stack overflows, it will complain
-on yield, set it at the beginning of g_cont, so that if the sys stack hits that point, it will complain

@Adam5Wu
Copy link
Contributor

Adam5Wu commented Mar 23, 2018

@devyte HW breakpoint sound like a great idea. I wasn't aware of this. :)
I guess I will dig more on lx106 features...

@Adam5Wu
Copy link
Contributor

Adam5Wu commented Mar 23, 2018

My current idea is, on yield, mark the g_cont range as protected. If any bytes were to be modified, a hardware exception will be triggered, and we can install a handler, which will remap (or copy) the data to a different memory region. And on schedule, we map (or copy) the g_cont back if it was remapped.

Remapping is only done on a case-by-case manner, that is, if sys cont did not overflow during a context switch, then no remapping is done so no performance penalty.

In this way, when sys context overflow happens, we can both get a notification, and (likely) keep everything continue to work, with a slight loss of performance.

Need to check the feasibility, though.

@earlephilhower
Copy link
Collaborator Author

Not sure how that's be possible without preallocating the move-to space and negating the purpose. From an exception handler you can't mess with the heap, no?

Maybe a simpler check to start:

  • On g_cont entrance move the breakpoint to the end of the stack to catch user code overflowing.
  • On g_cont exit/yield, move the breakpoint to the beginning of g_cont to catch SYS hitting it
    Either one would be fatal, I think, but you'd be able to get a direct cause and call stack printed out for debugging.

@devyte
Copy link
Collaborator

devyte commented Mar 23, 2018

My first thought wqs the same as @earlephilhower 's. Maybe we're misunderstanding your idea @Adam5Wu ?

@earlephilhower that's exactly what I said in my previous comment: move the hw breakpoint around on context switch. It does seem viable, right?

@Adam5Wu
Copy link
Contributor

Adam5Wu commented Mar 23, 2018

No, I think you guys are right in that we should start with a smaller and achievable goal.
My proposal is more complicated and should be done afterwards.

But my only concern is that, we don't really know for sure that the sys context memory region is just a stack. That is, overflow can only occur in one direction.

Either official confirmation from Espressif can clear up the understanding; or to be safe we need a region based hardware trap, not just a point based...

@earlephilhower
Copy link
Collaborator Author

@devyte Yup, my bad. We're on the same wavelength here. And, of course, if the exception can push values, we need to put the high water mark before the end/beginning to ensure that there's enough space to safely do whatever pushes needed in the exception handler + any functions it calls.

The topic at https://bbs.espressif.com/viewtopic.php?t=8879 has some experimental results on the stack size.

@earlephilhower
Copy link
Collaborator Author

I bet this will also break any GDB usage since there's only the one BP available in the core and my guess is GDB makes extensive use of it.

@devyte
Copy link
Collaborator

devyte commented Mar 23, 2018

About gdb, if the breakpoint can be set dynamically, it'd be just a matteer of unsetting, or re-setting it, from inside our gdb stubs, when gdb connects

@earlephilhower
Copy link
Collaborator Author

I can't figure out how to install my own exception vector and have it jump to the standard NMI one, but the following code can set a data BP on an arbitrary address. For stack, 0x3f should probably be changed to be a 16-byte window match).

void SetDataBreakpoint(void *addr) {
  (void) addr; 
  __asm__ ( "wsr a2, 144" ); // 144 = DBREAKA
  __asm__ ( "movi a2, 0x3f" );
  __asm__ ( "movi a3, 0x03" );
  __asm__ ( "slli a3, a3, 30" );
  __asm__ ( "or a3, a3, a2");
  __asm__ ( "wsr a3, 160" ); // 160 == DBREAKC
}

extern uint32_t _DebugExceptionVector;
extern uint32_t _NMIExceptionVector;

void InstallBreakHandler()
{
  Serial.printf("NMI: %08x, Debug: %08x\n",_NMIExceptionVector, _NMIExceptionVector);
  // Copy the standard exception handler
  uint32_t *d = (uint32_t*)(_DebugExceptionVector);
  uint32_t *u = (uint32_t*)(_NMIExceptionVector);
  *(d++) = *(u++);
  *(d++) = *(u++);
  *(d++) = *(u++);
  *(d++) = *(u++);
}

void setup() {
  uint32_t valid[100];
  uint32_t fatal[1];
  Serial.begin(115200);
  InstallBreakHandler();
  Serial.printf("Accessing valid...");
  Serial.flush();
  memset(valid, 123, sizeof(valid));
  Serial.printf("OK\n");
  Serial.printf("Accessing fatal...");
  Serial.flush();
  memset(fatal, 123, sizeof(fatal));
  Serial.printf("OK\n");
  Serial.printf("Installing DBREAK on fatal...");
  Serial.flush();
  SetDataBreakpoint(fatal);
  Serial.printf("OK\n");
  Serial.printf("Accessing fatal (should crash)...");
  Serial.flush();
  memset(fatal, 123, sizeof(fatal));
  Serial.printf("OK\n");
}

void loop() {
  // put your main code here, to run repeatedly:

}

@igrr
Copy link
Member

igrr commented Mar 24, 2018

If you don't mind holding this until Monday, I'm going to check whether the region is really only used as stack.

@earlephilhower
Copy link
Collaborator Author

That would be great, thanks!

The breakpoint stuff, though, can be useful even w/o this to report corruption when it happens instead of when the values destroyed in the heap are used.

@igrr
Copy link
Member

igrr commented Mar 26, 2018

After looking at the code, my recommendation is: instead of hardcoding the address, allocate the continuation structure on the sys stack. This would mean that g_cont would become a pointer, rather than a value, but the impact seems to be limited to just two files.

To place something on the sys stack, we can wrap call_user_start or call_user_start_local using an LD flag, then allocate the structure in the wrapper function. Another (nicer) option is to set our own entry point in the LD script, and call call_user_start from there.

I.e. option 1:

// Add linker flag, -Wl,-wrap,call_user_start
extern void __real_call_user_start();
extern void ICACHE_RAM_ATTR __wrap_call_user_start();

void ICACHE_RAM_ATTR __wrap_call_user_start()
{
    cont_t cont;
    g_cont = &cont;
    __real_call_user_start();
}

Option 2:

extern void call_user_start();

void _entry()
{
    // don't do anything fancy here: BSS is not zeroed out, and exception vectors still point to the ROM code
    cont_t cont;
    g_cont = &cont;
    call_user_start();
}

// In linker flags, replace -u call_user_start with -u _entry

// In eagle.app.v6.common.ld, replace ENTRY(call_user_start) with ENTRY(_entry)

An additional watchpoint-based stack overflow check (re-configured on context switch) would also be a nice improvement, but can be done in a separate PR.

@earlephilhower
Copy link
Collaborator Author

@igrr I can do that simply enough, but are you sure that there's no return from call_user_start (so the stack local passed in isn't ever deallocated? I don't know enough about the startup to really trace it myself.

@devyte
Copy link
Collaborator

devyte commented Mar 31, 2018

What's the status here? Do we understand what is to be done?

@earlephilhower
Copy link
Collaborator Author

@igrr I've not had any success with the wrapped functions for allocating g_cont on the SYS stack. Nothing but WDT resets without any debugging info. Are you sure the stack pointer register is even valid and pointing to the right spot at that point in device lifetime, and that it's not reset by the ROM/SDK code to the base later on (and therefore overwriting cont)? Thx!

@igrr
Copy link
Member

igrr commented Apr 3, 2018

Okay, i'll try to allocate some cycles and see what's wrong there.

@igrr
Copy link
Member

igrr commented Apr 7, 2018

@earlephilhower af09cb5 implements my proposal.

@earlephilhower
Copy link
Collaborator Author

Thanks, closing this one.

@Adam5Wu
Copy link
Contributor

Adam5Wu commented May 27, 2018

Found an issue with moving g_cont to sys stack -- SDK WPS function is broken.

It is because WPS involves pubkey crypto, which seems to rely on a sizable chunk of stack, which is taken away by g_cont. This results in using WPS always lead to wdt reset.

Other SDK functions that involves heavy crypto might also be affected (airkiss? smartconfig? mesh?)
Need more people and test cases to check...

@d-a-v
Copy link
Collaborator

d-a-v commented Jul 5, 2018

@igrr @Adam5Wu @earlephilhower
Can you please have a look to this explanation (#4889/files) ?

edit: cont_t is not simply a struct and really contains the stack itself. I did not realized that before, sorry for the noise.
I was wondering if we made user's stack (in sys stack) start a little further (let's say a few tens or hundreds of bytes), then wps(+ticker+lwip?) would not overwrite users's stack, provided that they do not push big chunks of data on it ?
That would also allow cont_check() to verify whether stack guards are poisoned (by WPS or whatever else).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants