C++ Exceptions are broken due to PROGMEM char reads #6305

Closed
earlephilhower opened this issue Jul 15, 2019 · 8 comments

@earlephilhower
Collaborator

As discussed on the gitter channel, C++ exceptions (as in throw and catch, not *(int*)0=0) are currently busted in the release.

The exception code needs to parse byte-packed structures which are stored in PROGMEM. When exceptions were first added, we were on SDK pre3.0.0, which included a misaligned-access fixup handler that allowed this with nothing worse than a very large performance penalty.

Now that we're back to 2.x.x, decoding exception info results in LoadStoreErrors.

The fix is to backport the patches to the GCC toolchain build process in #6294 (and any others required).
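
To illustrate why "byte-packed" matters here: the unwind tables GCC emits use variable-length, byte-granular encodings such as ULEB128, so a decoder naturally walks them one byte at a time. The sketch below is a hypothetical decoder written for illustration only (the name `decode_uleb128` and its signature are assumptions, not the toolchain's unwinder code). On a target where the data lives in flash, each `p[i]` load would be one of the sub-32-bit reads described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration; not the actual unwinder implementation.
// Decodes one unsigned LEB128 value and reports how many bytes it occupied.
static std::uint32_t decode_uleb128(const std::uint8_t* p, std::size_t* consumed) {
    std::uint32_t result = 0;
    unsigned shift = 0;
    std::size_t i = 0;
    std::uint8_t byte;
    do {
        byte = p[i++];  // byte-sized load: the access pattern that faults on flash
        result |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        shift += 7;
    } while ((byte & 0x80) != 0 && shift < 32);  // high bit set means "more bytes follow"
    *consumed = i;
    return result;
}
```

Because values like this start at arbitrary byte offsets, the table cannot simply be padded into 32-bit fields, which is why the reads have to be fixed up on the toolchain side.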

@TD-er
Contributor

TD-er commented Jul 16, 2019

Is it possible these alignment issues can also result in very unpredictable stability from build to build?
For example last night's build of ESPeasy was again unusable (at least for the "normal" build) while it was working just fine when I built it on my desktop PC.
I've reported such build issues before but they are very hard to reproduce (obviously) and rather frustrating.

It is just like some randomness is present in the builds. Just changing a string somewhere (literally anywhere) may be enough to make it unable to connect to WiFi or unable to work with I2C or something else.

@earlephilhower
Collaborator Author

No, the .eh_frames aren't even present in the build unless you enable exceptions. Even if they're present, nobody touches them until you get a throw() somewhere. It's a very repeatable crash when they are used, always in the C++ exception handler guts.

What you've got there is, I think, just run of the mill memory corruption from somewhere.

@TD-er
Contributor

TD-er commented Jul 16, 2019

Memory corruption where?
Build environment?
Or bugs in the code?

@earlephilhower
Collaborator Author

Bugs in the app or library code would be my first guess. It's not related to C++ exceptions, whatever it is.

@TD-er
Contributor

TD-er commented Jul 16, 2019

I was more referring to this part:

Now that we're back to 2.x.x, decoding exception info results in LoadStoreErrors.

I see reports of these also on parts of the code where PROGMEM is used. After removing the PROGMEM part it does seem to work fine.
But on the other hand, if this is related to the start of such a block (thus alignment), it may be somewhat random per build whether this will lead to crashes.
And since such randomness is also observed with respect to WiFi functioning, it might be related?

@earlephilhower
Collaborator Author

earlephilhower commented Jul 16, 2019

Ah, I think I get your question now.

In pre3.0.0 there was a HW LoadStoreError exception handler (unrelated to C++ exceptions, think of this like an IRQ or SYSCALL) which transparently fixed up non-32b sized reads from PROGMEM and let the main app continue. So you could do byte reads of a PSTR() using "while (*c++)...", etc. But pre3.0.0 would freeze in the binary blob for 100s of ms, causing everything else to behave poorly, so it was undone and we're back at 2.2.(yz?).

The start of blocks in progmem really doesn't matter. Most objects are 32-bit aligned in flash, including PSTRs and functions. However, the issue isn't the alignment so much as the "read less than 32-bits" which is causing the exception.

It doesn't cause flakiness, it causes complete crashes (i.e. the LoadStoreError HW exception). If you get one, you can use the ESPExceptionDecoder to find the faulting line and work backwards to find the cause. Moving from PMEM to heap (in RAM) is another way to avoid the issue, as you mentioned.

But again, it's not going to be random per-build. The HW exception is a hard stop to the system, and the alignment doesn't matter, only that you're not doing an l32 (32-bit read) from PMEM...
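
A minimal sketch of the difference, assuming the ESP8266 Arduino core's `PSTR()` and `pgm_read_byte()`; the helper name `flash_strlen` is hypothetical, and the `#else` branch only provides host-side stand-ins so the snippet can be compiled off-target. The commented-out loop is the `while (*c++)` style that relied on the pre3.0.0 fixup handler; the live loop goes through `pgm_read_byte()` instead.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(ARDUINO)
#include <Arduino.h>  // provides PSTR() and pgm_read_byte() on the ESP8266 core
#else
// Host-side stand-ins (assumption, for illustration only).
#define PSTR(s) (s)
#define pgm_read_byte(p) (*reinterpret_cast<const std::uint8_t*>(p))
#endif

// Hypothetical helper: length of a NUL-terminated string stored in flash.
static std::size_t flash_strlen(const char* p) {
    std::size_t n = 0;
    // Direct byte reads, e.g. `while (*p++) ++n;`, would issue 8-bit loads
    // from flash and raise LoadStoreError on SDK 2.x.
    while (pgm_read_byte(p + n) != 0) {  // each access becomes a 32-bit read internally
        ++n;
    }
    return n;
}
```

On the device this would be called as `flash_strlen(PSTR("hello"))`; the core already ships `strlen_P()` for the same job, so the helper is only meant to show the access pattern.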

@TD-er
Contributor

TD-er commented Jul 16, 2019

Code compiled on the same platform (Windows/Linux) with the same code does seem to produce the same result. So that's deterministic.
But what I'm seeing is that builds with code changes in unrelated parts of the code base do behave differently. And also change of platform with the same code may differ in functionality.
WiFi crashes almost never result in something that can be used in the Exception Decoder. (WD reboots, no stack trace)
Also I have tried several times but never got the Exception Decoder to work (have not tried it in the last few months though), so I am running in the dark here.

Do these delays in execution also happen on strings marked with the F() macro? (when the length is not a multiple of 4 bytes)

@earlephilhower
Collaborator Author

Pre3.0.0 had bad issues in the blobs (and blobs are not used for any PSTR/F accesses), so it was removed a long time ago. Inside the blob itself it would take 100s of ms just to call back to the main core loop, with no info on why.

The pgm_read_byte() macros simply emit assembly for 32-bit reads to the aligned address and then shift the result accordingly. They're just standard instructions as far as the CPU is concerned, and they always run at normal speed. Just look at pgmspace.h to see the actual instructions.
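
The same idea in portable C++, as a rough sketch rather than the actual pgmspace.h code: `read_byte_via_word` and `kSample` are hypothetical names, and the shift direction assumes a little-endian target such as the ESP8266. The only memory access is a 32-bit read at the rounded-down address.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the real macro): fetch one byte using only a
// 32-bit read from the enclosing word-aligned address. Little-endian assumed.
static inline std::uint8_t read_byte_via_word(const void* addr) {
    const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(addr);
    const std::uintptr_t aligned = a & ~static_cast<std::uintptr_t>(3);  // round down to a 4-byte boundary
    const unsigned shift = static_cast<unsigned>(a & 3) * 8;             // byte offset in the word, in bits
    std::uint32_t word;
    std::memcpy(&word, reinterpret_cast<const void*>(aligned), sizeof(word));  // the single 32-bit read
    return static_cast<std::uint8_t>(word >> shift);  // keep only the requested byte
}

// Word-aligned, word-sized sample data standing in for a flash string.
alignas(4) static const char kSample[8] = "ESP8266";
```

For example, `read_byte_via_word(&kSample[5])` loads the second word of `kSample` and shifts it right by 8 bits to extract the character at index 5.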

If it's WiFi powerdown/disconnect/reconnect, there seem to be hangs in the blob and, when the WDT kicks the reset switch, there's no machine state saved (since it's a hard reset and not a nice HW exception), so you can't dump anything. There's a whole bug on that being tracked.

It really looks like you've got lib or app memory corruption. ESPExceptionDecoder just needs to be unzipped in the ~/Arduino/tools dir, and it's really essential if you want to see what crash dumps are saying.

@earlephilhower changed the title from "Exceptions are broken due to PROGMEM char reads" to "C++ Exceptions are broken due to PROGMEM char reads" on Jul 16, 2019